Ecosyste.ms: Repos

An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: deltalake

buoyant-data/oxbow

Collection of AWS Lambdas for creating and managing Delta tables

Language: Rust - Size: 194 KB - Last synced: 1 day ago - Pushed: 1 day ago - Stars: 7 - Forks: 4

WeBankFinTech/Streamis

Streaming application development and management system, based on Linkis and DSS, planning to provide the workflow-like graphical drag-and-drop development capability.

Language: Java - Size: 70 MB - Last synced: 5 days ago - Pushed: about 1 month ago - Stars: 97 - Forks: 40

databrickslabs/dbldatagen

Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) may be used to generate large simulated / synthetic data sets for test, POCs, and other uses in Databricks environments including in Delta Live Tables pipelines

Language: Python - Size: 10 MB - Last synced: 7 days ago - Pushed: 7 days ago - Stars: 267 - Forks: 51

naiborhujosua/Data-Scientist-learning-path-using-databricks

This is the summary of learning Data Science using Databricks

Size: 51.8 KB - Last synced: 15 days ago - Pushed: almost 3 years ago - Stars: 0 - Forks: 0

MrPowers/mack

Delta Lake helper methods in PySpark

Language: Python - Size: 2.8 MB - Last synced: 17 days ago - Pushed: 3 months ago - Stars: 271 - Forks: 39

easonlai/databricks_delta_table_samples

This is a code sample repository for demonstrating how to perform Databricks Delta Table operations.

Language: HTML - Size: 23.9 MB - Last synced: 19 days ago - Pushed: almost 2 years ago - Stars: 2 - Forks: 1

palutz/rust_nextstep

A series of exercises to play with more advanced topics in Rust

Language: Rust - Size: 298 KB - Last synced: 25 days ago - Pushed: 25 days ago - Stars: 0 - Forks: 0

makism/datastack-playground

A datastack playground; includes Spark, Kafka, Airbyte, etc.

Language: Jupyter Notebook - Size: 55.7 KB - Last synced: 28 days ago - Pushed: 7 months ago - Stars: 0 - Forks: 0

bmsuisse/lakeapi

API for distributing Data Lake Data

Language: Python - Size: 14.8 MB - Last synced: 28 days ago - Pushed: 28 days ago - Stars: 6 - Forks: 2

smart-data-lake/smart-data-lake

Smart Automation Tool for building modern Data Lakes and Data Pipelines

Language: Scala - Size: 36.2 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 92 - Forks: 21

japila-books/delta-lake-internals

The Internals of Delta Lake

Size: 168 MB - Last synced: 15 days ago - Pushed: about 2 months ago - Stars: 175 - Forks: 36

newfront/hitchhikers_guide_to_deltalake_streaming

Don't Panic. This guide will help you when it feels like the end of the world.

Language: Jupyter Notebook - Size: 89.8 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 16 - Forks: 4

ognis1205/delta-hub-ts

A platform and cloud-based service for data sharing based on Delta Sharing implemented using Next.js and TypeScript.

Language: TypeScript - Size: 5.78 MB - Last synced: 30 days ago - Pushed: 5 months ago - Stars: 20 - Forks: 3

ismailhammounou/db2ixf

db2ixf is a python package with a CLI that simplifies the parsing and processing of IBM Integration eXchange Format (IXF) files.

Language: Python - Size: 1 MB - Last synced: 30 days ago - Pushed: 2 months ago - Stars: 14 - Forks: 1

reisdebora/awesome-databricks

A curated list of awesome Databricks resources, including Spark

Size: 27.3 KB - Last synced: 3 days ago - Pushed: over 2 years ago - Stars: 14 - Forks: 2

yandex-cloud/yc-delta

Delta Lake for Yandex Data Proc

Language: Java - Size: 119 KB - Last synced: 26 days ago - Pushed: 7 months ago - Stars: 3 - Forks: 1

bhavink/databricks

Databricks Platform - Architecture, Security, Automation and much more!!

Language: Jupyter Notebook - Size: 13.9 MB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 44 - Forks: 30

izhangzhihao/Real-time-Data-Warehouse

Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi

Language: Dockerfile - Size: 106 KB - Last synced: 2 months ago - Pushed: 5 months ago - Stars: 95 - Forks: 40

goodwillpunning/nodejs-sharing-client

A Node.js connector for Delta Sharing.

Language: JavaScript - Size: 419 KB - Last synced: 17 days ago - Pushed: 11 months ago - Stars: 9 - Forks: 4

leehuwuj/olh

Open source stack lakehouse

Language: Python - Size: 4.57 MB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 22 - Forks: 2

data-engineer-course/taxacco

Project No. 4 for the "Data Engineer" course.

Language: Jupyter Notebook - Size: 11.5 MB - Last synced: 4 months ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0

dacort/faker-cli

Command-line interface to quickly generate fake CSV and JSON data

Language: Python - Size: 36.1 KB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 63 - Forks: 4

delta-io/kafka-delta-ingest

A highly efficient daemon for streaming data from Kafka into Delta Lake

Language: Rust - Size: 1.8 MB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 289 - Forks: 54

DataTech-Solutions/Threat-Detection-and-Visualization

Threat Detection and Visualization

Language: TSQL - Size: 11.9 MB - Last synced: 6 months ago - Pushed: 6 months ago - Stars: 25 - Forks: 153

herry13/glue-docker-image

A custom Glue Docker image

Language: Dockerfile - Size: 2.93 KB - Last synced: 8 months ago - Pushed: 8 months ago - Stars: 0 - Forks: 0

jcguidry/flight-ml-preprocess-gcp

Continuous flight event data processing using Spark Streaming, Delta Lake storage, deployed on GCP dataproc cluster.

Language: Python - Size: 13.7 KB - Last synced: 23 days ago - Pushed: 9 months ago - Stars: 0 - Forks: 0

cmackenzie1/deltalake-go

An implementation of Delta Lake in Go

Language: Go - Size: 78.1 KB - Last synced: 10 months ago - Pushed: 10 months ago - Stars: 2 - Forks: 0

JayyShah/Databricks-AWS

Databricks provides a unified, open platform for all your data. It empowers data scientists, data engineers and data analysts with a simple collaborative environment to run interactive and scheduled data analysis workloads.

Language: Python - Size: 3.91 KB - Last synced: 10 months ago - Pushed: over 1 year ago - Stars: 1 - Forks: 0

aws-samples/amazon-emr-with-delta-lake

Amazon EMR Notebook to show how to read from and write to Delta tables with Amazon EMR

Language: Jupyter Notebook - Size: 343 KB - Last synced: 6 months ago - Pushed: 6 months ago - Stars: 17 - Forks: 12

satyakommula96/spark_benchmark

Spark Performance Benchmark suite to evaluate all TPC-DS and TPC-H query times

Language: Scala - Size: 97.7 KB - Last synced: 8 months ago - Pushed: 8 months ago - Stars: 3 - Forks: 2

LeoneGarage/StreamJoin

A framework for incremental streaming joins and incremental streaming aggregations over change data feeds from Databricks Delta

Language: Python - Size: 163 KB - Last synced: 10 months ago - Pushed: 10 months ago - Stars: 1 - Forks: 0

taka-yayoi/public_repo

Contains sample notebooks for Databricks.

Language: Python - Size: 43.9 MB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 8 - Forks: 7

buoyant-data/lambda-delta-optimize

AWS Lambda function for optimizing Delta tables

Language: HCL - Size: 64.5 KB - Last synced: about 1 year ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0

sebastianruizm/demo-data-pipeline (fork of lbodnarin/data-pipeline)

Simple data pipeline (Airflow + Spark)

Language: Python - Size: 7.61 MB - Last synced: about 1 year ago - Pushed: about 1 year ago - Stars: 2 - Forks: 0

sankamuk/PysparkCheatsheet

PySpark Cheatsheet

Language: Python - Size: 11.2 MB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 31 - Forks: 25

aravinthsci/Spark_Delta_Lake

Delta Lake Examples

Language: Jupyter Notebook - Size: 285 KB - Last synced: 3 months ago - Pushed: about 4 years ago - Stars: 12 - Forks: 12

martandsingh/ApacheSpark

This repository helps you learn Databricks concepts through examples. It covers the important topics a data engineer needs in real-world work, using PySpark and Spark SQL for development. The course ends with a few case studies.

Language: Python - Size: 141 KB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 71 - Forks: 47

anneglienke/101_upsert-delta

This repository exemplifies a simple ELT process using delta to perform upsert and remove data files that aren't in the latest state of the transaction log for the table.

Language: Python - Size: 1.17 MB - Last synced: about 1 year ago - Pushed: about 2 years ago - Stars: 31 - Forks: 2

roeap/flight-fusion

Language: Rust - Size: 3.96 MB - Last synced: 11 months ago - Pushed: 11 months ago - Stars: 5 - Forks: 1

credimi/pandora

Relational tables from nested data

Language: Scala - Size: 32.2 KB - Last synced: 10 months ago - Pushed: over 1 year ago - Stars: 2 - Forks: 1

OpenTableFormat/OpenTableFormat.github.io

Website for open table format 🕸

Language: CSS - Size: 4.59 MB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

ev2900/EMR_Studio_Delta_Lake

Delta Lake examples designed to be run on AWS Elastic MapReduce (EMR) via EMR Studio or EMR Notebooks

Language: Jupyter Notebook - Size: 12.7 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 1 - Forks: 1

himewel/ifood-data

Ifood data wrangling with Apache Airflow and Apache Spark running on Kubernetes

Language: Python - Size: 396 KB - Last synced: about 1 year ago - Pushed: over 2 years ago - Stars: 1 - Forks: 1

jasondavindev/delta-lake-dms-cdc

Example application for DMS CDC with Delta Lake and Apache Hudi

Language: Python - Size: 69.5 MB - Last synced: about 1 year ago - Pushed: over 2 years ago - Stars: 1 - Forks: 1

vvalcristina/treinamento-dataproc-deltalake

Training environment for Dataproc and DeltaLake

Language: Jupyter Notebook - Size: 664 KB - Last synced: about 1 year ago - Pushed: almost 3 years ago - Stars: 1 - Forks: 1