Ecosyste.ms: Repos
An open API service providing repository metadata for many open source software ecosystems.
GitHub topics: deltalake
buoyant-data/oxbow
Collection of AWS Lambdas for creating and managing Delta tables
Language: Rust - Size: 194 KB - Last synced: 1 day ago - Pushed: 1 day ago - Stars: 7 - Forks: 4
WeBankFinTech/Streamis
Streaming application development and management system based on Linkis and DSS, with plans to provide workflow-like graphical drag-and-drop development.
Language: Java - Size: 70 MB - Last synced: 5 days ago - Pushed: about 1 month ago - Stars: 97 - Forks: 40
databrickslabs/dbldatagen
Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) may be used to generate large simulated / synthetic data sets for test, POCs, and other uses in Databricks environments including in Delta Live Tables pipelines
Language: Python - Size: 10 MB - Last synced: 7 days ago - Pushed: 7 days ago - Stars: 267 - Forks: 51
naiborhujosua/Data-Scientist-learning-path-using-databricks
This is the summary of learning Data Science using Databricks
Size: 51.8 KB - Last synced: 15 days ago - Pushed: almost 3 years ago - Stars: 0 - Forks: 0
MrPowers/mack
Delta Lake helper methods in PySpark
Language: Python - Size: 2.8 MB - Last synced: 17 days ago - Pushed: 3 months ago - Stars: 271 - Forks: 39
easonlai/databricks_delta_table_samples
This is a code sample repository for demonstrating how to perform Databricks Delta Table operations.
Language: HTML - Size: 23.9 MB - Last synced: 19 days ago - Pushed: almost 2 years ago - Stars: 2 - Forks: 1
palutz/rust_nextstep
A series of exercises to play with more advanced topics in Rust
Language: Rust - Size: 298 KB - Last synced: 25 days ago - Pushed: 25 days ago - Stars: 0 - Forks: 0
makism/datastack-playground
A datastack playground; includes Spark, Kafka, Airbyte, etc.
Language: Jupyter Notebook - Size: 55.7 KB - Last synced: 28 days ago - Pushed: 7 months ago - Stars: 0 - Forks: 0
bmsuisse/lakeapi
API for distributing Data Lake Data
Language: Python - Size: 14.8 MB - Last synced: 28 days ago - Pushed: 28 days ago - Stars: 6 - Forks: 2
smart-data-lake/smart-data-lake
Smart Automation Tool for building modern Data Lakes and Data Pipelines
Language: Scala - Size: 36.2 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 92 - Forks: 21
japila-books/delta-lake-internals
The Internals of Delta Lake
Size: 168 MB - Last synced: 15 days ago - Pushed: about 2 months ago - Stars: 175 - Forks: 36
newfront/hitchhikers_guide_to_deltalake_streaming
Don't Panic. This guide will help you when it feels like the end of the world.
Language: Jupyter Notebook - Size: 89.8 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 16 - Forks: 4
ognis1205/delta-hub-ts
A platform and cloud-based service for data sharing based on Delta Sharing implemented using Next.js and TypeScript.
Language: TypeScript - Size: 5.78 MB - Last synced: 30 days ago - Pushed: 5 months ago - Stars: 20 - Forks: 3
ismailhammounou/db2ixf
db2ixf is a python package with a CLI that simplifies the parsing and processing of IBM Integration eXchange Format (IXF) files.
Language: Python - Size: 1 MB - Last synced: 30 days ago - Pushed: 2 months ago - Stars: 14 - Forks: 1
reisdebora/awesome-databricks
A curated list of awesome Databricks resources, including Spark
Size: 27.3 KB - Last synced: 3 days ago - Pushed: over 2 years ago - Stars: 14 - Forks: 2
yandex-cloud/yc-delta
Delta Lake for Yandex Data Proc
Language: Java - Size: 119 KB - Last synced: 26 days ago - Pushed: 7 months ago - Stars: 3 - Forks: 1
bhavink/databricks
Databricks Platform - Architecture, Security, Automation and much more!!
Language: Jupyter Notebook - Size: 13.9 MB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 44 - Forks: 30
izhangzhihao/Real-time-Data-Warehouse
Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi
Language: Dockerfile - Size: 106 KB - Last synced: 2 months ago - Pushed: 5 months ago - Stars: 95 - Forks: 40
goodwillpunning/nodejs-sharing-client
A Node.js connector for Delta Sharing.
Language: JavaScript - Size: 419 KB - Last synced: 17 days ago - Pushed: 11 months ago - Stars: 9 - Forks: 4
leehuwuj/olh
Open source stack lakehouse
Language: Python - Size: 4.57 MB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 22 - Forks: 2
data-engineer-course/taxacco
Project No. 4 for the "Data Engineer" course.
Language: Jupyter Notebook - Size: 11.5 MB - Last synced: 4 months ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0
dacort/faker-cli
Command-line interface to quickly generate fake CSV and JSON data
Language: Python - Size: 36.1 KB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 63 - Forks: 4
delta-io/kafka-delta-ingest
A highly efficient daemon for streaming data from Kafka into Delta Lake
Language: Rust - Size: 1.8 MB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 289 - Forks: 54
DataTech-Solutions/Threat-Detection-and-Visualization
Threat Detection and Visualization
Language: TSQL - Size: 11.9 MB - Last synced: 6 months ago - Pushed: 6 months ago - Stars: 25 - Forks: 153
herry13/glue-docker-image
A custom Glue Docker image
Language: Dockerfile - Size: 2.93 KB - Last synced: 8 months ago - Pushed: 8 months ago - Stars: 0 - Forks: 0
jcguidry/flight-ml-preprocess-gcp
Continuous flight event data processing using Spark Streaming, Delta Lake storage, deployed on GCP dataproc cluster.
Language: Python - Size: 13.7 KB - Last synced: 23 days ago - Pushed: 9 months ago - Stars: 0 - Forks: 0
cmackenzie1/deltalake-go
An implementation of Delta Lake in Go
Language: Go - Size: 78.1 KB - Last synced: 10 months ago - Pushed: 10 months ago - Stars: 2 - Forks: 0
JayyShah/Databricks-AWS
Databricks provides a unified, open platform for all your data. It empowers data scientists, data engineers and data analysts with a simple collaborative environment to run interactive and scheduled data analysis workloads.
Language: Python - Size: 3.91 KB - Last synced: 10 months ago - Pushed: over 1 year ago - Stars: 1 - Forks: 0
aws-samples/amazon-emr-with-delta-lake
Amazon EMR Notebook to show how to read from and write to Delta tables with Amazon EMR
Language: Jupyter Notebook - Size: 343 KB - Last synced: 6 months ago - Pushed: 6 months ago - Stars: 17 - Forks: 12
satyakommula96/spark_benchmark
Spark Performance Benchmark suite to evaluate all TPC-DS and TPC-H query times
Language: Scala - Size: 97.7 KB - Last synced: 8 months ago - Pushed: 8 months ago - Stars: 3 - Forks: 2
LeoneGarage/StreamJoin
A framework for incremental streaming joins and incremental streaming aggregations over change data feeds from Databricks Delta
Language: Python - Size: 163 KB - Last synced: 10 months ago - Pushed: 10 months ago - Stars: 1 - Forks: 0
taka-yayoi/public_repo
Stores sample notebooks for Databricks.
Language: Python - Size: 43.9 MB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 8 - Forks: 7
buoyant-data/lambda-delta-optimize
AWS Lambda function for optimizing Delta tables
Language: HCL - Size: 64.5 KB - Last synced: about 1 year ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0
sebastianruizm/demo-data-pipeline (fork of lbodnarin/data-pipeline)
Simple data pipeline (Airflow + Spark)
Language: Python - Size: 7.61 MB - Last synced: about 1 year ago - Pushed: about 1 year ago - Stars: 2 - Forks: 0
sankamuk/PysparkCheatsheet
PySpark Cheatsheet
Language: Python - Size: 11.2 MB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 31 - Forks: 25
aravinthsci/Spark_Delta_Lake
Delta Lake Examples
Language: Jupyter Notebook - Size: 285 KB - Last synced: 3 months ago - Pushed: about 4 years ago - Stars: 12 - Forks: 12
martandsingh/ApacheSpark
This repository helps you learn Databricks concepts through examples. It covers the important topics a data engineer needs in real-world work, using PySpark and Spark SQL for development. The course ends with a few case studies.
Language: Python - Size: 141 KB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 71 - Forks: 47
anneglienke/101_upsert-delta
This repository demonstrates a simple ELT process that uses Delta to perform upserts and to remove data files that aren't in the latest state of the table's transaction log.
Language: Python - Size: 1.17 MB - Last synced: about 1 year ago - Pushed: about 2 years ago - Stars: 31 - Forks: 2
roeap/flight-fusion
Language: Rust - Size: 3.96 MB - Last synced: 11 months ago - Pushed: 11 months ago - Stars: 5 - Forks: 1
credimi/pandora
Relational tables from nested data
Language: Scala - Size: 32.2 KB - Last synced: 10 months ago - Pushed: over 1 year ago - Stars: 2 - Forks: 1
OpenTableFormat/OpenTableFormat.github.io
Website for open table format 🕸
Language: CSS - Size: 4.59 MB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0
ev2900/EMR_Studio_Delta_Lake
Delta Lake examples designed to run on AWS Elastic MapReduce (EMR) via EMR Studio or EMR Notebooks
Language: Jupyter Notebook - Size: 12.7 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 1 - Forks: 1
himewel/ifood-data
Ifood data wrangling with Apache Airflow and Apache Spark running on Kubernetes
Language: Python - Size: 396 KB - Last synced: about 1 year ago - Pushed: over 2 years ago - Stars: 1 - Forks: 1
jasondavindev/delta-lake-dms-cdc
Example application for DMS CDC with Delta Lake and Apache Hudi
Language: Python - Size: 69.5 MB - Last synced: about 1 year ago - Pushed: over 2 years ago - Stars: 1 - Forks: 1
vvalcristina/treinamento-dataproc-deltalake
Training environment for Dataproc and DeltaLake
Language: Jupyter Notebook - Size: 664 KB - Last synced: about 1 year ago - Pushed: almost 3 years ago - Stars: 1 - Forks: 1