GitHub topics: data-lakehouse
mahmoudparsian/data-warehousing
This repository is a place for the Data Warehousing course at the Information Systems & Analytics department, Santa Clara University.
Language: Jupyter Notebook - Size: 538 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 9 - Forks: 2

sonikq/architecture-sprint-11
Transition from DWH to domain services and Data Mart
Size: 448 KB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

laminlabs/lamindb
A data framework for biology.
Language: Python - Size: 8.1 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 169 - Forks: 15

DataWithBaraa/sql-data-warehouse-project
A comprehensive guide to building a modern data warehouse with SQL Server, including ETL processes, data modeling, and analytics.
Language: TSQL - Size: 20.5 MB - Last synced at: 11 days ago - Pushed at: about 1 month ago - Stars: 170 - Forks: 137

BemiHQ/BemiDB
Single-binary Postgres read replica optimized for analytics
Language: Go - Size: 5.16 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 1,371 - Forks: 32

PFund-Software-Ltd/pfeed
Data Engine for Manual/Algo Trading: Download/Stream -> Clean -> Store. Supports Data Lakehouse Architecture. Clean Once and Forget.
Language: Python - Size: 3.6 MB - Last synced at: 1 day ago - Pushed at: about 1 month ago - Stars: 25 - Forks: 5

ExHansen/data-warehouse-project
Data Engineering Project
Language: TSQL - Size: 15.3 MB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 0 - Forks: 0

ulbmuenster/dataasee
DatAasee - A Metadata-Lake for Libraries
Language: Makefile - Size: 3.06 MB - Last synced at: 30 days ago - Pushed at: 30 days ago - Stars: 14 - Forks: 2

rogui-manal/SQL-DATA-WAREHOUSE-PROJECT-FROM-SCRATCH
Building a modern Data Warehouse with SQL Server, including ETL processes, Data Modeling and analytics
Language: TSQL - Size: 981 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

yrehim7/data_warehouse_project
A complete, easy-to-follow guide on building a modern data warehouse with SQL Server. Learn how to design ETL processes, create effective data models, and leverage analytics for better insights.
Language: TSQL - Size: 1.54 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

Elkoumy/real_time_data_lake
🚀 Scalable near-real-time data pipeline using Apache Iceberg, Spark, Kafka, and Trino. ACID-compliant JSON ingestion, processing, and analytics. Dockerized for easy deployment. #DataEngineering #DataLake
Language: Python - Size: 233 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

aabouzaid/modern-data-platform-poc
My M.Sc. dissertation: Modern Data Platform using DataOps, Kubernetes, and Cloud-Native ecosystem to build a resilient Big Data platform based on Data Lakehouse architecture which is the base for Machine Learning (MLOps) and Artificial Intelligence (AIOps).
Language: Jupyter Notebook - Size: 5.52 MB - Last synced at: 29 days ago - Pushed at: about 1 year ago - Stars: 8 - Forks: 1

pracdata/awesome-open-source-data-engineering
A curated list of open source tools used in analytics platforms and data engineering ecosystem
Size: 219 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 274 - Forks: 29

shinie19/sql-data-warehouse-project
Build a modern Data Warehouse from scratch with SQL Server, including ETL processes, data modeling and analytics.
Language: TSQL - Size: 677 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

Rudra-G-23/SQL-Data-Warehouse-Project
This repo provides a step-by-step approach to building a modern data warehouse using PostgreSQL. It covers the ETL (Extract, Transform, Load) process, data modeling, exploratory data analysis (EDA), and advanced data analysis techniques.
Language: PLpgSQL - Size: 9.32 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

nxion/sql-data-warehouse-project
Building a modern data warehouse with MS SQL Server, ETL processes, data modeling, and analytics.
Size: 806 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

Mariann95/SQL_Data_Warehouse_And_Analytics_Project
Building a modern data warehouse with SQL Server, including ETL processes, data modeling, and analytics. This repository also contains a collection of SQL scripts demonstrating various analytical techniques, such as change-over-time, cumulative, performance, data segmentation, and part-to-whole analysis.
Language: TSQL - Size: 2.45 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

irwandifo/gcp-batch-infra
GCP Infrastructure for Batch Processing
Language: HCL - Size: 1.39 MB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

abeltavares/versioned-data-lakehouse
🌊 Git-like Version Control for Data with Nessie, Iceberg, and Spark
Language: Jupyter Notebook - Size: 3.95 MB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 3 - Forks: 2

huwngnosleep/complete_lakehouse_techstack
This project implements an end-to-end techstack for a data platform, for local development.
Language: Python - Size: 116 MB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 2 - Forks: 0

Qbeast-io/qbeast-spark
Qbeast-spark: DataSource enabling multi-dimensional indexing and efficient data sampling. Big Data, free from the unnecessary!
Language: Scala - Size: 37.4 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 222 - Forks: 20

dzaky-pr/ets-datalakehouse-b
Language: Jupyter Notebook - Size: 642 KB - Last synced at: 2 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

prneidhardt/AWS-Data-Lakehouse
STEDI project
Language: Python - Size: 959 KB - Last synced at: 7 days ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

sudohainguyen/mini-lakehouse
Data lakehouse at home with docker compose
Language: Jupyter Notebook - Size: 531 KB - Last synced at: 3 months ago - Pushed at: about 2 years ago - Stars: 2 - Forks: 0

Cris-Neumann/Data-Lakehouse-with-Amazon-S3-and-Redshift
Pipeline from MongoDB to an Amazon S3 Data Lake, creation of a Data Warehouse in Amazon Redshift, and visualization in Tableau.
Language: Python - Size: 237 KB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

k0rsakov/infrastructure_for_data_engineer_S3
Infrastructure for a data engineer: S3
Language: Python - Size: 11.7 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

k0rsakov/all_about_DuckDB
Everything you need to know about DuckDB
Language: Jupyter Notebook - Size: 33.2 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

THeades/serverless-data-lakehouse
An example project showing how to build a serverless data lakehouse on AWS using Terraform, Apache Iceberg, and Spark.
Size: 5.86 KB - Last synced at: 2 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

Data-Kube/tst-datalakehouse-hudi
#Test - Create a Data Lakehouse in Kubernetes
Size: 155 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

gupta-aayushkr/F1-Racing
The project aims to process Formula 1 racing data, create an automated data pipeline, and make the data available for presentation and analysis purposes.
Language: Python - Size: 5.04 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

dominikhei/Local-Data-LakeHouse
Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testing.
Language: Dockerfile - Size: 127 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 24 - Forks: 6

eavilaes/qbeast-spark (fork of Qbeast-io/qbeast-spark)
Qbeast-spark: DataSource enabling multi-dimensional indexing and efficient data sampling. Big Data, free from the unnecessary!
Language: Scala - Size: 16.9 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0
