GitHub topics: data-lakehouse

Repositories

mahmoudparsian/data-warehousing

This repository is a place for the Data Warehousing course at the Information Systems & Analytics department, Santa Clara University.

Language: Jupyter Notebook - Size: 538 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 9 - Forks: 2

sonikq/architecture-sprint-11

Transition from DWH to domain services and Data Mart

Size: 448 KB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

laminlabs/lamindb

A data framework for biology.

Language: Python - Size: 8.1 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 169 - Forks: 15

DataWithBaraa/sql-data-warehouse-project

A comprehensive guide to building a modern data warehouse with SQL Server, including ETL processes, data modeling, and analytics.

Language: TSQL - Size: 20.5 MB - Last synced at: 11 days ago - Pushed at: about 1 month ago - Stars: 170 - Forks: 137

BemiHQ/BemiDB

Single-binary Postgres read replica optimized for analytics

Language: Go - Size: 5.16 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 1,371 - Forks: 32

PFund-Software-Ltd/pfeed

Data Engine for Manual/Algo Trading: Download/Stream -> Clean -> Store. Supports Data Lakehouse Architecture. Clean Once and Forget.

Language: Python - Size: 3.6 MB - Last synced at: 1 day ago - Pushed at: about 1 month ago - Stars: 25 - Forks: 5

ExHansen/data-warehouse-project

Data Engineering Project

Language: TSQL - Size: 15.3 MB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 0 - Forks: 0

ulbmuenster/dataasee

DatAasee - A Metadata-Lake for Libraries

Language: Makefile - Size: 3.06 MB - Last synced at: 30 days ago - Pushed at: 30 days ago - Stars: 14 - Forks: 2

rogui-manal/SQL-DATA-WAREHOUSE-PROJECT-FROM-SCRATCH

Building a modern Data Warehouse with SQL Server, including ETL processes, Data Modeling and analytics

Language: TSQL - Size: 981 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

yrehim7/data_warehouse_project

A complete, easy-to-follow guide on building a modern data warehouse with SQL Server. Learn how to design ETL processes, create effective data models, and leverage analytics for better insights.

Language: TSQL - Size: 1.54 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

🚀 Scalable near-real-time data pipeline using Apache Iceberg, Spark, Kafka, and Trino. ACID-compliant JSON ingestion, processing, and analytics. Dockerized for easy deployment. #DataEngineering #DataLake

Language: Python - Size: 233 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

aabouzaid/modern-data-platform-poc

My M.Sc. dissertation: Modern Data Platform using DataOps, Kubernetes, and the Cloud-Native ecosystem to build a resilient Big Data platform based on Data Lakehouse architecture, which is the base for Machine Learning (MLOps) and Artificial Intelligence (AIOps).

Language: Jupyter Notebook - Size: 5.52 MB - Last synced at: 29 days ago - Pushed at: about 1 year ago - Stars: 8 - Forks: 1

pracdata/awesome-open-source-data-engineering

A curated list of open source tools used in analytics platforms and the data engineering ecosystem

Size: 219 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 274 - Forks: 29

shinie19/sql-data-warehouse-project

Build a modern Data Warehouse from scratch with SQL Server, including ETL processes, data modeling and analytics.

Language: TSQL - Size: 677 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

Rudra-G-23/SQL-Data-Warehouse-Project

This repo provides a step-by-step approach to building a modern data warehouse using PostgreSQL. It covers the ETL (Extract, Transform, Load) process, data modeling, exploratory data analysis (EDA), and advanced data analysis techniques.

Language: PLpgSQL - Size: 9.32 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

nxion/sql-data-warehouse-project

Building a modern data warehouse with MS SQL Server, ETL processes, data modeling and analytics.

Size: 806 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

Mariann95/SQL_Data_Warehouse_And_Analytics_Project

Building a modern data warehouse with SQL Server, including ETL processes, data modeling, and analytics. This repository also contains a collection of SQL scripts demonstrating various analytical techniques, such as change-over-time, cumulative, performance, data segmentation, and part-to-whole analysis.

Language: TSQL - Size: 2.45 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

irwandifo/gcp-batch-infra

GCP Infrastructure for Batch Processing

Language: HCL - Size: 1.39 MB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

abeltavares/versioned-data-lakehouse

🌊 Git-like Version Control for Data with Nessie, Iceberg, and Spark

Language: Jupyter Notebook - Size: 3.95 MB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 3 - Forks: 2

huwngnosleep/complete_lakehouse_techstack

This project implements an end-to-end tech stack for a data platform, intended for local development.

Language: Python - Size: 116 MB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 2 - Forks: 0

Qbeast-io/qbeast-spark

Qbeast-spark: DataSource enabling multi-dimensional indexing and efficient data sampling. Big Data, free from the unnecessary!

Language: Scala - Size: 37.4 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 222 - Forks: 20

dzaky-pr/ets-datalakehouse-b

Language: Jupyter Notebook - Size: 642 KB - Last synced at: 2 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

prneidhardt/AWS-Data-Lakehouse

STEDI project

Language: Python - Size: 959 KB - Last synced at: 7 days ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

sudohainguyen/mini-lakehouse

Data lakehouse at home with docker compose

Language: Jupyter Notebook - Size: 531 KB - Last synced at: 3 months ago - Pushed at: about 2 years ago - Stars: 2 - Forks: 0

Cris-Neumann/Data-Lakehouse-with-Amazon-S3-and-Redshift

Pipeline from MongoDB to an Amazon S3 Data Lake, creation of a Data Warehouse in Amazon Redshift, and visualization in Tableau.

Language: Python - Size: 237 KB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

k0rsakov/infrastructure_for_data_engineer_S3

Infrastructure for a data engineer: S3

Language: Python - Size: 11.7 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

k0rsakov/all_about_DuckDB

Everything you need to know about DuckDB

Language: Jupyter Notebook - Size: 33.2 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

THeades/serverless-data-lakehouse

This is an example project showing how to build a serverless data lakehouse on AWS using Terraform, Apache Iceberg, and Spark.

Size: 5.86 KB - Last synced at: 2 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

Data-Kube/tst-datalakehouse-hudi

#Test - Create a Data Lakehouse in Kubernetes

Size: 155 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

gupta-aayushkr/F1-Racing

The project aims to process Formula 1 racing data, create an automated data pipeline, and make the data available for presentation and analysis purposes.

Language: Python - Size: 5.04 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

dominikhei/Local-Data-LakeHouse

Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testing.

Language: Dockerfile - Size: 127 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 24 - Forks: 6

eavilaes/qbeast-spark

Fork of Qbeast-io/qbeast-spark