Ecosyste.ms: Repos

An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: data-lakehouse

huwngnosleep/complete_lakehouse_techstack

This project implements an end-to-end tech stack for a data platform that can be used in production.

Language: Python - Size: 39.3 MB - Last synced: about 5 hours ago - Pushed: about 6 hours ago - Stars: 0 - Forks: 0

Qbeast-io/qbeast-spark

Qbeast-spark: DataSource enabling multi-dimensional indexing and efficient data sampling. Big Data, free from the unnecessary!

Language: Scala - Size: 36.6 MB - Last synced: 3 days ago - Pushed: 4 days ago - Stars: 199 - Forks: 17

aabouzaid/modern-data-platform-poc

My M.Sc. dissertation: Modern Data Platform using DataOps, Kubernetes, and the Cloud-Native ecosystem to build a resilient Big Data platform based on the Data Lakehouse architecture, which is the base for Machine Learning (MLOps) and Artificial Intelligence (AIOps).

Language: Jupyter Notebook - Size: 5.52 MB - Last synced: 20 days ago - Pushed: 20 days ago - Stars: 4 - Forks: 1

Data-Kube/tst-datalakehouse-hudi

#Test - Create a Data Lakehouse in Kubernetes

Size: 85.9 KB - Last synced: 10 days ago - Pushed: 10 days ago - Stars: 0 - Forks: 0

mahmoudparsian/data-warehousing

This repository is a place for the Data Warehousing course at the Information Systems & Analytics department, Santa Clara University.

Language: HTML - Size: 167 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 5 - Forks: 1

pracdata/awesome-open-source-data-engineering

A curated list of open source tools used in analytical stacks and the data engineering ecosystem

Size: 43.9 KB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 19 - Forks: 1

gupta-aayushkr/F1-Racing

The project aims to process Formula 1 racing data, create an automated data pipeline, and make the data available for presentation and analysis purposes.

Language: Python - Size: 5.04 MB - Last synced: 5 months ago - Pushed: 5 months ago - Stars: 1 - Forks: 0

dominikhei/Local-Data-LakeHouse

Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testing.

Language: Dockerfile - Size: 127 MB - Last synced: 9 months ago - Pushed: 9 months ago - Stars: 24 - Forks: 6

prneidhardt/AWS-Data-Lakehouse

STEDI project

Language: Python - Size: 950 KB - Last synced: 11 months ago - Pushed: 11 months ago - Stars: 0 - Forks: 0

sudohainguyen/mini-lakehouse

Data lakehouse at home with k8s and helm

Language: Jupyter Notebook - Size: 530 KB - Last synced: about 1 year ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0

eavilaes/qbeast-spark (fork of Qbeast-io/qbeast-spark)

Qbeast-spark: DataSource enabling multi-dimensional indexing and efficient data sampling. Big Data, free from the unnecessary!

Language: Scala - Size: 16.9 MB - Last synced: about 1 year ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0