GitHub topics: datalake
karo23361/sql-data-warehouse-project
Data Warehouse Project
Language: SQL - Size: 887 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

gigapi/gigapi-querier
DuckDB Query Engine for GigAPI
Language: Go - Size: 188 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 6 - Forks: 0

lakesoul-io/LakeSoul
LakeSoul is an end-to-end, realtime and cloud native Lakehouse framework with fast data ingestion, concurrent update and incremental data analytics on cloud storages for both BI and AI applications.
Language: Java - Size: 35.2 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 2,739 - Forks: 405

apache/doris-website
Apache Doris Website
Language: TypeScript - Size: 429 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 95 - Forks: 312

100-rab/AMO
[RSS 2025] AMO: Adaptive Motion Optimization for Hyper-Dexterous Humanoid Whole-Body Control
Language: Python - Size: 44.5 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

turtacn/dataseap
DataSeap: An open source unified data foundation for data-intensive business, powered by generative AI
Size: 4.88 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

apache/amoro
Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.
Language: Java - Size: 67.8 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 960 - Forks: 327

buoyant-data/oxbow
Collection of AWS Lambdas for creating and managing Delta tables
Language: Rust - Size: 288 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 34 - Forks: 10

GEdnieLockett/DataBricks
Exploration of Databricks SQL servers and AI-generated dashboarding
Size: 138 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

apache/gravitino
World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.
Language: Java - Size: 47.9 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1,470 - Forks: 450

zinggAI/zingg
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
Language: Java - Size: 679 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1,023 - Forks: 125

dilermando-lima/trino-pg-mysql-s3-parquet
Trino cluster that collects data from MySQL and PostgreSQL, processes it, and saves it to S3 as Parquet
Language: Python - Size: 70.3 KB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

treeverse/lakeFS
lakeFS - Data version control for your data lake | Git for data
Language: Go - Size: 149 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 4,651 - Forks: 373

prestodb/prestorials
Tutorials and examples of how to deploy Presto and connect it to different data sources
Size: 1.11 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 20 - Forks: 15

activeloopai/deeplake
Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai
Language: Python - Size: 65.3 MB - Last synced at: 5 days ago - Pushed at: 18 days ago - Stars: 8,589 - Forks: 658

PaloAltoNetworks/pan-cortex-data-lake-python 📦
Python idiomatic SDK for Cortex™ Data Lake.
Language: Python - Size: 1.28 MB - Last synced at: 5 days ago - Pushed at: about 2 months ago - Stars: 45 - Forks: 21

jorgevillegas18/etl-postgres-to-starrocks-via-risingwave
This repository provides a modular and easy-to-extend ETL pipeline that streams data from a PostgreSQL database into a StarRocks data warehouse using RisingWave as the real-time streaming computation layer.
Size: 13.7 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

apache/hudi
Upserts, Deletes And Incremental Processing on Big Data.
Language: Java - Size: 1.74 GB - Last synced at: 5 days ago - Pushed at: 6 days ago - Stars: 5,756 - Forks: 2,396

ExpediaGroup/apiary-data-lake
Terraform scripts for deploying Apiary Data Lake
Language: HCL - Size: 741 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 19 - Forks: 30

sinaptik-ai/pandas-ai
Chat with your database or your datalake (SQL, CSV, parquet). PandasAI makes data analysis conversational using LLMs and RAG.
Language: Python - Size: 54.4 MB - Last synced at: 6 days ago - Pushed at: 27 days ago - Stars: 19,924 - Forks: 1,884

trinodb/trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Language: Java - Size: 259 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 11,243 - Forks: 3,191

StarRocks/starrocks
The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.
Language: Java - Size: 473 MB - Last synced at: 5 days ago - Pushed at: 6 days ago - Stars: 9,951 - Forks: 1,982

samber/awesome-olap
A curated list of awesome Online Analytical Processing databases, frameworks, resources and other awesomeness.
Size: 49.8 KB - Last synced at: 3 days ago - Pushed at: 4 months ago - Stars: 69 - Forks: 6

nimtable/nimtable
The Control Plane for Apache Iceberg
Language: TypeScript - Size: 3.87 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 39 - Forks: 4

hyparam/icebird
Icebird: JavaScript Iceberg Client
Language: JavaScript - Size: 224 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 23 - Forks: 0

manuzhang/awesome-lakehouse
a curated list of awesome lakehouse frameworks, applications, etc
Size: 41 KB - Last synced at: 7 days ago - Pushed at: 3 months ago - Stars: 27 - Forks: 4

MaciekLesiczka/bazof
Lakehouse with time travel
Language: Rust - Size: 47.2 MB - Last synced at: about 18 hours ago - Pushed at: about 18 hours ago - Stars: 0 - Forks: 0

prefeitura-rio/queries-rj-sms
dbt project for the Data Lake of the Municipal Health Department (Secretaria Municipal de Saúde)
Language: PowerShell - Size: 6.17 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 4 - Forks: 0

JinsYin/awesome-datalake
📚 Awesome DataLake | A comprehensive data lake collection
Size: 11.7 KB - Last synced at: 6 days ago - Pushed at: 7 months ago - Stars: 3 - Forks: 0

rogui-manal/SQL-DATA-WAREHOUSE-PROJECT-FROM-SCRATCH
Building a modern Data Warehouse with SQL Server, including ETL processes, Data Modeling and analytics
Language: TSQL - Size: 981 KB - Last synced at: 9 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0

Datavault-UK/automate-dv
A free to use dbt package for creating and loading Data Vault 2.0 compliant Data Warehouses (powered by dbt, an open source data engineering tool, registered trademark of dbt Labs)
Size: 8.32 MB - Last synced at: 9 days ago - Pushed at: about 2 months ago - Stars: 538 - Forks: 136

linkedin/openhouse
Open Control Plane for Tables in Data Lakehouse
Language: Java - Size: 6.35 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 343 - Forks: 55

vre-hub/vre
VRE infrastructure running at CERN
Language: Shell - Size: 13.4 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 8 - Forks: 2

WeBankFinTech/Streamis
Streaming application development and management system, based on Linkis and DSS, planning to provide the workflow-like graphical drag-and-drop development capability.
Language: Java - Size: 72.2 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 107 - Forks: 44

prefeitura-rio/pipelines_rj_sms
Data pipelines of the Municipal Health Department (Secretaria Municipal de Saúde)
Language: Python - Size: 3.46 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 4 - Forks: 0

DataLinkDC/dinky
Dinky is a real-time data development platform based on Apache Flink, enabling agile data development, deployment and operation.
Language: Java - Size: 36.4 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 3,395 - Forks: 1,230

DataWithBaraa/sql-data-warehouse-project
A comprehensive guide to building a modern data warehouse with SQL Server, including ETL processes, data modeling, and analytics.
Language: TSQL - Size: 20.5 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 124 - Forks: 111

aws-solutions-library-samples/aws-insurancelake-etl
This solution helps you deploy ETL processes and data storage resources to create an Insurance Lake using Amazon S3 buckets for storage, AWS Glue for data transformation, and AWS CDK Pipelines. It is originally based on the AWS blog Deploy data lake ETL jobs using CDK Pipelines, and complements the InsuranceLake Infrastructure project
Language: Python - Size: 8.94 MB - Last synced at: 17 days ago - Pushed at: 6 months ago - Stars: 26 - Forks: 12

apache/doris-thirdparty
Self-managed thirdparty dependencies for Apache Doris
Size: 515 MB - Last synced at: about 12 hours ago - Pushed at: 2 days ago - Stars: 37 - Forks: 43

leesf/hudi-resources
A collection of Apache Hudi-related resources
Size: 23.7 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 550 - Forks: 160

Noobzik/ATL-Datamart
Business intelligence architecture lab for students at EPSI and DC Paris. The goal is to deploy a data architecture that runs from data retrieval through to presentation as data visualizations, by way of a data lake, a data warehouse, and a data mart.
Language: Python - Size: 465 KB - Last synced at: 24 days ago - Pushed at: 25 days ago - Stars: 4 - Forks: 103

awslabs/aws-orbit-workbench 📦
A Data Platform built for AWS, powered by Kubernetes.
Language: Python - Size: 53.7 MB - Last synced at: 1 day ago - Pushed at: almost 2 years ago - Stars: 148 - Forks: 92

anquev/minilake
A lightweight Python data lake solution with Delta Lake and S3 support. Simple storage, ingestion, and DuckDB-powered querying for data workflows.
Language: Python - Size: 7.39 MB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 1 - Forks: 0

deddyandri/tokyo-olympic-azure-data-analyst-project
tokyo-olympic-azure-data-analyst and engineering-project
Language: Jupyter Notebook - Size: 627 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

Saikesana31/Netflix
Azure Data engineering project
Language: Python - Size: 1.3 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

izhangzhihao/Real-time-Data-Warehouse
Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi
Language: Dockerfile - Size: 106 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 113 - Forks: 44

lynnlangit/serverless-architecture
Companion to my LinkedIn Learning 'Serverless Architecture' course
Size: 5.77 MB - Last synced at: 6 days ago - Pushed at: about 1 month ago - Stars: 19 - Forks: 8

paradedb/pg_analytics 📦
DuckDB-powered data lake analytics from Postgres
Language: Rust - Size: 814 KB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 522 - Forks: 21

jblukach/parquet2csv
Convert from CSV to Parquet and back again!
Language: Rust - Size: 6.84 KB - Last synced at: 9 days ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

LearningJournal/SparkProgrammingInScala
Apache Spark Course Material
Language: Scala - Size: 50.9 MB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 88 - Forks: 159

leo-project/leofs
The LeoFS Storage System
Language: Erlang - Size: 30 MB - Last synced at: about 1 month ago - Pushed at: almost 5 years ago - Stars: 1,563 - Forks: 155

ismailsimsek/iceberg-examples
Apache Iceberg Spark S3 examples
Language: Java - Size: 33.2 KB - Last synced at: 6 days ago - Pushed at: about 1 year ago - Stars: 20 - Forks: 9

lucashomuniz/Project-05
DATA ENGINEERING FOR OLYMPICS USING AZURE, SQL AND PBI
Language: Jupyter Notebook - Size: 1.38 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

transferia/iceberg
Transferia iceberg provider
Language: Go - Size: 134 KB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

Databricks-BR/open_tax
Tax Lakehouse for management support of tax processes, aimed at continuous improvement, identification of failures (Tax Compliance), intelligent models for identifying opportunities (Tax Intelligence), and democratization of tax information.
Language: Python - Size: 4.55 MB - Last synced at: 21 days ago - Pushed at: about 1 year ago - Stars: 4 - Forks: 0

Saikesana31/Adventure_Works_DE
Azure Data engineering project
Size: 2.26 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

imsanjoykb/ETL-Project
The goal of this project is to illustrate Extract Transform Load (ETL) using Python and SQL. ETL is a process commonly done in computing, which takes raw data, cleans it and stores it for later use. The extraction phase targets and retrieves the data. Transform manipulates and cleans the data. Then load stores the data, typically in a data warehouse.
Language: Jupyter Notebook - Size: 285 KB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 22 - Forks: 9

hoaihuongbk/lakeops
A modern data lake operations toolkit working with multiple table formats (Delta, Iceberg, Parquet) and engines (Spark, Polars) via the same APIs.
Language: Python - Size: 683 KB - Last synced at: 25 days ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

KennethanCeyer/awesome-data-pipeline
Awesome list for data pipelines
Size: 200 KB - Last synced at: 4 days ago - Pushed at: over 2 years ago - Stars: 34 - Forks: 4

pracdata/awesome-open-source-data-engineering
A curated list of open source tools used in analytics platforms and data engineering ecosystem
Size: 219 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 274 - Forks: 29

SiyaMathe/Modern-Data-Architecture-Concepts
This project aims to provide a comprehensive overview of modern data architecture concepts, including data lakes, data meshes, cloud-based solutions, and real-time processing, and their application in addressing contemporary data challenges.
Size: 8.79 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

soorajpazeekal/logistics-real-time-poc
A data engineering proof of concept demonstrating cutting-edge logistics solutions for a US-based grocery delivery platform
Language: Jupyter Notebook - Size: 30.3 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 2 - Forks: 4

nxion/sql-data-warehouse-project
Building a modern data warehouse with MS SQL Server, ETL processes, data modeling and analytics.
Size: 806 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

openEDI/open-data-access-tools
OEDI Data Lake Access
Language: Python - Size: 43.7 MB - Last synced at: 7 days ago - Pushed at: 2 months ago - Stars: 12 - Forks: 10

Mariann95/SQL_Data_Warehouse_And_Analytics_Project
Building a modern data warehouse with SQL Server, including ETL processes, data modeling, and analytics. This repository also contains a collection of SQL scripts demonstrating various analytical techniques, such as changes over time, cumulative, performance, data segmentation, part-to-whole analysis.
Language: TSQL - Size: 2.45 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

mnpw/mdex
Iceberg metadata explorer
Language: Rust - Size: 26.4 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

LearningJournal/Spark-Streaming-In-Scala
Apache Spark 3 - Structured Streaming Course Material
Language: Scala - Size: 19.4 MB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 45 - Forks: 77

aws-solutions-library-samples/aws-insurancelake-infrastructure
This solution helps you deploy ETL processes and data storage resources to create an Insurance Lake using Amazon S3 buckets for storage, AWS Glue for data transformation, and AWS CDK Pipelines. It is originally based on the AWS blog Deploy data lake ETL jobs using CDK Pipelines, and complements the InsuranceLake ETL with CDK Pipelines project.
Language: Python - Size: 471 KB - Last synced at: 20 days ago - Pushed at: 7 months ago - Stars: 14 - Forks: 7

edgBR/delta-lake-polars
Building a poor man's data lake: Exploring the Power of Polars and Delta Lake
Language: Python - Size: 375 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 7 - Forks: 0

neuro-ml/tarn
An insanely customizable framework for key-value storage 💾
Language: Python - Size: 344 KB - Last synced at: 17 days ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 0

Carolinerocks/azure-data-engineering-end-to-end-project
Language: Jupyter Notebook - Size: 3.36 MB - Last synced at: 2 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

cuebook/cuelake
Use SQL to build ELT pipelines on a data lakehouse.
Language: JavaScript - Size: 28 MB - Last synced at: about 1 month ago - Pushed at: almost 3 years ago - Stars: 285 - Forks: 28

JohnMata0427/Data-Lake-Case-Studies
Case studies with Data Lake
Language: Jupyter Notebook - Size: 51 MB - Last synced at: 2 months ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0

DaviMacielCavalcante/desafio2-prof-artemisia
🚀 ETL Challenge: A hands-on project to explore ETL concepts and Data Lake creation in the cloud! Ideal for those who want to understand how to extract, transform, and load data in a scalable environment and integrate it with BI tools for visualization and analysis!
Language: Python - Size: 6.11 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

logleads/LogverzReleases
LOGVERZ APPLICATION BUNDLE: ✔️ Get insights 10x faster ⚡. ✔️ Cut costs by 90% 💰: Slash your data processing and storage expenses. ✔️ Keep your data secure in AWS 🔐—no external transfers. ✔️ Have an all-in-one solution💡: Collect, process, and analyze data without juggling multiple tools. ✔️ Work seamlessly with Power BI, Tableau, and more 📈.
Language: PowerShell - Size: 97.7 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 3 - Forks: 1

jayhan94/MiniLake
A modern mini lakehouse based on Spark and Iceberg, running in Docker.
Size: 8.79 KB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

stonezhong/DataManager
Better organize data in data lake and build ETL pipeline with Web UI tool.
Language: JavaScript - Size: 2.33 MB - Last synced at: 10 days ago - Pushed at: about 4 years ago - Stars: 9 - Forks: 2

japila-books/delta-lake-internals
The Internals of Delta Lake
Size: 191 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 183 - Forks: 36

slowLatency/DE-Apple-Data-Analysis
A Data Pipeline solution using Databricks and Apache Spark to process and analyze Apple data.
Language: Python - Size: 15.6 KB - Last synced at: 2 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

Rucal-Data-Solutions/datalakefoundation
Datalakehouse Foundation
Language: Scala - Size: 187 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 3 - Forks: 1

chandima2000/Adventure-Works-sales-data-engineering-project
The aim of this project is to build an end-to-end data engineering project using Microsoft Azure
Language: Jupyter Notebook - Size: 6.78 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

tuancamtbtx/dataplatform-stack
How to build a complete Data Platform -> Here
Language: Python - Size: 7.57 MB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 5 - Forks: 0

lynnlangit/learning-nosql
Companion repository to the LinkedIn Learning course 'Cloud NoSQL for SQL Pros'
Size: 1.01 MB - Last synced at: 6 days ago - Pushed at: 5 months ago - Stars: 4 - Forks: 3

cuiyuheng/hudi Fork of apache/hudi
Upserts, Deletes And Incremental Processing on Big Data.
Size: 1.39 GB - Last synced at: about 2 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

ewerthonk/datalakehouse-northwind
Creating a Simple Data Lakehouse using Delta Lake on Databricks. My 1st Data Engineering Project.
Language: Jupyter Notebook - Size: 559 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 1

dougdss89/wideworldadventure
This repository includes all files that compose the design and unification of the databases AdventureWorks and WideWorldAdventure project.
Language: Shell - Size: 230 KB - Last synced at: about 2 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

hussein-awala/gdpr-compliant-lakehouse
This repository is a demonstration of how to handle GDPR export and delete requests in an Iceberg Lakehouse to make it GDPR-compliant.
Language: Jupyter Notebook - Size: 9.77 KB - Last synced at: 7 days ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

fortinux/bigdata-book
Book: Fundamentos de Big Data (Fundamentals of Big Data)
Language: Jupyter Notebook - Size: 7.93 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 2 - Forks: 1

KleinYuan/llama2-csv-webapp
Self-hosted or locally hosted Llama 2-based web app for chatting with multiple CSV files
Language: Python - Size: 168 KB - Last synced at: about 4 hours ago - Pushed at: over 1 year ago - Stars: 7 - Forks: 0

NiranjanRao07/data-226-assignments
This repository includes assignments for DATA 226, focused on designing databases, implementing SQL for analytics, performing ETL operations, building data pipelines, and conducting OLAP.
Language: Jupyter Notebook - Size: 7.6 MB - Last synced at: about 2 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

divithraju/divith-raju-Immigration-Data-Engineering
A Capstone Project that covers several aspects of Data Engineering (Data Exploration, Cleaning, Modeling, Pipelining, Processing)
Language: Jupyter Notebook - Size: 2.5 MB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 0

dd-Splunk/splunk-datalake
How to combine SmartStore and Ingest Actions for a data lake use case
Language: Python - Size: 360 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

richclement/aws-data-lake-sdk 📦
An sdk for the AWS data lake.
Language: JavaScript - Size: 43 KB - Last synced at: about 6 hours ago - Pushed at: almost 8 years ago - Stars: 1 - Forks: 1

AbsaOSS/enceladus
Dynamic Conformance Engine
Language: Scala - Size: 7.94 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 31 - Forks: 14

legout/pydala 📦
Poor man's simple Python API for creating a local or remote data lake based on several (PyArrow) datasets using DuckDB
Language: Python - Size: 14.1 MB - Last synced at: 1 day ago - Pushed at: almost 2 years ago - Stars: 9 - Forks: 1

leonardodrigo/breweries-data-lake
This project builds an Azure Data Lake using the Medallion architecture to process data with Spark from the Open Breweries DB API.
Language: Python - Size: 732 KB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

laismeuchi/dados-databricks-base-cnpj
Project using the CNPJ database from the Receita Federal (Brazil's federal revenue service)
Language: Python - Size: 84 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

parthnchoudhury/Enterprise_Data_Architecture
The pragmatic technology journey for an Enterprise Data Model serving reporting, analytical, advanced data science and other digital use cases with integrated data from a variety of sources.
Size: 666 KB - Last synced at: 3 months ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

dbsystel/datalake-graphql-wrapper
The DataLake GraphQL Wrapper provides a GraphQL API for Presto/Trino.
Language: TypeScript - Size: 294 KB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 18 - Forks: 0

nataliabeltranarg/NoSQL-DataArchitecture-Spark
Implementing core components of a data-driven architecture using Spark: Data Management and Data Analysis Backbones with structured zones in a data lake and analytical capabilities
Language: Jupyter Notebook - Size: 1.4 MB - Last synced at: 3 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0
