An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: datalake

karo23361/sql-data-warehouse-project

Data Warehouse Project

Language: SQL - Size: 887 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

gigapi/gigapi-querier

DuckDB Query Engine for GigAPI

Language: Go - Size: 188 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 6 - Forks: 0

lakesoul-io/LakeSoul

LakeSoul is an end-to-end, realtime and cloud native Lakehouse framework with fast data ingestion, concurrent update and incremental data analytics on cloud storages for both BI and AI applications.

Language: Java - Size: 35.2 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 2,739 - Forks: 405

apache/doris-website

Apache Doris Website

Language: TypeScript - Size: 429 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 95 - Forks: 312

100-rab/AMO

[RSS 2025] AMO: Adaptive Motion Optimization for Hyper-Dexterous Humanoid Whole-Body Control

Language: Python - Size: 44.5 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

turtacn/dataseap

DataSeap: An open source unified data foundation for data-intensive business, powered by generative AI

Size: 4.88 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

apache/amoro

Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.

Language: Java - Size: 67.8 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 960 - Forks: 327

buoyant-data/oxbow

Collection of AWS Lambdas for creating and managing Delta tables

Language: Rust - Size: 288 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 34 - Forks: 10

GEdnieLockett/DataBricks

Exploration of Databricks SQL servers and AI-generated dashboarding

Size: 138 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

apache/gravitino

World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.

Language: Java - Size: 47.9 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1,470 - Forks: 450

zinggAI/zingg

Scalable identity resolution, entity resolution, data mastering and deduplication using ML

Language: Java - Size: 679 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1,023 - Forks: 125

dilermando-lima/trino-pg-mysql-s3-parquet

Trino cluster that collects data from MySQL and PostgreSQL, processes it, and saves it to S3 as Parquet

Language: Python - Size: 70.3 KB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

treeverse/lakeFS

lakeFS - Data version control for your data lake | Git for data

Language: Go - Size: 149 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 4,651 - Forks: 373

prestodb/prestorials

Tutorials and examples of how to deploy Presto and connect it to different data sources

Size: 1.11 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 20 - Forks: 15

activeloopai/deeplake

Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai

Language: Python - Size: 65.3 MB - Last synced at: 5 days ago - Pushed at: 18 days ago - Stars: 8,589 - Forks: 658

PaloAltoNetworks/pan-cortex-data-lake-python 📦

Python idiomatic SDK for Cortex™ Data Lake.

Language: Python - Size: 1.28 MB - Last synced at: 5 days ago - Pushed at: about 2 months ago - Stars: 45 - Forks: 21

jorgevillegas18/etl-postgres-to-starrocks-via-risingwave

This repository provides a modular and easy-to-extend ETL pipeline that streams data from a PostgreSQL database into a StarRocks data warehouse using RisingWave as the real-time streaming computation layer.

Size: 13.7 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

apache/hudi

Upserts, Deletes And Incremental Processing on Big Data.

Language: Java - Size: 1.74 GB - Last synced at: 5 days ago - Pushed at: 6 days ago - Stars: 5,756 - Forks: 2,396

ExpediaGroup/apiary-data-lake

Terraform scripts for deploying Apiary Data Lake

Language: HCL - Size: 741 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 19 - Forks: 30

sinaptik-ai/pandas-ai

Chat with your database or your datalake (SQL, CSV, parquet). PandasAI makes data analysis conversational using LLMs and RAG.

Language: Python - Size: 54.4 MB - Last synced at: 6 days ago - Pushed at: 27 days ago - Stars: 19,924 - Forks: 1,884

trinodb/trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Language: Java - Size: 259 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 11,243 - Forks: 3,191

StarRocks/starrocks

The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.

Language: Java - Size: 473 MB - Last synced at: 5 days ago - Pushed at: 6 days ago - Stars: 9,951 - Forks: 1,982

samber/awesome-olap

A curated list of awesome Online Analytical Processing databases, frameworks, resources and other awesomeness.

Size: 49.8 KB - Last synced at: 3 days ago - Pushed at: 4 months ago - Stars: 69 - Forks: 6

nimtable/nimtable

The Control Plane for Apache Iceberg

Language: TypeScript - Size: 3.87 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 39 - Forks: 4

hyparam/icebird

Icebird: JavaScript Iceberg Client

Language: JavaScript - Size: 224 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 23 - Forks: 0

manuzhang/awesome-lakehouse

a curated list of awesome lakehouse frameworks, applications, etc

Size: 41 KB - Last synced at: 7 days ago - Pushed at: 3 months ago - Stars: 27 - Forks: 4

MaciekLesiczka/bazof

Lakehouse with time travel

Language: Rust - Size: 47.2 MB - Last synced at: about 18 hours ago - Pushed at: about 18 hours ago - Stars: 0 - Forks: 0

prefeitura-rio/queries-rj-sms

dbt project for the Data Lake of the Municipal Health Department (Secretaria Municipal de Saúde)

Language: PowerShell - Size: 6.17 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 4 - Forks: 0

JinsYin/awesome-datalake

📚 Awesome DataLake | A data lake compendium

Size: 11.7 KB - Last synced at: 6 days ago - Pushed at: 7 months ago - Stars: 3 - Forks: 0

rogui-manal/SQL-DATA-WAREHOUSE-PROJECT-FROM-SCRATCH

Building a modern Data Warehouse with SQL Server, including ETL processes, Data Modeling and analytics

Language: TSQL - Size: 981 KB - Last synced at: 9 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0

Datavault-UK/automate-dv

A free to use dbt package for creating and loading Data Vault 2.0 compliant Data Warehouses (powered by dbt, an open source data engineering tool, registered trademark of dbt Labs)

Size: 8.32 MB - Last synced at: 9 days ago - Pushed at: about 2 months ago - Stars: 538 - Forks: 136

linkedin/openhouse

Open Control Plane for Tables in Data Lakehouse

Language: Java - Size: 6.35 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 343 - Forks: 55

vre-hub/vre

VRE infrastructure running at CERN

Language: Shell - Size: 13.4 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 8 - Forks: 2

WeBankFinTech/Streamis

Streaming application development and management system, based on Linkis and DSS, planning to provide the workflow-like graphical drag-and-drop development capability.

Language: Java - Size: 72.2 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 107 - Forks: 44

prefeitura-rio/pipelines_rj_sms

Data pipelines of the Municipal Health Department (Secretaria Municipal de Saúde)

Language: Python - Size: 3.46 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 4 - Forks: 0

DataLinkDC/dinky

Dinky is a real-time data development platform based on Apache Flink, enabling agile data development, deployment and operation.

Language: Java - Size: 36.4 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 3,395 - Forks: 1,230

DataWithBaraa/sql-data-warehouse-project

A comprehensive guide to building a modern data warehouse with SQL Server, including ETL processes, data modeling, and analytics.

Language: TSQL - Size: 20.5 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 124 - Forks: 111

aws-solutions-library-samples/aws-insurancelake-etl

This solution helps you deploy ETL processes and data storage resources to create an Insurance Lake using Amazon S3 buckets for storage, AWS Glue for data transformation, and AWS CDK Pipelines. It is originally based on the AWS blog Deploy data lake ETL jobs using CDK Pipelines, and complements the InsuranceLake Infrastructure project.

Language: Python - Size: 8.94 MB - Last synced at: 17 days ago - Pushed at: 6 months ago - Stars: 26 - Forks: 12

apache/doris-thirdparty

Self-managed thirdparty dependencies for Apache Doris

Size: 515 MB - Last synced at: about 12 hours ago - Pushed at: 2 days ago - Stars: 37 - Forks: 43

leesf/hudi-resources

A collection of Apache Hudi related resources

Size: 23.7 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 550 - Forks: 160

Noobzik/ATL-Datamart

Business intelligence architecture lab for students at EPSI and DC Paris. The goal is to deploy a data architecture that runs from data retrieval through to delivery as data visualizations, by way of a data lake, a data warehouse, and a data mart.

Language: Python - Size: 465 KB - Last synced at: 24 days ago - Pushed at: 25 days ago - Stars: 4 - Forks: 103

awslabs/aws-orbit-workbench 📦

A Data Platform built for AWS, powered by Kubernetes.

Language: Python - Size: 53.7 MB - Last synced at: 1 day ago - Pushed at: almost 2 years ago - Stars: 148 - Forks: 92

anquev/minilake

A lightweight Python data lake solution with Delta Lake and S3 support. Simple storage, ingestion, and DuckDB-powered querying for data workflows.

Language: Python - Size: 7.39 MB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 1 - Forks: 0

deddyandri/tokyo-olympic-azure-data-analyst-project

Tokyo Olympic Azure data analysis and engineering project

Language: Jupyter Notebook - Size: 627 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

Saikesana31/Netflix

Azure Data engineering project

Language: Python - Size: 1.3 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

izhangzhihao/Real-time-Data-Warehouse

Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi

Language: Dockerfile - Size: 106 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 113 - Forks: 44

lynnlangit/serverless-architecture

Companion to my LinkedIn Learning 'Serverless Architecture' course

Size: 5.77 MB - Last synced at: 6 days ago - Pushed at: about 1 month ago - Stars: 19 - Forks: 8

paradedb/pg_analytics 📦

DuckDB-powered data lake analytics from Postgres

Language: Rust - Size: 814 KB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 522 - Forks: 21

jblukach/parquet2csv

Convert from CSV to Parquet and back again!

Language: Rust - Size: 6.84 KB - Last synced at: 9 days ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

LearningJournal/SparkProgrammingInScala

Apache Spark Course Material

Language: Scala - Size: 50.9 MB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 88 - Forks: 159

leo-project/leofs

The LeoFS Storage System

Language: Erlang - Size: 30 MB - Last synced at: about 1 month ago - Pushed at: almost 5 years ago - Stars: 1,563 - Forks: 155

ismailsimsek/iceberg-examples

Apache Iceberg Spark S3 examples

Language: Java - Size: 33.2 KB - Last synced at: 6 days ago - Pushed at: about 1 year ago - Stars: 20 - Forks: 9

lucashomuniz/Project-05

DATA ENGINEERING FOR OLYMPICS USING AZURE, SQL AND PBI

Language: Jupyter Notebook - Size: 1.38 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

transferia/iceberg

Transferia iceberg provider

Language: Go - Size: 134 KB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

Databricks-BR/open_tax

Tax Lakehouse for management support of fiscal processes, aimed at continuous improvement, identification of failures (Tax Compliance), intelligent models for identifying opportunities (Tax Intelligence), and democratization of tax information.

Language: Python - Size: 4.55 MB - Last synced at: 21 days ago - Pushed at: about 1 year ago - Stars: 4 - Forks: 0

Saikesana31/Adventure_Works_DE

Azure Data engineering project

Size: 2.26 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

imsanjoykb/ETL-Project

The goal of this project is to illustrate Extract Transform Load (ETL) using Python and SQL. ETL is a process commonly done in computing, which takes raw data, cleans it and stores it for later use. The extraction phase targets and retrieves the data. Transform manipulates and cleans the data. Then load stores the data, typically in a data warehouse.

Language: Jupyter Notebook - Size: 285 KB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 22 - Forks: 9

hoaihuongbk/lakeops

A modern data lake operations toolkit working with multiple table formats (Delta, Iceberg, Parquet) and engines (Spark, Polars) via the same APIs.

Language: Python - Size: 683 KB - Last synced at: 25 days ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

KennethanCeyer/awesome-data-pipeline

Awesome list for data pipelines

Size: 200 KB - Last synced at: 4 days ago - Pushed at: over 2 years ago - Stars: 34 - Forks: 4

pracdata/awesome-open-source-data-engineering

A curated list of open source tools used in analytics platforms and data engineering ecosystem

Size: 219 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 274 - Forks: 29

SiyaMathe/Modern-Data-Architecture-Concepts

This project aims to provide a comprehensive overview of modern data architecture concepts, including data lakes, data meshes, cloud-based solutions, and real-time processing, and their application in addressing contemporary data challenges.

Size: 8.79 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

soorajpazeekal/logistics-real-time-poc

A Data engineering based Proof of Concept demonstrating cutting-edge logistics solutions for a US-based Grocery Delivery Platform

Language: Jupyter Notebook - Size: 30.3 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 2 - Forks: 4

nxion/sql-data-warehouse-project

Building a modern data warehouse with MS SQL Server, ETL processes, data modeling and analytics.

Size: 806 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

openEDI/open-data-access-tools

OEDI Data Lake Access

Language: Python - Size: 43.7 MB - Last synced at: 7 days ago - Pushed at: 2 months ago - Stars: 12 - Forks: 10

Mariann95/SQL_Data_Warehouse_And_Analytics_Project

Building a modern data warehouse with SQL Server, including ETL processes, data modeling, and analytics. This repository also contains a collection of SQL scripts demonstrating various analytical techniques, such as change-over-time, cumulative, performance, data segmentation, and part-to-whole analysis.

Language: TSQL - Size: 2.45 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

mnpw/mdex

Iceberg metadata explorer

Language: Rust - Size: 26.4 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

LearningJournal/Spark-Streaming-In-Scala

Apache Spark 3 - Structured Streaming Course Material

Language: Scala - Size: 19.4 MB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 45 - Forks: 77

aws-solutions-library-samples/aws-insurancelake-infrastructure

This solution helps you deploy ETL processes and data storage resources to create an Insurance Lake using Amazon S3 buckets for storage, AWS Glue for data transformation, and AWS CDK Pipelines. It is originally based on the AWS blog Deploy data lake ETL jobs using CDK Pipelines, and complements the InsuranceLake ETL with CDK Pipelines project.

Language: Python - Size: 471 KB - Last synced at: 20 days ago - Pushed at: 7 months ago - Stars: 14 - Forks: 7

edgBR/delta-lake-polars

Building a poor man's data lake: Exploring the Power of Polars and Delta Lake

Language: Python - Size: 375 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 7 - Forks: 0

neuro-ml/tarn

An insanely customizable framework for key-value storage 💾

Language: Python - Size: 344 KB - Last synced at: 17 days ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 0

Carolinerocks/azure-data-engineering-end-to-end-project

Language: Jupyter Notebook - Size: 3.36 MB - Last synced at: 2 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

cuebook/cuelake

Use SQL to build ELT pipelines on a data lakehouse.

Language: JavaScript - Size: 28 MB - Last synced at: about 1 month ago - Pushed at: almost 3 years ago - Stars: 285 - Forks: 28

JohnMata0427/Data-Lake-Case-Studies

Case Studies with Data Lake

Language: Jupyter Notebook - Size: 51 MB - Last synced at: 2 months ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0

DaviMacielCavalcante/desafio2-prof-artemisia

🚀 ETL Challenge: A hands-on project to explore ETL concepts and Data Lake creation in the cloud! Ideal for those who want to understand how to extract, transform, and load data in a scalable environment and integrate it with BI tools for visualization and analysis!

Language: Python - Size: 6.11 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

logleads/LogverzReleases

LOGVERZ APPLICATION BUNDLE: ✔️ Get insights 10x faster ⚡. ✔️ Cut costs by 90% 💰: Slash your data processing and storage expenses. ✔️ Keep your data secure in AWS 🔐—no external transfers. ✔️ Have an all-in-one solution💡: Collect, process, and analyze data without juggling multiple tools. ✔️ Work seamlessly with Power BI, Tableau, and more 📈.

Language: PowerShell - Size: 97.7 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 3 - Forks: 1

jayhan94/MiniLake

A modern mini lakehouse based on Spark and Iceberg, running in Docker.

Size: 8.79 KB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

stonezhong/DataManager

Better organize data in a data lake and build ETL pipelines with a web UI tool.

Language: JavaScript - Size: 2.33 MB - Last synced at: 10 days ago - Pushed at: about 4 years ago - Stars: 9 - Forks: 2

japila-books/delta-lake-internals

The Internals of Delta Lake

Size: 191 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 183 - Forks: 36

slowLatency/DE-Apple-Data-Analysis

A Data Pipeline solution using Databricks and Apache Spark to process and analyze Apple data.

Language: Python - Size: 15.6 KB - Last synced at: 2 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

Rucal-Data-Solutions/datalakefoundation

Datalakehouse Foundation

Language: Scala - Size: 187 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 3 - Forks: 1

chandima2000/Adventure-Works-sales-data-engineering-project

The aim of this project is to build an end-to-end data engineering project using Microsoft Azure

Language: Jupyter Notebook - Size: 6.78 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

tuancamtbtx/dataplatform-stack

How to build a complete Data Platform -> Here

Language: Python - Size: 7.57 MB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 5 - Forks: 0

lynnlangit/learning-nosql

Companion repository to LinkedIn Learning course 'Cloud NoSQL for SQL Pros'

Size: 1.01 MB - Last synced at: 6 days ago - Pushed at: 5 months ago - Stars: 4 - Forks: 3

cuiyuheng/hudi (fork of apache/hudi)

Upserts, Deletes And Incremental Processing on Big Data.

Size: 1.39 GB - Last synced at: about 2 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

ewerthonk/datalakehouse-northwind

Creating a Simple Data Lakehouse using Delta Lake on Databricks. My 1st Data Engineering Project.

Language: Jupyter Notebook - Size: 559 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 1

dougdss89/wideworldadventure

This repository includes all files that compose the design and unification of the databases AdventureWorks and WideWorldAdventure project.

Language: Shell - Size: 230 KB - Last synced at: about 2 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

hussein-awala/gdpr-compliant-lakehouse

This repository is a demonstration of how to handle GDPR export and delete requests in an Iceberg Lakehouse to make it GDPR-compliant.

Language: Jupyter Notebook - Size: 9.77 KB - Last synced at: 7 days ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

fortinux/bigdata-book

Book: Fundamentos de Big Data (Fundamentals of Big Data)

Language: Jupyter Notebook - Size: 7.93 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 2 - Forks: 1

KleinYuan/llama2-csv-webapp

Self-hosted or locally hosted Llama 2-based web app to chat with your CSVs (multiple)

Language: Python - Size: 168 KB - Last synced at: about 4 hours ago - Pushed at: over 1 year ago - Stars: 7 - Forks: 0

NiranjanRao07/data-226-assignments

This repository includes assignments for DATA 226, focused on designing databases, implementing SQL for analytics, performing ETL operations, building data pipelines, and conducting OLAP.

Language: Jupyter Notebook - Size: 7.6 MB - Last synced at: about 2 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

divithraju/divith-raju-Immigration-Data-Engineering

A Capstone Project that covers several aspects of Data Engineering (Data Exploration, Cleaning, Modeling, Pipelining, Processing)

Language: Jupyter Notebook - Size: 2.5 MB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 0

dd-Splunk/splunk-datalake

How to combine SmartStore and Ingest Actions for a data lake use case

Language: Python - Size: 360 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

richclement/aws-data-lake-sdk 📦

An SDK for the AWS data lake.

Language: JavaScript - Size: 43 KB - Last synced at: about 6 hours ago - Pushed at: almost 8 years ago - Stars: 1 - Forks: 1

AbsaOSS/enceladus

Dynamic Conformance Engine

Language: Scala - Size: 7.94 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 31 - Forks: 14

legout/pydala 📦

Poor man's simple Python API for creating a local or remote data lake based on several (PyArrow) datasets using DuckDB

Language: Python - Size: 14.1 MB - Last synced at: 1 day ago - Pushed at: almost 2 years ago - Stars: 9 - Forks: 1

leonardodrigo/breweries-data-lake

This project builds an Azure Data Lake using the Medallion architecture to process data with Spark from the Open Breweries DB API.

Language: Python - Size: 732 KB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

laismeuchi/dados-databricks-base-cnpj

Project using the CNPJ database from Brazil's Receita Federal

Language: Python - Size: 84 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

parthnchoudhury/Enterprise_Data_Architecture

The pragmatic technology journey for an Enterprise Data Model serving reporting, analytical, advanced data science and other digital use cases with integrated data from a variety of sources.

Size: 666 KB - Last synced at: 3 months ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

dbsystel/datalake-graphql-wrapper

The DataLake GraphQL Wrapper provides a GraphQL API for presto/trino.

Language: TypeScript - Size: 294 KB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 18 - Forks: 0

nataliabeltranarg/NoSQL-DataArchitecture-Spark

Implementing core components of a data-driven architecture using Spark: Data Management and Data Analysis Backbones with structured zones in a data lake and analytical capabilities

Language: Jupyter Notebook - Size: 1.4 MB - Last synced at: 3 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0