GitHub topics: data-orchestration
ryandmonk/knowledge_graph_brain
MCP-native knowledge graph orchestrator that unifies data silos with GraphRAG, dynamic connectors, and local AI.
Language: TypeScript - Size: 1010 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1 - Forks: 1

apache/incubator-graphar
An open source, standard data file format for graph data storage and retrieval.
Language: C++ - Size: 17.1 MB - Last synced at: 3 days ago - Pushed at: 9 days ago - Stars: 295 - Forks: 73

astronomer/airflow-provider-fivetran-async
A new Airflow Provider for Fivetran, maintained by Astronomer and Fivetran
Language: Python - Size: 230 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 22 - Forks: 10

kestra-io/kestra
⚡ Universal Workflow Orchestration Platform. Code in any language, run anywhere. 800+ plugins for data, infrastructure, and AI automation.
Language: Java - Size: 69 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 20,466 - Forks: 1,726

cubefs/cubefs
cloud-native distributed storage
Language: Go - Size: 158 MB - Last synced at: 8 days ago - Pushed at: 16 days ago - Stars: 5,232 - Forks: 672

Alluxio/alluxio
Alluxio, data orchestration for analytics and machine learning in the cloud
Language: Java - Size: 196 MB - Last synced at: 13 days ago - Pushed at: 4 months ago - Stars: 7,060 - Forks: 2,947

Luk-kar/SmogSense
A modular data platform for end-to-end analytics, data pipeline orchestration, and a machine learning model registry, for local open-source environments or a commercial cloud setup.
Language: Jupyter Notebook - Size: 8.32 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 0 - Forks: 0

bravorod/mimic-ehr-pipeline
EHR pipeline that simulates MIMIC-IV patient data streams, performs advanced feature engineering and clinical severity scoring using machine learning (Random Forest Classifier), and prepares structured outputs for scalable downstream analytics
Language: Python - Size: 634 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 3 - Forks: 1

nishthapant/airflow
This project orchestrates an end-to-end data pipeline for an e-commerce dataset using Apache Airflow (in Docker) and a separate dbt (data build tool) project. The pipeline transforms raw source data into structured, analytics-ready datasets.
Language: Python - Size: 1.13 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

Rudsss18/airflow-dbt-weather-pipeline
Build a weather data pipeline with Python, Apache Airflow, dbt, PostgreSQL, and Superset for efficient ETL and interactive visualizations. 🌧️🚀
Language: Python - Size: 17.6 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

iam-mhaseeb/Skytrax-Data-Warehouse 📦
A full data warehouse infrastructure with ETL pipelines running inside Docker on Apache Airflow for data orchestration, AWS Redshift as the cloud data warehouse, and Metabase for data visualizations such as analytical dashboards.
Language: Python - Size: 1.34 MB - Last synced at: 26 days ago - Pushed at: over 5 years ago - Stars: 137 - Forks: 30

longNguyen010203/Finance-Data-Ingestion-Pipeline-with-Kafka
Develop a real-time data ingestion pipeline using Kafka and Spark. Collect minute-level stock data from Yahoo Finance, ingest it into Kafka, and process it with Spark Streaming, storing the results in Cassandra. Orchestrate the workflow using Airflow deployed on Docker.
Language: Python - Size: 250 KB - Last synced at: about 2 months ago - Pushed at: 9 months ago - Stars: 4 - Forks: 1

Alluxio/k8s-operator
An operator for managing the Alluxio system on a Kubernetes cluster
Language: Go - Size: 270 KB - Last synced at: 23 days ago - Pushed at: over 1 year ago - Stars: 13 - Forks: 9

ddeutils/data-orchestra
❌ Full-Stack Data Orchestration configured by YAML templates with Flask & HTMX
Language: Python - Size: 3.81 MB - Last synced at: 11 days ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

SAP-samples/btp-data-to-value-workshop
This repo contains a dataset, exercises, and sample code for an end-to-end SAP BTP data-to-value bootcamp covering SAP HANA Cloud, SAP Data Warehouse Cloud, SAP Data Intelligence Cloud, and SAP Analytics Cloud.
Language: Jupyter Notebook - Size: 167 MB - Last synced at: 5 months ago - Pushed at: 6 months ago - Stars: 24 - Forks: 24

dagster-io/dagster-quickstart 📦
Get started with Dagster ASAP
Language: Python - Size: 20.5 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 11 - Forks: 122

taquynhnga2001/proptech-dagster
Build an ELT pipeline with Dagster and dbt to schedule loading HDB resale transactions in Singapore into a Google BigQuery data warehouse, then create a Power BI dashboard to enhance insight exploration.
Language: Python - Size: 3.42 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 2 - Forks: 0

kestra-io/examples
Best practices for data workflows, integrations with the Modern Data Stack (MDS), Infrastructure as Code (IaC), Cloud Provider Services
Language: HCL - Size: 3.28 MB - Last synced at: 5 months ago - Pushed at: 6 months ago - Stars: 25 - Forks: 9

jonathanneo/data-aware-orchestration
Data-aware orchestration with Dagster, dbt, and Airbyte
Language: Python - Size: 1.36 MB - Last synced at: 4 months ago - Pushed at: over 2 years ago - Stars: 31 - Forks: 0

GADES-DATAENG/webinar
Code, scripts, and resources for the Data Engineering Fundamentals Course Webinar, covering Python, data pipelines, Apache Airflow, and more.
Language: Python - Size: 26.9 MB - Last synced at: 5 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

anna-geller/kestra-ci-cd
CI/CD repository template to automate deployments of your production flows
Language: HCL - Size: 104 KB - Last synced at: 5 months ago - Pushed at: about 1 year ago - Stars: 12 - Forks: 5

kestra-io/data-engineering-zoomcamp
Code for the Data Engineering Zoomcamp course
Size: 470 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 3 - Forks: 1

kingabzpro/5-Airflow-Alternatives-for-Data-Orchestration-Tutorial
Code examples of Luigi, Prefect, Kedro, Dagster, and MageAI
Language: Python - Size: 40 KB - Last synced at: 6 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

ozkary/data-engineering-mta-turnstile
Data Engineering - Metropolitan Transportation Authority (MTA) Subway Data Analysis
Language: Jupyter Notebook - Size: 12.9 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 25 - Forks: 4

jasontanx/prefect-learning
Prefect - Data orchestration tool practice & learning
Language: Python - Size: 314 KB - Last synced at: 5 months ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

Wireforce-LLC/m3
☕ Data Orchestrator. Without abstractions
Language: TypeScript - Size: 139 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

philiporlando/dagster_university
I created this repo to follow along with the examples in the Dagster University Essentials course.
Language: Python - Size: 90.8 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

jacquessham/airflow_notes
Repository to store scripts and notes on Airflow
Language: Python - Size: 186 KB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

Annielytix/azure-data-factory-data-vault
Working with SCD Type (Change Data Capture) and need a Data Vault model to test Azure Data Factory v2? - This Code will Help!
Size: 2.75 MB - Last synced at: over 1 year ago - Pushed at: about 6 years ago - Stars: 1 - Forks: 0

anna-geller/kestra-terraform-examples
Bring Infrastructure as Code best practices to your data workflows with Kestra and Terraform
Language: HCL - Size: 746 KB - Last synced at: 23 days ago - Pushed at: about 2 years ago - Stars: 5 - Forks: 0

MostafaNabilll/end2end_pipeline
End-to-end data engineering project
Language: Python - Size: 3.87 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

zpencerguy/superdoppler
Data orchestration repo with Docker deployment
Language: Python - Size: 38.1 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

stemitom/postgres-pipeline
A simple pipeline infrastructure with an ETL pipeline contained in a Docker environment, using Apache Airflow for orchestration and Postgres for data warehousing
Language: Python - Size: 217 KB - Last synced at: over 2 years ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 3
