An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: data-orchestration

ryandmonk/knowledge_graph_brain

MCP-native knowledge graph orchestrator that unifies data silos with GraphRAG, dynamic connectors, and local AI.

Language: TypeScript - Size: 1010 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1 - Forks: 1

apache/incubator-graphar

An open source, standard data file format for graph data storage and retrieval.

Language: C++ - Size: 17.1 MB - Last synced at: 3 days ago - Pushed at: 9 days ago - Stars: 295 - Forks: 73

astronomer/airflow-provider-fivetran-async

A new Airflow Provider for Fivetran, maintained by Astronomer and Fivetran

Language: Python - Size: 230 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 22 - Forks: 10

kestra-io/kestra

⚡ Universal Workflow Orchestration Platform: code in any language, run anywhere. 800+ plugins for data, infrastructure, and AI automation.

Language: Java - Size: 69 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 20,466 - Forks: 1,726

cubefs/cubefs

cloud-native distributed storage

Language: Go - Size: 158 MB - Last synced at: 8 days ago - Pushed at: 16 days ago - Stars: 5,232 - Forks: 672

Alluxio/alluxio

Alluxio, data orchestration for analytics and machine learning in the cloud

Language: Java - Size: 196 MB - Last synced at: 13 days ago - Pushed at: 4 months ago - Stars: 7,060 - Forks: 2,947

Luk-kar/SmogSense

A modular data platform for end-to-end analytics, data pipeline orchestration, and a machine learning model registry, for local open-source environments or a commercial cloud setup.

Language: Jupyter Notebook - Size: 8.32 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 0 - Forks: 0

bravorod/mimic-ehr-pipeline

EHR pipeline that simulates MIMIC-IV patient data streams, performs advanced feature engineering and clinical severity scoring using machine learning (Random Forest Classifier), and prepares structured outputs for scalable downstream analytics

Language: Python - Size: 634 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 3 - Forks: 1

nishthapant/airflow

This project orchestrates an end-to-end data pipeline for an e-commerce dataset using Apache Airflow (in Docker) and a separate dbt (data build tool) project. The pipeline transforms raw source data into structured, analytics-ready datasets.

Language: Python - Size: 1.13 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

Rudsss18/airflow-dbt-weather-pipeline

Build a weather data pipeline with Python, Apache Airflow, dbt, PostgreSQL, and Superset for efficient ETL and interactive visualizations. 🌧️🚀

Language: Python - Size: 17.6 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

iam-mhaseeb/Skytrax-Data-Warehouse 📦

A full data warehouse infrastructure with ETL pipelines running inside Docker on Apache Airflow for data orchestration, AWS Redshift as the cloud data warehouse, and Metabase for data visualizations such as analytical dashboards.

Language: Python - Size: 1.34 MB - Last synced at: 26 days ago - Pushed at: over 5 years ago - Stars: 137 - Forks: 30

longNguyen010203/Finance-Data-Ingestion-Pipeline-with-Kafka

Develop a real-time data ingestion pipeline using Kafka and Spark. Collect minute-level stock data from Yahoo Finance, ingest it into Kafka, and process it with Spark Streaming, storing the results in Cassandra. Orchestrate the workflow using Airflow deployed on Docker.

Language: Python - Size: 250 KB - Last synced at: about 2 months ago - Pushed at: 9 months ago - Stars: 4 - Forks: 1

Alluxio/k8s-operator

An operator for managing an Alluxio system on a Kubernetes cluster

Language: Go - Size: 270 KB - Last synced at: 23 days ago - Pushed at: over 1 year ago - Stars: 13 - Forks: 9

ddeutils/data-orchestra

❌ Full-Stack Data Orchestration config by YAML template with Flask & HTMX

Language: Python - Size: 3.81 MB - Last synced at: 11 days ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

SAP-samples/btp-data-to-value-workshop

This repo contains a dataset, exercises, and sample code for an end-to-end SAP BTP data-to-value bootcamp covering SAP HANA Cloud, SAP Data Warehouse Cloud, SAP Data Intelligence Cloud, and SAP Analytics Cloud.

Language: Jupyter Notebook - Size: 167 MB - Last synced at: 5 months ago - Pushed at: 6 months ago - Stars: 24 - Forks: 24

dagster-io/dagster-quickstart 📦

Get started with Dagster ASAP

Language: Python - Size: 20.5 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 11 - Forks: 122

taquynhnga2001/proptech-dagster

Build an ELT pipeline with Dagster and dbt to schedule loading HDB resale transactions in Singapore into a Google BigQuery data warehouse, then create a Power BI dashboard to enhance insight exploration.

Language: Python - Size: 3.42 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 2 - Forks: 0

kestra-io/examples

Best practices for data workflows, integrations with the Modern Data Stack (MDS), Infrastructure as Code (IaC), and Cloud Provider Services

Language: HCL - Size: 3.28 MB - Last synced at: 5 months ago - Pushed at: 6 months ago - Stars: 25 - Forks: 9

jonathanneo/data-aware-orchestration

Data-aware orchestration with dagster, dbt, and airbyte

Language: Python - Size: 1.36 MB - Last synced at: 4 months ago - Pushed at: over 2 years ago - Stars: 31 - Forks: 0

GADES-DATAENG/webinar

Code, scripts, and resources for the Data Engineering Fundamentals Course Webinar, covering Python, data pipelines, Apache Airflow, and more.

Language: Python - Size: 26.9 MB - Last synced at: 5 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

anna-geller/kestra-ci-cd

CI/CD repository template to automate deployments of your production flows

Language: HCL - Size: 104 KB - Last synced at: 5 months ago - Pushed at: about 1 year ago - Stars: 12 - Forks: 5

kestra-io/data-engineering-zoomcamp

Code for the Data Engineering Zoomcamp course

Size: 470 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 3 - Forks: 1

kingabzpro/5-Airflow-Alternatives-for-Data-Orchestration-Tutorial

Code examples of Luigi, Prefect, Kedro, Dagster, and MageAI

Language: Python - Size: 40 KB - Last synced at: 6 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

ozkary/data-engineering-mta-turnstile

Data Engineering - Metropolitan Transportation Authority (MTA) Subway Data Analysis

Language: Jupyter Notebook - Size: 12.9 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 25 - Forks: 4

jasontanx/prefect-learning

Prefect - Data orchestration tool practice & learning

Language: Python - Size: 314 KB - Last synced at: 5 months ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

Wireforce-LLC/m3

☕ Data Orchestrator. Without abstractions

Language: TypeScript - Size: 139 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

philiporlando/dagster_university

I created this repo to follow along with the examples in the Dagster University Essentials course.

Language: Python - Size: 90.8 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

jacquessham/airflow_notes

Repository to store scripts and notes on Airflow

Language: Python - Size: 186 KB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

Annielytix/azure-data-factory-data-vault

Working with SCD Type (Change Data Capture) and need a Data Vault model to test Azure Data Factory v2? This code will help!

Size: 2.75 MB - Last synced at: over 1 year ago - Pushed at: about 6 years ago - Stars: 1 - Forks: 0

anna-geller/kestra-terraform-examples

Bring Infrastructure as Code best practices to your data workflows with Kestra and Terraform

Language: HCL - Size: 746 KB - Last synced at: 23 days ago - Pushed at: about 2 years ago - Stars: 5 - Forks: 0

MostafaNabilll/end2end_pipeline

End to End data engineering project

Language: Python - Size: 3.87 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

zpencerguy/superdoppler

Data orchestration repo with Docker deployment

Language: Python - Size: 38.1 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

stemitom/postgres-pipeline

A simple pipeline infrastructure with an ETL pipeline contained in a Docker environment, using Apache Airflow for orchestration and Postgres for data warehousing

Language: Python - Size: 217 KB - Last synced at: over 2 years ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 3