An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: apache-airflow

airflow-laminar/airflow-ha

High Availability (HA) DAG Utility

Language: Python - Size: 2.25 MB - Last synced at: about 16 hours ago - Pushed at: about 18 hours ago - Stars: 9 - Forks: 1

imchandanmohan/airflow-etl-ml-monitoring-pipeline

End-to-end MLOps pipeline with Airflow ETL orchestration, Redis feature store, and real-time ML monitoring using Prometheus & Grafana with automated data drift detection

Language: Jupyter Notebook - Size: 23.4 KB - Last synced at: about 19 hours ago - Pushed at: about 21 hours ago - Stars: 0 - Forks: 0

ragztigadi/RealTime-Reddit-BigData-ETL-Pipeline-on-AWS-Glue-Redshift

A cloud-native data engineering pipeline that ingests live Reddit data, orchestrates ETL with Apache Airflow, transforms with AWS Glue, stores in Amazon S3, and queries with Redshift & Athena. Includes schema automation with Glue Crawler and dashboard-ready datasets for BI tools.

Language: Python - Size: 18.9 MB - Last synced at: about 19 hours ago - Pushed at: about 21 hours ago - Stars: 0 - Forks: 0

astronomer/astronomer

Helm Charts for the Astronomer Platform, Apache Airflow as a Service on Kubernetes

Language: Python - Size: 12.3 MB - Last synced at: about 23 hours ago - Pushed at: 1 day ago - Stars: 485 - Forks: 94

astronomer/dag-factory

Construct Apache Airflow DAGs Declaratively via YAML configuration files

Language: Python - Size: 11.4 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 1,362 - Forks: 213

MohamedSamiHdj/realtime-data-pipeline

📊 Build a reliable real-time data pipeline for Windows using PySpark, ensuring quality data flow from raw ingestion to curated datasets.

Language: Python - Size: 1.3 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

cassimiro3/ottawa-data-pipeline-airflow

🚀 Build an end-to-end data pipeline using Ottawa building permit data, leveraging Apache Airflow and Docker for reliable data processing and analytics.

Language: Python - Size: 3.25 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

nguyennam05/AI.TUTOR

AI Tutor is a chatbot-based web app that answers syllabus-specific queries using Google Gemini API. It integrates Google Drive for eBook storage, MongoDB for chat history, and Clerk for user authentication, ensuring accurate, secure, and curriculum-aligned responses to students.

Language: JavaScript - Size: 7.06 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 2 - Forks: 1

Subrat1920/Titanic-Survival-MLOps

A practice project for getting started with model monitoring metrics...

Language: Python - Size: 630 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

elyra-ai/elyra

Elyra extends JupyterLab with an AI-centric approach.

Language: Python - Size: 115 MB - Last synced at: 1 day ago - Pushed at: 16 days ago - Stars: 1,967 - Forks: 360

apache/airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

Language: Python - Size: 468 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 42,968 - Forks: 15,853

airflow-laminar/airflow-pydantic

Pydantic models for Apache Airflow

Language: Python - Size: 3.36 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 8 - Forks: 1

aymane-maghouti/Big-Data-Project

This project aims to predict smartphone prices using a combination of batch and stream processing techniques in a Big Data environment. The architecture follows the Lambda Architecture pattern, providing both real-time and batch processing capabilities to users.

Language: Python - Size: 960 KB - Last synced at: 2 days ago - Pushed at: over 1 year ago - Stars: 18 - Forks: 2

astronomer/airflow-provider-fivetran-async

A new Airflow Provider for Fivetran, maintained by Astronomer and Fivetran

Language: Python - Size: 235 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 22 - Forks: 10

euiyounghwang/Prometheus-monitoring-exporter

Prometheus-monitoring-exporter

Language: Python - Size: 258 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1 - Forks: 0

airflow-laminar/airflow-priority

Priority tags for Airflow DAGs

Language: Python - Size: 2.26 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 7 - Forks: 1

astronomer/astro-cli

CLI that makes it easy to create, test and deploy Airflow DAGs to Astronomer

Language: Go - Size: 17 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 420 - Forks: 96

witchyburn/vk_friends_analyzer

ETL pipeline for monitoring VKontakte friends, with an alerting system and analytics. The project automatically tracks changes in the friends list and provides a convenient dashboard for analyzing social connections.

Language: Python - Size: 196 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

astronomer/airflow-chart

A Helm chart to install Apache Airflow on Kubernetes

Language: Python - Size: 4.9 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 290 - Forks: 94

dmp-labs/dmp-af

Distributed run of dbt models using Airflow

Language: Python - Size: 6.88 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 2

JoseVF5/Data-Mart---TechStyle-Commerce

This repository contains a complete, automated data pipeline that simulates a corporate environment for the fictional company "TechStyle Commerce". The project was created as a practical case study to demonstrate data engineering skills, from ingesting raw sources to delivering dashboards.

Language: Python - Size: 20.7 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 3 - Forks: 1

Dhanush-Raj1/Ecommerce-Chatbot-Project

A GenAI-powered chatbot for an ecommerce clothing store that answers user queries, provides recommendations, tracks orders and more.

Language: Python - Size: 27 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 3 - Forks: 0

andreax79/airflow-code-editor

A plugin for Apache Airflow that allows you to edit DAGs in the browser

Language: Vue - Size: 15.5 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 450 - Forks: 55

Dpbm/qcop

An AI model to predict the output of a quantum circuit

Language: Python - Size: 830 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

gestaogovbr/Ro-dou

Apache Airflow DAG generator for producing clippings of the Diário Oficial da União (Brazil's official federal gazette).

Language: Python - Size: 4.13 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 162 - Forks: 56

apache/airflow-client-python

Apache Airflow - OpenAPI Client for Python

Language: Python - Size: 1.86 MB - Last synced at: 3 days ago - Pushed at: 8 days ago - Stars: 434 - Forks: 62

astronomer/astronomer-cosmos

Run your dbt Core or dbt Fusion projects as Apache Airflow DAGs and Task Groups with a few lines of code

Language: Python - Size: 19.3 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 1,065 - Forks: 241

Daniel-jcVv/Daniel-jcVv

👨‍💻 Data Engineer | 3+ years of enterprise experience with Telcel & Citi Banamex. Develops ETL pipelines, data governance processes, and cloud solutions. Builds scalable data architectures and automated workflows for Fortune 500 clients. Tech stack: Python, SQL Server, Oracle, Apache Airflow, PySpark

Size: 35.2 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

subhamay-bhattacharyya/astronomer-airflow-template

📄🎯 GitHub Repository Template for Apache Airflow to be hosted and executed in Astronomer Cloud

Size: 62.5 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

fuadonates/gcp-data-lake-platform

GCP data lake platform integrating 4 source systems with BigQuery, Airflow, and Dataflow - Bronze-Silver architecture

Language: Python - Size: 0 Bytes - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

fuadonates/data-engineering

A collection of data engineering projects, proofs-of-concept (POCs), and proofs-of-knowledge (POKs) using technologies like Python, Spark, SQL, and cloud platforms.

Size: 5.86 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

GregoryKogan/crypto-trading-data-pipeline

Real-time crypto trading data pipeline using Apache Spark, Kafka, and Airflow. Containerized microservices architecture for streaming analytics.

Language: Python - Size: 21.5 KB - Last synced at: 3 days ago - Pushed at: 11 days ago - Stars: 2 - Forks: 0

airflow-laminar/airflow-config

A Configuration System for Airflow

Language: Python - Size: 2.19 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 13 - Forks: 3

HugoBSantos/airflow-intro

Learning to orchestrate data pipelines using Apache Airflow.

Language: Python - Size: 22.5 KB - Last synced at: 6 days ago - Pushed at: 12 days ago - Stars: 2 - Forks: 0

airflow-laminar/airflow-balancer

Utilities for tracking hosts and ports and load balancing DAGs

Language: Python - Size: 1.68 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 4 - Forks: 0

jghoman/awesome-apache-airflow

Curated list of resources about Apache Airflow

Language: Shell - Size: 550 KB - Last synced at: 11 days ago - Pushed at: about 1 year ago - Stars: 3,847 - Forks: 498

voidrot/dag-sync

Sync Airflow DAGs from S3 to the local filesystem

Language: Go - Size: 7.81 KB - Last synced at: 12 days ago - Pushed at: 13 days ago - Stars: 0 - Forks: 0

WHEELYDOS/nasa_etl

ETL pipeline built with Apache Airflow, deployed on Astro, and integrated with AWS RDS (PostgreSQL) for scalable data orchestration and storage

Language: Python - Size: 901 KB - Last synced at: 13 days ago - Pushed at: 14 days ago - Stars: 0 - Forks: 0

eliakimpires/flight-analytics-pipeline

End-to-end ELT data pipeline for US flight performance analysis. Orchestrated with Apache Airflow, transformed with dbt, and containerized with Docker.

Language: Python - Size: 37.1 KB - Last synced at: about 20 hours ago - Pushed at: 15 days ago - Stars: 0 - Forks: 0

HasibulHasanKhan/Retail-Sales-Analytics-Pipeline

This Retail Sales Analytics Pipeline is a fully modular, end-to-end data analytics project designed for retail businesses to analyze sales performance, customer behavior, and marketing ROI, while generating actionable insights through dashboards and reports.

Language: Jupyter Notebook - Size: 4.88 KB - Last synced at: 14 days ago - Pushed at: 15 days ago - Stars: 0 - Forks: 0

astronomer/ebook-etl-elt

Companion repository to the ETL & ELT Pipelines with Apache Airflow® eBook

Language: Python - Size: 607 KB - Last synced at: 15 days ago - Pushed at: 16 days ago - Stars: 31 - Forks: 15

Adelllllllll/ottawa-data-pipeline-airflow

A complete data engineering project simulating a modern data lake architecture (Raw → Staging → Curated → Index) using Apache Airflow, LocalStack S3, MySQL, MongoDB, and Elasticsearch.

Language: Python - Size: 3.25 MB - Last synced at: 15 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

Hsooni491/cafe_sales_etl

☕ Production-grade ETL pipeline for café sales analytics using Apache Airflow, Python, and PostgreSQL. Automates data extraction, transformation, quality validation, and BI reporting with visual analytics.

Language: Python - Size: 174 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

abhishekbhakat/airflow-mcp-server

MCP Server for Apache Airflow

Language: Python - Size: 25.6 MB - Last synced at: 14 days ago - Pushed at: 17 days ago - Stars: 24 - Forks: 4

johnryanmcnally/tile_tracker_pipeline

A data pipeline for a RAG LLM powered by enriched Tile Tracker location data, built with Langchain, Apache Airflow and PostgreSQL.

Language: Jupyter Notebook - Size: 98.7 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 1 - Forks: 0

aws-ia/terraform-aws-mwaa

Terraform module for Amazon MWAA (Apache Airflow)

Language: HCL - Size: 3.23 MB - Last synced at: 18 days ago - Pushed at: 2 months ago - Stars: 54 - Forks: 64

DHANA5982/Big_Data_Engineering_Azure_GCP_AWS

Comprehensive Big Data Engineering learning repository featuring hands-on projects with Hadoop, Spark, Kafka, Docker, Airflow, and Azure Cloud. Includes end-to-end data pipelines, real-time streaming, and distributed processing implementations.

Language: Jupyter Notebook - Size: 44.4 MB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 1 - Forks: 0

airflow-laminar/pydantic-airflow

Pydantic models for Apache Airflow

Language: Python - Size: 45.9 KB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 2 - Forks: 0

airflow-laminar/airflow-common

Common Airflow Operators / Tasks

Language: Python - Size: 632 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 5 - Forks: 1

airflow-laminar/airflow-supervisor

Airflow utilities for running long-running or always-on jobs with supervisord

Language: Python - Size: 1.94 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 13 - Forks: 3

Toloka/dbt-af

Distributed run of dbt models using Airflow

Language: Python - Size: 3.51 MB - Last synced at: 26 days ago - Pushed at: 26 days ago - Stars: 166 - Forks: 13

call518/MCP-Airflow-API

🔍Model Context Protocol (MCP) server for Apache Airflow API integration. Provides comprehensive tools for managing Airflow clusters including service operations, configuration management, status monitoring, and request tracking.

Language: Python - Size: 1.18 MB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 41 - Forks: 9

WordPress/openverse-catalog 📦

Identifies and collects data on CC-licensed content across web crawl data and public APIs.

Language: Python - Size: 92.6 MB - Last synced at: 5 days ago - Pushed at: about 2 years ago - Stars: 61 - Forks: 53

arturLMoretti/BEES-Data-Engineering---Breweries-Case

Language: Python - Size: 98.6 KB - Last synced at: 28 days ago - Pushed at: 28 days ago - Stars: 0 - Forks: 0

MahajanPreksha/Birthday-Reminder-Pipeline

A data pipeline built with Python and Apache Airflow that checks a MySQL database every morning and sends personalized birthday reminders to a Discord channel, complete with the person's age and celebratory formatting.

Language: Python - Size: 17.6 KB - Last synced at: 22 days ago - Pushed at: 29 days ago - Stars: 1 - Forks: 0

blackbass64/data-orchestration-with-apache-airflow

Class materials and setup guide for Data Orchestration with Apache Airflow

Size: 5.86 KB - Last synced at: 23 days ago - Pushed at: 29 days ago - Stars: 0 - Forks: 40

RaphCodec/airflow-azure-starter

A starting point for a production Airflow deployment on an Azure VM running the LocalExecutor, aimed at small teams.

Language: Shell - Size: 1.09 MB - Last synced at: 30 days ago - Pushed at: 30 days ago - Stars: 0 - Forks: 0

kanhaiya-gupta/DataEngineering-metrify-smart-metering

Real-time smart meter data pipeline: Kafka + Snowflake + Airflow + dbt for scalable energy data processing with clean architecture and enterprise monitoring

Language: Python - Size: 800 KB - Last synced at: 30 days ago - Pushed at: 30 days ago - Stars: 0 - Forks: 0

Zerohertz/airflow-dags

🍃 [Apache Airflow] DAGs 🍃

Language: Python - Size: 144 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

George-Njuguna/Spotify-ETL-Pipeline

An ETL pipeline that uses the Spotify API, Docker, and Airflow

Language: Jupyter Notebook - Size: 2.05 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

rejsafranko/Continuous-Learning-Infrastructure

AWS infrastructure for deep learning model re-training.

Language: Python - Size: 307 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

apache/airflow-client-go

Apache Airflow - OpenApi Client for Go

Language: Go - Size: 543 KB - Last synced at: 3 days ago - Pushed at: 3 months ago - Stars: 214 - Forks: 26

AlvaroCavalcante/airflow-parse-bench

Stop creating bad DAGs! Use this tool to measure and compare the parse time of your DAGs, identify bottlenecks, and optimize your Airflow environment for better performance.

Language: Python - Size: 192 KB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 20 - Forks: 0

oxylabs/building-scraping-pipeline-apache-airflow

Using Apache Airflow to Build a Pipeline for Scraped Data

Language: Python - Size: 128 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 1

PRagan/netflix-subscription-data-pipeline-orchestration

A big data pipeline and analysis project. Orchestrated with Apache Airflow, it also uses Kaggle, AWS S3, Glue, Redshift, and Zoho Analytics to perform data scrubbing, ETL, and visualization of Netflix subscription and title data.

Size: 1.95 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

Narius2030/Setup-Big-Data-Services

Documentation for a basic setup of Big Data services with Docker, implemented on premises (no cloud platform)

Language: Jupyter Notebook - Size: 28.9 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 0

Phadate/Wikipedia-football-data-engineering-pipeline

End-to-end data engineering pipeline that extracts Wikipedia data, processes it with Apache Airflow, stores in Azure Data Lake, and analyzes with Azure Synapse & Power BI

Language: Jupyter Notebook - Size: 203 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

rajibsalui/ETL-Airflow

An ETL pipeline with scheduled, orchestrated workflows using Apache Airflow. Raw data is ingested from a weather API, then processed and loaded into a PostgreSQL database.

Language: Python - Size: 199 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

astronomer/apache-airflow-providers-transfers

Language: Python - Size: 933 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 11 - Forks: 3

astronomer/astro-provider-databricks 📦

Orchestrate your Databricks notebooks in Airflow and execute them as Databricks Workflows

Language: Python - Size: 11.1 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 23 - Forks: 12

chiefnarx/Bakerite_Foods

Bakerite Foods is a data engineering project designed to orchestrate and automate data workflows for a food distribution company, using Apache Airflow and Azure Cloud Storage.

Language: Jupyter Notebook - Size: 217 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

1AyaNabil1/E-Commerce_ML_Data_Engineering_Pipeline_Snowflake

Built a full-stack data engineering pipeline on Snowflake to process and transform 10M+ daily e-commerce records for ML model training.

Language: Python - Size: 4.78 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 1

mdzaheerjk/Advanced_MLOPS_Project9_Iris-Flower-Classification2

🌸 Iris Flower Classification using End-to-End MLOps. 🤖 Automated ML pipeline for predicting Iris species from measurements. 🐳 Dockerized and Kubernetes-ready for scalable deployment. 🌐 Flask web app for real-time species inference with modular code.

Language: Jupyter Notebook - Size: 4.99 MB - Last synced at: 22 days ago - Pushed at: about 2 months ago - Stars: 7 - Forks: 0

mdzaheerjk/Advanced_Mlops_Project6_Australia_Weather_Rain_Predection

🌦️ Australia Weather Rain Prediction with Advanced MLOps. 🤖 End-to-end ML pipeline for rain forecasting using Australian weather data. 🐳 Dockerized and Kubernetes-ready for scalable deployment. 🌐 Flask web app for real-time weather prediction with modular, reproducible code.

Language: Jupyter Notebook - Size: 14.7 MB - Last synced at: 22 days ago - Pushed at: about 2 months ago - Stars: 7 - Forks: 0

mdzaheerjk/Advanced_Mlops_Project4_Custom_Guns_Object_Detection

🔫 Custom Guns Object Detection with MLOps. 🤖 End-to-end pipeline: training, inference, and experiment tracking. 🗃️ DVC-powered data versioning and reproducible experiments. 📊 Modular, configurable, and ready for research or security applications.

Language: Jupyter Notebook - Size: 13.2 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 7 - Forks: 0

supakunz/Book-Revenue-Pipeline

A ready-to-use Docker-based template for data engineering projects, featuring a complete stack with Apache Airflow, Spark, and MinIO for building scalable data pipelines.

Language: Jupyter Notebook - Size: 2.28 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

meetzaveri29/etl-weather-pipeline

Automated ETL pipeline built with Apache Airflow that extracts real-time weather data from Open-Meteo API, transforms it into structured format, and loads it into PostgreSQL database. Features Docker containerization, AWS deployment support, and comprehensive monitoring. Perfect example of modern data engineering practices.

Language: Python - Size: 1.65 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

idealista/airflow-role

Ansible role to install Apache Airflow

Language: YAML - Size: 311 KB - Last synced at: 7 days ago - Pushed at: about 2 months ago - Stars: 85 - Forks: 52

tkp-archive/paperboy

A web frontend for scheduling Jupyter notebook reports

Language: Python - Size: 12.5 MB - Last synced at: about 1 month ago - Pushed at: 11 months ago - Stars: 254 - Forks: 25

doitintl/doit-composer-airflow-training 📦

Getting started with Apache Airflow on Cloud Composer

Language: Python - Size: 4.24 MB - Last synced at: 11 days ago - Pushed at: over 3 years ago - Stars: 29 - Forks: 5

kaxil/airflowctl 📦

A CLI tool to streamline getting started with Apache Airflow™ and managing multiple Airflow projects

Language: Python - Size: 313 KB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 220 - Forks: 17

AmritPrakash3/Reddit-ETL-in-AWS-using-Airflow

Reddit ETL in AWS using Airflow is a full-stack data engineering project that builds a scalable ETL pipeline using cloud-based tools. It extracts data from Reddit via API, processes it with Apache Airflow, and leverages AWS services like S3, Glue, Athena, and Redshift for transformation, querying, and warehousing.

Language: Python - Size: 122 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

islambenab/Book-Revenue-Pipeline

📘 Build a complete data pipeline for book sales, from ingestion to visualization, using Docker, Airflow, Spark, and BigQuery for insightful analytics.

Language: Jupyter Notebook - Size: 1.96 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

airscholar/RedditDataEngineering

This project provides a comprehensive data pipeline solution to extract, transform, and load (ETL) Reddit data into a Redshift data warehouse. The pipeline leverages a combination of tools and services including Apache Airflow, Celery, PostgreSQL, Amazon S3, AWS Glue, Amazon Athena, and Amazon Redshift.

Language: Python - Size: 118 KB - Last synced at: about 2 months ago - Pushed at: about 2 years ago - Stars: 153 - Forks: 76

f-kuzey-edes-huyal/steam-sale-optimizer

An MLOps pipeline for optimizing game discount strategies using Steam reviews, tags, and competitor pricing. Designed for data-driven revenue maximization in the gaming industry.

Language: Python - Size: 43.8 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 4 - Forks: 0

astronomer/astro-sdk 📦

Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.

Language: Python - Size: 7.54 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 375 - Forks: 50

supakunz/data-engineering-stack

A ready-to-use Docker-based template for data engineering projects, featuring a complete stack with Apache Airflow, Spark, and MinIO for building scalable data pipelines.

Language: Python - Size: 31.3 KB - Last synced at: about 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

datarobot/airflow-provider-datarobot

DataRobot provider for Apache Airflow

Language: Python - Size: 705 KB - Last synced at: about 1 month ago - Pushed at: 2 months ago - Stars: 31 - Forks: 5

royungar/ETL_Toll_Data_Pipeline_Project

Final project for IBM’s Data Engineering certificate (Course 8). ETL pipeline built with Apache Airflow and Bash to extract, transform, and consolidate toll data from CSV, TSV, and fixed-width files.

Language: Python - Size: 932 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

kevinborgesz/The-Data-Engineering-Academy

Materials from The Data Engineering Academy

Size: 1.95 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

sibyabin/blogs

Technology blog by Siby Abin covering data engineering, AWS, Spark, Python, Airflow, and more

Language: SCSS - Size: 6.33 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

elyra-ai/pipeline-editor

Common pipeline-editor components used in different clients (e.g., the Elyra application and web browser extensions)

Language: TypeScript - Size: 2.88 MB - Last synced at: 14 days ago - Pushed at: 2 months ago - Stars: 33 - Forks: 22

Kiran8053/Reddit-ETL-in-AWS-using-Airflow

Reddit ETL in AWS using Airflow is a full-stack data engineering project that builds a scalable ETL pipeline using cloud-based tools. It extracts data from Reddit via API, processes it with Apache Airflow, and leverages AWS services like S3, Glue, Athena, and Redshift for transformation, querying, and warehousing.

Language: Python - Size: 122 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

ambika-garg/PowerBI_Airflow_Plugin

Apache Airflow plugin for refreshing Power BI datasets.

Language: Python - Size: 136 KB - Last synced at: 22 days ago - Pushed at: about 1 year ago - Stars: 10 - Forks: 2

depicted-candela/stateless-mind

Stateless Mind: Architecting Serverless Blueprints and Probabilistic Intelligence

Language: Python - Size: 504 KB - Last synced at: 5 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

muzomer/hetzner-apache-airflow

Apache Airflow in Hetzner Cloud

Language: HCL - Size: 11.7 KB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

astronomer/astro-provider-ray

This provider contains operators, decorators, and triggers to submit a Ray job from an Airflow task

Language: Python - Size: 665 KB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 20 - Forks: 5

joelsolaeche/E-Commerce-Data-Pipeline-ELT

Language: Jupyter Notebook - Size: 2.81 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

abrahamkoloboe27/E-Commerce-Data-Pipeline-And-Dashboard-With-Apache-Superset

Language: Python - Size: 10.5 MB - Last synced at: 12 days ago - Pushed at: 8 months ago - Stars: 7 - Forks: 2

Peter-Opapa/kafka_airflow_cassandra_pipeline

A comprehensive real-time data streaming pipeline built with Apache Airflow, Apache Kafka, and Apache Cassandra. Features automated ETL workflows, containerized deployment with Docker Compose, and live web UIs for monitoring. Includes quick-start scripts for easy setup and professional documentation.

Language: Python - Size: 378 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0