Topic: "data-pipelines"
AiDAPT-A/VisArchPy
pipelines for the extraction and processing of visuals from PDFs
Language: Python - Size: 3.79 MB - Last synced at: 15 days ago - Pushed at: 10 months ago - Stars: 4 - Forks: 1

Elkinmt19/airflow-master
This is a repo that was created to learn more about Airflow and develop awesome data engineering projects. 🚀🚀
Language: Python - Size: 3.33 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 4 - Forks: 3

opendatadiscovery/odd-collector-gcp 📦
Open-source GCP metadata collector based on ODD Specification
Language: Python - Size: 188 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 4 - Forks: 0

DataDrivenGit/Music-Streaming-App-using-AWS-ETL
Implemented a data warehouse and data lake on AWS, data modeling with Postgres and Apache Cassandra, and a data pipeline built with Apache Airflow.
Language: Jupyter Notebook - Size: 725 KB - Last synced at: 5 months ago - Pushed at: almost 5 years ago - Stars: 4 - Forks: 3

dwhitena/pach-neon
An example Pachyderm ML pipeline using Nervana Neon
Language: Python - Size: 53.7 KB - Last synced at: over 1 year ago - Pushed at: about 8 years ago - Stars: 4 - Forks: 0

abeltavares/versioned-data-lakehouse
🌊 Git-like Version Control for Data with Nessie, Iceberg, and Spark
Language: Jupyter Notebook - Size: 3.95 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 3 - Forks: 2

kestra-io/data-engineering-zoomcamp
Code for the Data Engineering Zoomcamp course
Size: 470 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 3 - Forks: 1

rcorrero/light-pipe
A high-level syntax for data pipelines, designed to make pipeline development quick and painless.
Language: Python - Size: 1.52 MB - Last synced at: 17 days ago - Pushed at: 9 months ago - Stars: 3 - Forks: 1

zkan/introduction-to-data-pipelines-and-apache-airflow
Introduction to Data Pipelines and Apache Airflow
Language: Python - Size: 134 KB - Last synced at: 19 days ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 9

AnanthaRajuC/DataPractitioner
Data Practitioner
Language: Python - Size: 1010 KB - Last synced at: 11 months ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 0

Ardemius/big-data-resources
Repo to store all my data resources: "big" ones, data pipelines, data management, all of them 😉
Size: 8.28 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 2

StrictlySkyler/harbormaster Fork of luzlab/harbormaster-apache 📦
A framework for microservices
Language: JavaScript - Size: 1.82 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 5

projectmesadata/cropyield
Creates a data pipeline from the Famine Land Data Assimilation DataSet (FLDAS) to seed model terrain and assess the potential crop yield for a variety of crops.
Language: Jupyter Notebook - Size: 134 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 3

BogdanFloris/detecting-and-addressing-change
Code for my Master Thesis: How to detect and address changes in machine learning based data pipelines
Language: Python - Size: 151 KB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 3 - Forks: 0

bjanesh/odi-tools
A suite of tools written in Pyraf, Astropy, Scipy, and Numpy to process individual QuickReduced images into single stacked images using a set of "best practices" for ODI data.
Language: Python - Size: 2.39 MB - Last synced at: about 23 hours ago - Pushed at: almost 5 years ago - Stars: 3 - Forks: 2

FedericoSerini/DEND-Project-5-Data-Pipelines
Project 5 - Data Engineering Nanodegree
Language: Python - Size: 4.88 KB - Last synced at: 4 days ago - Pushed at: almost 6 years ago - Stars: 3 - Forks: 4

todofixthis/filters
🤔 What if we took the UNIX philosophy and applied it to input validation?
Language: Python - Size: 970 KB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 2 - Forks: 3

The-Swarm-Corporation/Custom-Swarms-Spec-Template
Build your dream AI agent swarm with enterprise-grade reliability and scalability. This repository contains our official specification template for custom swarm development using the powerful Swarms Framework.
Size: 79.1 KB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

leotech-dev/leoflow
A set of plugins (mappers, sinks, etc.) for Numaflow pipelines
Language: Go - Size: 11.7 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

itsame-mcl/data-pypeline
Pure Python 3 data wrangling tools with support for pipelines
Language: Python - Size: 24.9 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 2 - Forks: 1

santiagortiiz/Snowflake-Data-Pipelines
EPAM's Snowflake hands-on lab. We built a pipeline to read and load data from S3 into Snowflake, developed an ETL workflow to clean the data and stored it in a data warehouse with the 3NF and Star schemas for data mart analysis.
Language: Jupyter Notebook - Size: 30.6 MB - Last synced at: about 2 months ago - Pushed at: about 2 years ago - Stars: 2 - Forks: 0

zpencerguy/superdoppler
Data orchestration repo with Docker deployment
Language: Python - Size: 38.1 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

allanchua101/ipynta
Rapidly build image processing pipelines
Language: Python - Size: 437 KB - Last synced at: 9 months ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 1

zkan/building-data-pipelines-with-apache-airflow
Building Data Pipelines with Apache Airflow
Language: Dockerfile - Size: 10.6 MB - Last synced at: 19 days ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 11

CogStack/annotations-ingester
Send text annotations back to ElasticSearch
Language: Python - Size: 117 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 2

ahayasic/apache-airflow-in-a-nutshell
A set of Markdown files explaining Apache Airflow, with best practices and recipes.
Language: Python - Size: 693 KB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 2 - Forks: 0

paulokuong/airflow-run
Quick way to deploy Airflow Multi-Node Cluster (a.k.a. Airflow Celery Executor Setup)
Language: Python - Size: 147 KB - Last synced at: 22 days ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 1

The-AI-Alliance/dpk-alliance
A simplification and extension of the Data Prep Kit project
Language: Python - Size: 552 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 1 - Forks: 0

heymumford/Samstraumr
Samstraumr is a Java-based framework that implements systems theory concepts in software architecture for building resilient, adaptive systems and simulations with self-healing capabilities.
Language: Java - Size: 18 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 1 - Forks: 0

stevehoober254/dataengineer-portfolio
📊 End-to-end ETL pipelines, Airflow DAGs, notebook-driven analytics & data warehousing
Size: 6.84 KB - Last synced at: 15 days ago - Pushed at: 16 days ago - Stars: 1 - Forks: 0

DataForgeOpenAIHub/mlops-credit-card-fraud-detection-end-to-end
End to End Machine Learning MLOps Project for Credit Card Fraud Detection using Ensemble Models, Data and Model Versioning through DVC, Github Actions, and Deployment
Language: Jupyter Notebook - Size: 2.53 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 1 - Forks: 1

raj-maharajwala/mlops-credit-card-fraud-detection-end-to-end Fork of DataForgeOpenAIHub/mlops-credit-card-fraud-detection-end-to-end
End to End Machine Learning MLOps Project for Credit Card Fraud Detection using Ensemble Models, Data and Model Versioning through DVC, Github Actions, and Deployment
Language: Jupyter Notebook - Size: 2.36 MB - Last synced at: 30 days ago - Pushed at: 30 days ago - Stars: 1 - Forks: 0

caesarmario/weather-data-engineering-pipeline
This repository showcases a complete Python-based ETL (Extract, Transform, Load) data pipeline designed to process, validate, and analyze weather data for multiple cities. The project demonstrates a structured approach to handling weather data, focusing on data accuracy, transformation, and insights generation.
Language: Python - Size: 588 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

allamiro/Data-Pipelines
Everything about designing, installing, and implementing data pipelines, including Kafka, ZooKeeper, and Hadoop. If you enjoy my content, please consider supporting what I do. Thank you.
Language: Jinja - Size: 4.48 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

ProyectoRespira/data_retriever
Data pipeline and inference for Proyecto Respira
Language: Python - Size: 2.12 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

YSayaovong/Refonte-System-Redesigns
Collaborated on building scalable data pipelines, performing ETL processes, and optimizing database performance to support data-driven decision-making
Language: Jupyter Notebook - Size: 771 KB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

nbigot/ministream
Ministream is a small, stand-alone, real-time event messaging streaming server
Language: Go - Size: 269 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 1 - Forks: 1

rcgsheffield/airbods
AIRBODS data pipelines and storage
Language: Python - Size: 274 KB - Last synced at: 14 days ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

PinsaraPerera/Customer_satisfaction_model_deployment
Machine learning operations pipelines for training and monitoring a customer satisfaction model. Created using the ZenML framework.
Language: Python - Size: 14.3 MB - Last synced at: 10 days ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

cybergeekgyan/Data-Engineering-Portfolio
Data Engineering portfolio projects, resources used to study data tools...
Language: Jupyter Notebook - Size: 2.92 MB - Last synced at: 12 months ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

scottbarnesg/bowline
Bowline: Easily build performant data stream processing pipelines in Python.
Language: Python - Size: 64.5 KB - Last synced at: 11 days ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

yashrajjain726/Weather-Visibility-Prediction
A project that fetches live weather data from an API and predicts visibility.
Language: Jupyter Notebook - Size: 6.36 MB - Last synced at: 12 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

vanderschaarlab/temporai-mivdp
TemporAI-MIVDP: Adaptation of MIMIC-IV-Data-Pipeline for TemporAI
Language: Python - Size: 1.85 MB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

svidovich/python-experiments
Anything that doesn't fit anywhere else
Language: Python - Size: 941 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

Taiyo-ai/pt-mesh-pipeline
Use this template repository to write projects and tenders data ingestion pipelines
Language: Python - Size: 111 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 145

scott-diprose/dtm-lib-dotnet
Basic .NET library utilising scott-diprose/dtm-schema
Language: C# - Size: 154 KB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 1 - Forks: 0

mid-atlantic-applied-sciences/legendary-octo-journey
Enforce code review before adding code
Size: 64.5 KB - Last synced at: 1 day ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 1

jmoussa/go-sentitweet
CLI Application holding a sentiment analysis data (Twitter tweets) pipeline with its own Web API to query results in the database. Written entirely in Go.
Language: Go - Size: 13.4 MB - Last synced at: about 2 months ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 1

scott-diprose/dtm-schema
Data Transfer Metadata. Object model for standardising the structure of metadata applicable to building data transfers.
Size: 51.8 KB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 0

AbdullahMu/Data-Pipelines-with-Airflow
Schedule, automate, and monitor data pipelines using Apache Airflow. Run data quality checks, track data lineage, and work with data pipelines in production.
Language: Python - Size: 52.7 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 1

NYCPlanning/ceqr-app-data-archive 📦
(DEPRECATED) Data pipelines for the CEQR app, managed by data engineering
Language: Python - Size: 210 KB - Last synced at: about 1 year ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 1

aquemy/DOLAP_2019_supplementary_material
Supplementary material for DOLAP 2019 submission
Size: 5.04 MB - Last synced at: about 1 month ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 0

GuerrillaAnalytics/proj001_lfb
Example training project for Guerrilla Analytics ways of working
Language: Jupyter Notebook - Size: 993 KB - Last synced at: 5 months ago - Pushed at: about 7 years ago - Stars: 1 - Forks: 2

ahmedd38/dataengineer-portfolio
📊 End-to-end ETL pipelines, Airflow DAGs, notebook-driven analytics & data warehousing
Size: 7.81 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

jesuserro/books-etl-pipeline
The Books ETL Pipeline is a data engineering project that extracts, transforms, and loads data from Goodreads and other sources to analyze book authors and their works. It leverages tools like Airflow for orchestration, MySQL for data storage, and Grafana for visualization.
Language: Jupyter Notebook - Size: 470 KB - Last synced at: 11 days ago - Pushed at: 12 days ago - Stars: 0 - Forks: 0

sanjay-k08/Python-for-GCP-Interact-with-Google-Cloud-Using-Python
Python For GCP is a project aimed at simplifying interaction with Google Cloud Platform (GCP) services using Python. This repository provides code examples and scripts that help you manage and automate various GCP resources such as BigQuery, Cloud Storage, BigTable, Compute Engine, and more, entirely through Python.
Language: Python - Size: 37.1 KB - Last synced at: 13 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

hlan22/2025-02-27-pipelines
Learning about Data Pipelines
Language: TeX - Size: 195 KB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 0 - Forks: 0

aust21/skill-gap-analyzer
Online skill analyzer web app
Language: Python - Size: 15.6 MB - Last synced at: 30 days ago - Pushed at: 30 days ago - Stars: 0 - Forks: 0

apelullo/yelp_health_data_curation_ops
An AWS-based data pipeline to extract, process, store, and monitor Yelp "health-related" facility data in support of ongoing health system initiatives.
Language: Jupyter Notebook - Size: 1.17 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

apelullo/twitter_covid_stream_processing_ops
An AWS-based data pipeline to collect, process, store, and monitor Twitter streaming data throughout the COVID-19 pandemic in support of local, regional, and national public health initiatives.
Language: Jupyter Notebook - Size: 117 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

sahil-172002/CSV-to-PostgreSQL-Data-Pipeline
Data pipelines are essential components of modern data engineering. Whether you're working with small datasets or handling massive data warehouses, knowing how to efficiently move data between different systems is crucial. In this guide
Language: TypeScript - Size: 58.6 KB - Last synced at: 30 days ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

Javid912/Stock-Market-Analytics-data-pipline
A production-grade ETL pipeline for processing financial market data using Apache Airflow, dbt, and PostgreSQL.
Language: Python - Size: 5.41 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

ShottzHT/Healthcare-Analysis
Analyze healthcare data to identify key trends, risk factors, and actionable insights using Tableau dashboards and Python preprocessing. Enhance healthcare decision-making with interactive visualizations and data-driven approaches.
Size: 1.95 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

lexiortiz/WiBD-DataCamp
Notes, exercises, and projects from the WiBD 2024/2025 DataCamp Scholarship.
Language: Jupyter Notebook - Size: 33.2 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

calebsuminkim/airflow
[Inflearn] Practice repository for the Airflow Master course
Language: Python - Size: 90.8 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

BauplanLabs/wap-with-bauplan-and-temporal
A reference implementation of Write-Audit-Publish over the lakehouse in pure Python
Language: Python - Size: 200 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

mavaji/free-monad
Language: Scala - Size: 275 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

armahdavi/analytics_statistics_ML_plotting_dust_extraction_hvac_filters_ph2
PhD Technical Paper 1 - Phase 2 - Mahdavi & Siegel (2020) (Aerosol Science & Technology; AS&T) - Sharing all the data pipelines, processing code, descriptive statistics, statistical modelling, and plotting/visualizations - Project Milestone: 2017 - 2020 - Full-length article is available
Language: Jupyter Notebook - Size: 414 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

wdonne/pincette-mongo-streams
JSON Streaming With Mongo Streams
Language: Java - Size: 128 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 1

mehanix/dhrw
🎢 IaaS visual editor to create & deploy data processing pipelines - python, rmq, react, meteorjs
Language: JavaScript - Size: 1.88 MB - Last synced at: 22 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

GADES-DATAENG/webinar
Code, scripts, and resources for the Data Engineering Fundamentals Course Webinar, covering Python, data pipelines, Apache Airflow, and more.
Language: Python - Size: 26.9 MB - Last synced at: 26 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

CloudFormations/Training.DataIntegration
Training content for course delegates.
Language: TSQL - Size: 29.1 MB - Last synced at: 22 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 11

rohith42/bird-finderz
End-to-end deep learning system enabling anyone to classify bird species efficiently
Language: Python - Size: 111 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

apache/airflow-publish
Publishing PyPI packages for Apache Airflow
Language: Python - Size: 69.3 KB - Last synced at: about 11 hours ago - Pushed at: 4 months ago - Stars: 0 - Forks: 1

Blacksujit/Problems-I-have-Faced-In-My-Journey-OF-Programming
This repository contains the issues and errors I have faced in my programming, machine learning, and deep learning journey.
Language: Jupyter Notebook - Size: 11 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

runodp/dagster-odp
A configuration-driven framework for building Dagster pipelines that enables teams to create and manage data workflows using YAML/JSON instead of code
Language: Python - Size: 849 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

siddharth-nandagopal/billionaires-rag-query
Billionaires RAG Query uses LLMs and a RAG framework to analyze the world's billionaires list. Extracts tabular data from PDFs, converts to multiple formats, and enables precise queries about net worth, age, and more. Integrates with Poetry and asdf for easy setup and management.
Language: Python - Size: 707 KB - Last synced at: 20 days ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

farukalamai/tomato-leaf-diseases-ditection
Tomato leaf disease detection using YOLOv8 and YOLOv5
Language: Jupyter Notebook - Size: 8.97 MB - Last synced at: 2 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

KayvanShah1/usc-dsci560-dspp-sp24
USC DSCI 560 - Data Science Professional Practicum - Spring 2024 - Prof. Young Cho
Language: Python - Size: 50.1 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 1

datlin-org/sigzag
Sigzag is an observability utility and backend service for datlin and is used to monitor, sign and log data pipeline transactions.
Language: Go - Size: 143 KB - Last synced at: 2 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

rafaelvargas/bytebridge
A data tool designed to move data seamlessly between various sources and destinations.
Language: Python - Size: 46.9 KB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 1

TheODDYSEY/Scikit-Pipeline
Bank Customer Churn Prediction Project 💰
Language: Jupyter Notebook - Size: 6.3 MB - Last synced at: about 1 month ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

PinsaraPerera/MLOps_with_mlflow
Advanced machine learning operations pipelines integrated with MLflow to monitor artifacts and metrics. Deployed to AWS via CI/CD with GitHub Actions.
Language: CSS - Size: 108 KB - Last synced at: 10 days ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

srenegado/paintings-data
A Python ETL pipeline with a Postgres data warehouse for modeling art inventory.
Language: Python - Size: 528 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

jbossdemocentral/edge-to-cloud-data-pipelines-demo
Solution Pattern: Edge to Core Data Pipelines for AI/ML
Language: JavaScript - Size: 34.9 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 1

thecodemancer/Apache-Beam
🔥👨💻 Build Big data pipelines with Apache Beam in any language and run it via Spark, Flink, GCP (Google Cloud Dataflow).
Language: Jupyter Notebook - Size: 321 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

KyleZrey/data-pipeline
Creation of data pipeline using Jupyter Notebook, PostgreSQL, and Apache Airflow.
Language: Jupyter Notebook - Size: 9.74 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

tara-nguyen/modern-data-architecture
Follow along with materials in the book "Modern Data Architectures with Python: A practical guide to building and deploying data pipelines, data warehouses and data lakes" (Lipp, 2023)
Language: Jupyter Notebook - Size: 33.2 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Sibusiso-Gumede/supermarket-scraper
A data extraction program that is a component of an ETL data pipeline. The program scrapes product promotion data from supermarket websites.
Language: Python - Size: 465 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

mariaafara/airflow-pipelines
This repository hosts a collection of exercises for building pipelines and DAGs using Apache Airflow, along with a submodule for an Airflow server that can be used to test and deploy the pipelines.
Language: Python - Size: 532 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

BinariesGoalls/IBM-Data-Engineering-Professional-Certificate
This is a repository to document the entire process and learning throughout Coursera's IBM Data Engineering Professional Certificate program.
Size: 381 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

gabrielbazan/kafka_pipeline
An example of how to build a data processing pipeline with Apache Kafka, NGINX, Python, FastAPI, Docker, and MongoDB.
Language: Python - Size: 1.09 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

vandenn/dagster-prio-dynamic-map
A reference repository for implementing Dynamic Mapping and Op Prioritization with Dagster.
Language: Python - Size: 65.4 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

splitgraph/seafowl-dagster-demo
An example project demonstrating how to submit data to Seafowl from a dagster job.
Language: Python - Size: 9.77 KB - Last synced at: 2 days ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

bytes1inger/Beatlytica
This project implements a real-time event streaming pipeline for a music streaming service, inspired by Spotify Wrapped and Billboard charts. The pipeline is powered by Apache Airflow, Apache Kafka, dbt, Docker, GCP, Spark-Streaming, and Terraform.
Language: Python - Size: 86.1 MB - Last synced at: about 2 months ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

Fozan-Talat/DTC-Data-Engineering-Zoomcamp
This repository contains homework solutions and course material for the 10-week Data Engineering Zoomcamp by DataTalksClub.
Language: Jupyter Notebook - Size: 39.1 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

ZuchniakK/CryptoDataProcessing
In the series of notebooks included in this repository, I present the process of acquiring, exploring, cleaning and normalizing data, generating additional features, and creating a dataset containing reasonable X and Y batches for ML.
Language: Jupyter Notebook - Size: 4.4 MB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

eldor-fozilov/first-dance-with-MLOps
Knowing how to deploy models into production is as important as building them!
Language: Jupyter Notebook - Size: 73.2 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 1

panastasiadis/etl-microservices-demo
This repository contains a demo application that showcases the ETL (extract, transform, load) process using Apache Kafka, MongoDB, MySQL, and Neo4j to collect, store, and analyze product transactions data.
Language: Python - Size: 6.84 KB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

mpolinowski/apache-airflow-intro
Introduction to Apache Airflow
Language: Python - Size: 9.77 KB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0
