An open API service providing repository metadata for many open source software ecosystems.

Topic: "data-pipelines"

AiDAPT-A/VisArchPy

pipelines for the extraction and processing of visuals from PDFs

Language: Python - Size: 3.79 MB - Last synced at: 15 days ago - Pushed at: 10 months ago - Stars: 4 - Forks: 1

Elkinmt19/airflow-master

This is a repo that was created to learn more about Airflow and develop awesome data engineering projects. 🚀🚀

Language: Python - Size: 3.33 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 4 - Forks: 3

opendatadiscovery/odd-collector-gcp 📦

Open-source GCP metadata collector based on ODD Specification

Language: Python - Size: 188 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 4 - Forks: 0

DataDrivenGit/Music-Streaming-App-using-AWS-ETL

Implemented a data warehouse and data lake on AWS, plus data modeling with Postgres and Apache Cassandra. Also used Apache Airflow to create a data pipeline.

Language: Jupyter Notebook - Size: 725 KB - Last synced at: 5 months ago - Pushed at: almost 5 years ago - Stars: 4 - Forks: 3

dwhitena/pach-neon

An example Pachyderm ML pipeline using Nervana Neon

Language: Python - Size: 53.7 KB - Last synced at: over 1 year ago - Pushed at: about 8 years ago - Stars: 4 - Forks: 0

abeltavares/versioned-data-lakehouse

🌊 Git-like Version Control for Data with Nessie, Iceberg, and Spark

Language: Jupyter Notebook - Size: 3.95 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 3 - Forks: 2

kestra-io/data-engineering-zoomcamp

Code for the Data Engineering Zoomcamp course

Size: 470 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 3 - Forks: 1

rcorrero/light-pipe

A high-level syntax for data pipelines, designed to make pipeline development quick and painless.

Language: Python - Size: 1.52 MB - Last synced at: 17 days ago - Pushed at: 9 months ago - Stars: 3 - Forks: 1

zkan/introduction-to-data-pipelines-and-apache-airflow

Introduction to Data Pipelines and Apache Airflow

Language: Python - Size: 134 KB - Last synced at: 19 days ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 9

AnanthaRajuC/DataPractitioner

Data Practitioner

Language: Python - Size: 1010 KB - Last synced at: 11 months ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 0

Ardemius/big-data-resources

Repo to store all my data resources: "big" ones, data pipelines, data management, all of them 😉

Size: 8.28 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 2

StrictlySkyler/harbormaster Fork of luzlab/harbormaster-apache 📦

A framework for microservices

Language: JavaScript - Size: 1.82 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 5

projectmesadata/cropyield

Creates a data pipeline from the Famine Land Data Assimilation DataSet (FLDAS) to seed model terrain and assess the potential crop yield for a variety of crops.

Language: Jupyter Notebook - Size: 134 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 3

BogdanFloris/detecting-and-addressing-change

Code for my Master Thesis: How to detect and address changes in machine learning based data pipelines

Language: Python - Size: 151 KB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 3 - Forks: 0

bjanesh/odi-tools

A suite of tools written in Pyraf, Astropy, Scipy, and Numpy to process individual QuickReduced images into single stacked images using a set of "best practices" for ODI data.

Language: Python - Size: 2.39 MB - Last synced at: about 23 hours ago - Pushed at: almost 5 years ago - Stars: 3 - Forks: 2

FedericoSerini/DEND-Project-5-Data-Pipelines

Project 5 - Data Engineering Nanodegree

Language: Python - Size: 4.88 KB - Last synced at: 4 days ago - Pushed at: almost 6 years ago - Stars: 3 - Forks: 4

todofixthis/filters

🤔 What if we took the UNIX philosophy and applied it to input validation?

Language: Python - Size: 970 KB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 2 - Forks: 3

The-Swarm-Corporation/Custom-Swarms-Spec-Template

Build your dream AI agent swarm with enterprise-grade reliability and scalability. This repository contains our official specification template for custom swarm development using the powerful Swarms Framework.

Size: 79.1 KB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

leotech-dev/leoflow

A set of plugins (mappers, sinks, etc.) for Numaflow pipelines

Language: Go - Size: 11.7 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

itsame-mcl/data-pypeline

Pure Python 3 data wrangling tools with support for pipelines

Language: Python - Size: 24.9 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 2 - Forks: 1

santiagortiiz/Snowflake-Data-Pipelines

EPAM's Snowflake hands-on lab. We built a pipeline to read and load data from S3 into Snowflake, developed an ETL workflow to clean the data and stored it in a data warehouse with the 3NF and Star schemas for data mart analysis.

Language: Jupyter Notebook - Size: 30.6 MB - Last synced at: about 2 months ago - Pushed at: about 2 years ago - Stars: 2 - Forks: 0

zpencerguy/superdoppler

Data orchestration repo with Docker deployment

Language: Python - Size: 38.1 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

allanchua101/ipynta

Rapidly build image processing pipelines

Language: Python - Size: 437 KB - Last synced at: 9 months ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 1

zkan/building-data-pipelines-with-apache-airflow

Building Data Pipelines with Apache Airflow

Language: Dockerfile - Size: 10.6 MB - Last synced at: 19 days ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 11

CogStack/annotations-ingester

Send text annotations back to ElasticSearch

Language: Python - Size: 117 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 2

ahayasic/apache-airflow-in-a-nutshell

A set of Markdown files explaining Apache Airflow, its best practices, and recipes.

Language: Python - Size: 693 KB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 2 - Forks: 0

paulokuong/airflow-run

Quick way to deploy Airflow Multi-Node Cluster (a.k.a. Airflow Celery Executor Setup)

Language: Python - Size: 147 KB - Last synced at: 22 days ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 1

The-AI-Alliance/dpk-alliance

A simplification and extension of the Data Prep Kit project

Language: Python - Size: 552 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 1 - Forks: 0

heymumford/Samstraumr

Samstraumr is a Java-based framework that implements systems theory concepts in software architecture for building resilient, adaptive systems and simulations with self-healing capabilities.

Language: Java - Size: 18 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 1 - Forks: 0

stevehoober254/dataengineer-portfolio

📊 End-to-end ETL pipelines, Airflow DAGs, notebook-driven analytics & data warehousing

Size: 6.84 KB - Last synced at: 15 days ago - Pushed at: 16 days ago - Stars: 1 - Forks: 0

DataForgeOpenAIHub/mlops-credit-card-fraud-detection-end-to-end

End-to-end machine learning MLOps project for credit card fraud detection using ensemble models, data and model versioning through DVC, GitHub Actions, and deployment

Language: Jupyter Notebook - Size: 2.53 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 1 - Forks: 1

raj-maharajwala/mlops-credit-card-fraud-detection-end-to-end Fork of DataForgeOpenAIHub/mlops-credit-card-fraud-detection-end-to-end

End-to-end machine learning MLOps project for credit card fraud detection using ensemble models, data and model versioning through DVC, GitHub Actions, and deployment

Language: Jupyter Notebook - Size: 2.36 MB - Last synced at: 30 days ago - Pushed at: 30 days ago - Stars: 1 - Forks: 0

caesarmario/weather-data-engineering-pipeline

This repository showcases a complete Python-based ETL (Extract, Transform, Load) data pipeline designed to process, validate, and analyze weather data for multiple cities. The project demonstrates a structured approach to handling weather data, focusing on data accuracy, transformation, and insights generation.

Language: Python - Size: 588 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

allamiro/Data-Pipelines

Everything about designing, installing, and implementing data pipelines, including Kafka, ZooKeeper, and Hadoop. If you enjoy my content, please consider supporting what I do. Thank you.

Language: Jinja - Size: 4.48 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

ProyectoRespira/data_retriever

Data pipeline and inference for Proyecto Respira

Language: Python - Size: 2.12 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

YSayaovong/Refonte-System-Redesigns

Collaborated on building scalable data pipelines, performing ETL processes, and optimizing database performance to support data-driven decision-making

Language: Jupyter Notebook - Size: 771 KB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

nbigot/ministream

Ministream is a small, stand-alone, real-time event messaging streaming server

Language: Go - Size: 269 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 1 - Forks: 1

rcgsheffield/airbods

AIRBODS data pipelines and storage

Language: Python - Size: 274 KB - Last synced at: 14 days ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

PinsaraPerera/Customer_satisfaction_model_deployment

Machine learning operations (MLOps) pipelines for training and monitoring a customer satisfaction model. Created using the ZenML framework.

Language: Python - Size: 14.3 MB - Last synced at: 10 days ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

cybergeekgyan/Data-Engineering-Portfolio

Data Engineering portfolio projects, resources used to study data tools...

Language: Jupyter Notebook - Size: 2.92 MB - Last synced at: 12 months ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

scottbarnesg/bowline

Bowline: Easily build performant data stream processing pipelines in Python.

Language: Python - Size: 64.5 KB - Last synced at: 11 days ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

yashrajjain726/Weather-Visibility-Prediction

This is a project that uses live weather data from an API to predict visibility.

Language: Jupyter Notebook - Size: 6.36 MB - Last synced at: 12 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

vanderschaarlab/temporai-mivdp

TemporAI-MIVDP: Adaptation of MIMIC-IV-Data-Pipeline for TemporAI

Language: Python - Size: 1.85 MB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

svidovich/python-experiments

Anything that doesn't fit anywhere else

Language: Python - Size: 941 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

Taiyo-ai/pt-mesh-pipeline

Use this template repository to write projects and tenders data ingestion pipelines

Language: Python - Size: 111 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 145

scott-diprose/dtm-lib-dotnet

Basic .NET library utilising scott-diprose/dtm-schema

Language: C# - Size: 154 KB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 1 - Forks: 0

mid-atlantic-applied-sciences/legendary-octo-journey

Enforce code review before adding code

Size: 64.5 KB - Last synced at: 1 day ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 1

jmoussa/go-sentitweet

CLI application containing a sentiment analysis data pipeline (Twitter tweets) with its own Web API to query results in the database. Written entirely in Go.

Language: Go - Size: 13.4 MB - Last synced at: about 2 months ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 1

scott-diprose/dtm-schema

Data Transfer Metadata. Object model for standardising the structure of metadata applicable to building data transfers.

Size: 51.8 KB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 0

AbdullahMu/Data-Pipelines-with-Airflow

Schedule, automate, and monitor data pipelines using Apache Airflow. Run data quality checks, track data lineage, and work with data pipelines in production.

Language: Python - Size: 52.7 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 1

NYCPlanning/ceqr-app-data-archive 📦

(DEPRECATED) Data pipelines for the CEQR app, managed by data engineering

Language: Python - Size: 210 KB - Last synced at: about 1 year ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 1

aquemy/DOLAP_2019_supplementary_material

Supplementary material for DOLAP 2019 submission

Size: 5.04 MB - Last synced at: about 1 month ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 0

GuerrillaAnalytics/proj001_lfb

Example training project for Guerrilla Analytics ways of working

Language: Jupyter Notebook - Size: 993 KB - Last synced at: 5 months ago - Pushed at: about 7 years ago - Stars: 1 - Forks: 2

ahmedd38/dataengineer-portfolio

📊 End-to-end ETL pipelines, Airflow DAGs, notebook-driven analytics & data warehousing

Size: 7.81 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

jesuserro/books-etl-pipeline

The Books ETL Pipeline is a data engineering project that extracts, transforms, and loads data from Goodreads and other sources to analyze book authors and their works. It leverages tools like Airflow for orchestration, MySQL for data storage, and Grafana for visualization.

Language: Jupyter Notebook - Size: 470 KB - Last synced at: 11 days ago - Pushed at: 12 days ago - Stars: 0 - Forks: 0

sanjay-k08/Python-for-GCP-Interact-with-Google-Cloud-Using-Python

Python For GCP is a project aimed at simplifying the interaction with Google Cloud Platform (GCP) services using Python. This repository provides code examples and scripts that help you manage and automate various GCP resources such as BigQuery, Cloud Storage, BigTable, Compute Engine, and more entirely through Python.

Language: Python - Size: 37.1 KB - Last synced at: 13 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

hlan22/2025-02-27-pipelines

Learning about Data Pipelines

Language: TeX - Size: 195 KB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 0 - Forks: 0

aust21/skill-gap-analyzer

Online skill analyzer web app

Language: Python - Size: 15.6 MB - Last synced at: 30 days ago - Pushed at: 30 days ago - Stars: 0 - Forks: 0

apelullo/yelp_health_data_curation_ops

An AWS-based data pipeline to extract, process, store, and monitor Yelp "health-related" facility data in support of ongoing health system initiatives.

Language: Jupyter Notebook - Size: 1.17 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

apelullo/twitter_covid_stream_processing_ops

An AWS-based data pipeline to collect, process, store, and monitor Twitter streaming data throughout the COVID-19 pandemic in support of local, regional, and national public health initiatives.

Language: Jupyter Notebook - Size: 117 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

sahil-172002/CSV-to-PostgreSQL-Data-Pipeline

Data pipelines are essential components of modern data engineering. Whether you're working with small datasets or handling massive data warehouses, knowing how to efficiently move data between different systems is crucial. In this guide

Language: TypeScript - Size: 58.6 KB - Last synced at: 30 days ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

Javid912/Stock-Market-Analytics-data-pipline

A production-grade ETL pipeline for processing financial market data using Apache Airflow, dbt, and PostgreSQL.

Language: Python - Size: 5.41 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

ShottzHT/Healthcare-Analysis

Analyze healthcare data to identify key trends, risk factors, and actionable insights using Tableau dashboards and Python preprocessing. Enhance healthcare decision-making with interactive visualizations and data-driven approaches.

Size: 1.95 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

lexiortiz/WiBD-DataCamp

Notes, exercises, and projects from the WiBD 2024/2025 DataCamp Scholarship.

Language: Jupyter Notebook - Size: 33.2 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

calebsuminkim/airflow

[Inflearn] Practice repository for the Airflow Master course

Language: Python - Size: 90.8 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

BauplanLabs/wap-with-bauplan-and-temporal

A reference implementation of Write-Audit-Publish over the lakehouse in pure Python

Language: Python - Size: 200 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

mavaji/free-monad

Language: Scala - Size: 275 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

armahdavi/analytics_statistics_ML_plotting_dust_extraction_hvac_filters_ph2

PhD Technical Paper 1 - Phase 2 - Mahdavi & Siegel (2020) (Aerosol Science & Technology; AS&T) - Sharing all the data pipelines, processing codes, descriptive statistics, statistical modeling, and plotting/visualizations - Project Milestone: 2017 - 2020 - Full-length article is available

Language: Jupyter Notebook - Size: 414 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

wdonne/pincette-mongo-streams

JSON Streaming With Mongo Streams

Language: Java - Size: 128 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 1

mehanix/dhrw

🎢 IaaS visual editor to create & deploy data processing pipelines - python, rmq, react, meteorjs

Language: JavaScript - Size: 1.88 MB - Last synced at: 22 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

GADES-DATAENG/webinar

Code, scripts, and resources for the Data Engineering Fundamentals Course Webinar, covering Python, data pipelines, Apache Airflow, and more.

Language: Python - Size: 26.9 MB - Last synced at: 26 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

CloudFormations/Training.DataIntegration

Training content for course delegates.

Language: TSQL - Size: 29.1 MB - Last synced at: 22 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 11

rohith42/bird-finderz

End-to-end deep learning system enabling anyone to classify bird species efficiently

Language: Python - Size: 111 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

apache/airflow-publish

Publishing PyPI packages for Apache Airflow

Language: Python - Size: 69.3 KB - Last synced at: about 11 hours ago - Pushed at: 4 months ago - Stars: 0 - Forks: 1

Blacksujit/Problems-I-have-Faced-In-My-Journey-OF-Programming

This repository contains the issues and errors I have faced in my programming, machine learning, and deep learning journey

Language: Jupyter Notebook - Size: 11 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

runodp/dagster-odp

A configuration-driven framework for building Dagster pipelines that enables teams to create and manage data workflows using YAML/JSON instead of code

Language: Python - Size: 849 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

siddharth-nandagopal/billionaires-rag-query

Billionaires RAG Query uses LLMs and a RAG framework to analyze the world's billionaires list. Extracts tabular data from PDFs, converts to multiple formats, and enables precise queries about net worth, age, and more. Integrates with Poetry and asdf for easy setup and management.

Language: Python - Size: 707 KB - Last synced at: 20 days ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

farukalamai/tomato-leaf-diseases-ditection

Tomato leaf disease detection using YOLOv8 and YOLOv5

Language: Jupyter Notebook - Size: 8.97 MB - Last synced at: 2 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

KayvanShah1/usc-dsci560-dspp-sp24

USC DSCI 560 - Data Science Professional Practicum - Spring 2024 - Prof. Young Cho

Language: Python - Size: 50.1 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 1

datlin-org/sigzag

Sigzag is an observability utility and backend service for datlin and is used to monitor, sign and log data pipeline transactions.

Language: Go - Size: 143 KB - Last synced at: 2 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

rafaelvargas/bytebridge

A data tool designed to move data seamlessly between various sources and destinations.

Language: Python - Size: 46.9 KB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 1

TheODDYSEY/Scikit-Pipeline

Bank Customer Churn Prediction Project 💰

Language: Jupyter Notebook - Size: 6.3 MB - Last synced at: about 1 month ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

PinsaraPerera/MLOps_with_mlflow

An advanced machine learning operations (MLOps) pipeline integrated with MLflow to monitor artifacts and metrics. Deployed on AWS via CI/CD with GitHub Actions.

Language: CSS - Size: 108 KB - Last synced at: 10 days ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

srenegado/paintings-data

A Python ETL pipeline with a Postgres data warehouse for modeling art inventory.

Language: Python - Size: 528 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

jbossdemocentral/edge-to-cloud-data-pipelines-demo

Solution Pattern: Edge to Core Data Pipelines for AI/ML

Language: JavaScript - Size: 34.9 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 1

thecodemancer/Apache-Beam

🔥👨‍💻 Build big data pipelines with Apache Beam in any language and run them on Spark, Flink, or GCP (Google Cloud Dataflow).

Language: Jupyter Notebook - Size: 321 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

KyleZrey/data-pipeline

Creation of data pipeline using Jupyter Notebook, PostgreSQL, and Apache Airflow.

Language: Jupyter Notebook - Size: 9.74 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

tara-nguyen/modern-data-architecture

Follow along with materials in the book "Modern Data Architectures with Python: A practical guide to building and deploying data pipelines, data warehouses and data lakes" (Lipp, 2023)

Language: Jupyter Notebook - Size: 33.2 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Sibusiso-Gumede/supermarket-scraper

A data extraction program that is a component of an ETL data pipeline. The program scrapes product promotion data from supermarket websites.

Language: Python - Size: 465 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

mariaafara/airflow-pipelines

This repository hosts a collection of exercises for building pipelines and DAGs using Apache Airflow, along with a submodule for an Airflow server that can be used to test and deploy the pipelines.

Language: Python - Size: 532 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

BinariesGoalls/IBM-Data-Engineering-Professional-Certificate

This is a repository to document the entire process and learning throughout Coursera's IBM Data Engineering Professional Certificate program.

Size: 381 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

gabrielbazan/kafka_pipeline

An example of how to build a data processing pipeline with Apache Kafka, NGINX, Python, FastAPI, Docker, and MongoDB.

Language: Python - Size: 1.09 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

vandenn/dagster-prio-dynamic-map

A reference repository for implementing Dynamic Mapping and Op Prioritization with Dagster.

Language: Python - Size: 65.4 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

splitgraph/seafowl-dagster-demo

An example project demonstrating how to submit data to Seafowl from a dagster job.

Language: Python - Size: 9.77 KB - Last synced at: 2 days ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

bytes1inger/Beatlytica

This project implements a real-time event streaming pipeline for a music streaming service, inspired by Spotify Wrapped and Billboard charts. The pipeline is powered by Apache Airflow, Apache Kafka, dbt, Docker, GCP, Spark-Streaming, and Terraform.

Language: Python - Size: 86.1 MB - Last synced at: about 2 months ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

Fozan-Talat/DTC-Data-Engineering-Zoomcamp

This repository contains homework solutions and course material for the 10-week Data Engineering Zoomcamp by DataTalksClub.

Language: Jupyter Notebook - Size: 39.1 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

ZuchniakK/CryptoDataProcessing

In the series of notebooks included in this repository, I present the process of acquiring, exploring, cleaning and normalizing data, generating additional features, and creating a dataset containing reasonable X and Y batches for ML.

Language: Jupyter Notebook - Size: 4.4 MB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

eldor-fozilov/first-dance-with-MLOps

Knowing how to deploy models into production is as important as building them!

Language: Jupyter Notebook - Size: 73.2 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 1

panastasiadis/etl-microservices-demo

This repository contains a demo application that showcases the ETL (extract, transform, load) process using Apache Kafka, MongoDB, MySQL, and Neo4j to collect, store, and analyze product transactions data.

Language: Python - Size: 6.84 KB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

mpolinowski/apache-airflow-intro

Introduction to Apache Airflow

Language: Python - Size: 9.77 KB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0