An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: data-engineering-pipeline

imverma/CineETL_Movie_Insights_Data_Pipeline

A data pipeline that conducts ETL processes to AWS Redshift, utilizing Spark and coordinated by Apache Airflow.

Language: Python - Size: 20.5 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

HelloSongi/Spotify-Data-Pipeline

Routinely collects trending songs worldwide and stores them in a storage pool. Uses various Microsoft Azure services (ADLS, ADF, Synapse Analytics, Azure Functions, Logic Apps) and the Spotify API.

Language: Jupyter Notebook - Size: 374 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 5 - Forks: 1

lgthevinh/ev-stock-etl-pipeline

An ETL pipeline that extracts, transforms, and loads data from various sources related to electric vehicle (EV) stocks.

Language: Python - Size: 14.6 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

sarutlaa/Ride-Hailing-Data-Analytics

An end-to-end data engineering project

Language: Jupyter Notebook - Size: 279 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

borbert/Data_Engineering_Nanodegree

This repository is the collection point for all of the projects completed during the Udacity Data Engineering Nanodegree program.

Language: Jupyter Notebook - Size: 44.8 MB - Last synced at: over 1 year ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 1

NatanDuarte/sega_games_pipeline

Experimenting with Data Pipelines in Python

Language: Python - Size: 6.84 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

datarootsio/notion-dbs-data-quality

Using Great Expectations and Notion's API, this repo aims to provide data quality for our databases in Notion.

Language: Python - Size: 56.3 MB - Last synced at: 3 days ago - Pushed at: over 3 years ago - Stars: 9 - Forks: 0

JiajunSong629/Quick_OCR_with_AWS_Lambda

A quick implementation of an OCR application with AWS Lambda.

Language: Python - Size: 6.84 KB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 0

Savimbi/etl-batchprocess

Data ingestion solution using Spring Batch, with PostgreSQL as the data warehouse.

Language: Java - Size: 177 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 2

zarexalvindaria/data-engineering

This repo contains the Data Engineering exercises I completed on DataCamp.

Language: Python - Size: 76.7 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 0

dhvani-k/NYC_311_Service_Insights

NYC-311 Service Insights: A data-driven analysis of NYC's non-emergency service requests from 2010 to 2023

Language: Python - Size: 89.8 KB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

dhvani-k/CineETL_Movie_Insights_Data_Pipeline

A data pipeline that conducts ETL processes to AWS Redshift, utilizing Spark and coordinated by Apache Airflow.

Language: Python - Size: 20.5 KB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

andersonesanto/igti-edd-m5-desafio

IGTI Data Engineer - Module 5 Final Challenge

Language: Jupyter Notebook - Size: 38.1 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

ketgo/marshmallow-pyspark

Marshmallow serializer integration with PySpark

Language: Python - Size: 63.5 KB - Last synced at: 29 days ago - Pushed at: over 1 year ago - Stars: 12 - Forks: 4

leosimoes/DataScienceAcademy-EngenhariaDeDados-Fundamentos

Activities from the DataScienceAcademy course "Fundamentos de Engenharia de Dados" (Data Engineering Fundamentals).

Language: Python - Size: 763 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 1

waqarg2001/Formula1-Insights-DE

Formula 1 race data engineering project that uses Azure services and Databricks to ingest and analyse the data.

Language: Python - Size: 2.92 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

AntonioLunardi/Kubernetes_Celery_Airflow_for_stocks_and_cryptocurrencies

Celery and Kubernetes operators are used to manage data engineering pipelines for stock and cryptocurrency prices

Language: Python - Size: 205 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

immu0001/Udacity-Data-Engineer-nanodegree

Classwork projects and homework completed for the Udacity Data Engineering Nanodegree

Language: Jupyter Notebook - Size: 101 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 64 - Forks: 71

anna-geller/prefect-getting-started

Get started with Prefect by scheduling your Prefect flows with GitHub Actions

Language: Python - Size: 22.5 KB - Last synced at: 12 days ago - Pushed at: over 2 years ago - Stars: 7 - Forks: 1

UmairThakur/Uber-Data-Analysis-ETL-PIPELINE-DATA-ANALYSIS_PROJECT

Uber Data Analysis Project, an End-to-End Data Engineering Project from creating data pipelines to finally creating the dashboard.

Language: Jupyter Notebook - Size: 19 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

CharlieSergeant/airflow-minio-postgres-fastapi

Sample data store project to be hosted on a remote server or cluster. CI/CD uses GitHub Actions to deploy over SSH to a remote server running Docker Compose.

Language: Python - Size: 26.4 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

kaoutaar/velib_v1

End-to-end data engineering project

Language: Python - Size: 510 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 4 - Forks: 0

Eugeme/salesforce-azure-backup

Backup of all sObject records from Salesforce into Azure SQL database, using Python and SOQL.

Language: Python - Size: 612 KB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

koksang/social-media-analysis

Social media analysis: a scalable, flexibly deployable solution that analyses social media content

Language: Jupyter Notebook - Size: 5.21 MB - Last synced at: 11 months ago - Pushed at: almost 2 years ago - Stars: 10 - Forks: 1

dylanzenner/greenhouse_gas_emissions_de_pipeline

Data Engineering pipeline hosted entirely in the AWS ecosystem utilizing DynamoDB as the database

Language: Python - Size: 2.43 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 5 - Forks: 0

dylanzenner/business_closures_de_pipeline

Data Engineering pipeline hosted entirely in the AWS ecosystem utilizing DocumentDB as the database

Language: Python - Size: 3.95 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 13 - Forks: 6

data-engineering-team4/kpop_dashboard

K-POP popularity exploration and analysis dashboard using the Spotify API

Language: Python - Size: 169 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 5

rivas-j/Big_Data_Marketing_Analysis-AWS-Spark-SQL

Builds a data pipeline with pgAdmin, AWS Cloud and Apache Spark to analyze and determine bias in Amazon Vine reviews

Language: Jupyter Notebook - Size: 305 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 1

bramvdklinkenberg/adf-airflow-data-project

Data engineering project using Azure Data Factory and Apache Airflow

Language: Python - Size: 45.9 KB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 2 - Forks: 0

anna-geller/dataflow-ops-aws-eks

Project demonstrating how to automate Prefect 2.0 deployments to AWS EKS

Language: Python - Size: 78.1 KB - Last synced at: 3 months ago - Pushed at: almost 3 years ago - Stars: 8 - Forks: 1

jmdatasci/user-behavior-spark-pipeline

Streaming Data Pipeline ETL with PySpark, Hadoop, Docker-Compose, Kafka and Redis

Language: Python - Size: 25.4 KB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 0

VeraZab/nyc-stats

Analysis of 311 Service Requests for the City of NYC (from 2010 to 2023). Tech: Prefect Cloud, dbt Core, BigQuery, Compute Engine, Cloud Run, Artifact Registry, Terraform, Docker

Language: Python - Size: 4.88 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 14 - Forks: 3

pyprogrammerblog/tiny-blocks

Tiny Blocks to build large and complex data pipelines!

Language: Python - Size: 70.8 MB - Last synced at: 9 days ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 0

siddharth271101/Covid-19-and-Aviation-Industry

The goal of this project is to analyse the impact of Covid-19 on the aviation industry through data engineering processes using technologies such as Apache Airflow, Apache Spark, Tableau and a couple of AWS services

Language: Python - Size: 15.4 MB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 8 - Forks: 3

tmaferreira/DataEngineeringZoomCampProject

Data Engineering ZoomCamp Course Project

Language: Python - Size: 143 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 1

vspatil/citibike-data-pipeline

Analysis of NYC's Citi Bike data. Technologies: Python, Prefect, dbt, Terraform, Looker Data Studio

Language: Python - Size: 130 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

Alero-Awani/Batch-data-engineering-project

A batch data pipeline that retrieves data from a user purchase table and a movie review table and transforms it into a user behaviour metric table.

Language: HCL - Size: 727 KB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 11 - Forks: 0

AntonioLunardi/Weather-and-diesese-data-frames-cleaning-for-public-health-analysis

Two data frames from different Kaggle datasets covering disease cases and weather in Brazil. The project aims to clean the data frames and build a new one to analyse the correlation between dengue (a serious disease transmitted by mosquitoes), rainfall and temperature.

Size: 7.24 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

amva13/monofeed

cryptocurrency ticker data pipeline

Language: Python - Size: 688 KB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

jackmulligan-ire/ppr-pipeline

Irish Property Price Register transformed into a data warehouse via an EtLT pipeline.

Language: TypeScript - Size: 22.7 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

BetoAvila/crypto_visualizer

Crypto Visualizer is an end-to-end application to ingest, process and monitor a stream of crypto prices in real time.

Language: Python - Size: 1.13 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

quocdeptraibodoi19/Data-Pipeline-using-Airflow

This project creates a data pipeline, automated by Apache Airflow, that uses the Twitter API

Language: Python - Size: 14.6 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

sayantikabanik/presentations_conferences

Presentations/tutorials delivered by me at various conferences 👩🏽‍💻

Size: 10.9 MB - Last synced at: 5 months ago - Pushed at: almost 3 years ago - Stars: 1 - Forks: 0

ShihWen/tpe-mrt-traffic-etl

A data pipeline from source to data warehouse using Taipei Metro Hourly Traffic data

Language: Jupyter Notebook - Size: 17.5 MB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 3 - Forks: 0

DeleLinus/HFR-Data-Warehousing

End-to-end data engineering processes for the NIGERIA Health Facility Registry (HFR). The project leveraged Selenium, Pandas, PySpark, PostgreSQL and Airflow

Language: Python - Size: 1.05 MB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 3 - Forks: 0

org-not-included/simple_analytics_pipeline

Python example using Pandas to load CSV into a local SQLite DB.

Language: Python - Size: 4.62 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

dvu4/udacity-data-engineering

Data Engineering Projects including Data Modeling, Data Warehouse, Data Lake Development

Language: Jupyter Notebook - Size: 2.09 MB - Last synced at: over 2 years ago - Pushed at: about 5 years ago - Stars: 2 - Forks: 2

sanjeevai/disaster-response-pipeline

ETL pipeline combined with supervised learning and grid search to classify text messages sent during a disaster event

Language: Python - Size: 73.9 MB - Last synced at: over 2 years ago - Pushed at: over 6 years ago - Stars: 16 - Forks: 12

desanti/airflow-examples

Airflow pipelines - example code

Language: Python - Size: 17.6 KB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

ZahidGalea/data-engineering-in-gcp-challenge Fork of walmartdigital/de-challenge

Language: Python - Size: 837 KB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

san089/data-engineer-roadmap Fork of boringPpl/data-engineer-roadmap

Learning from multiple companies in Silicon Valley. Netflix, Facebook, Google, Startups

Size: 213 KB - Last synced at: over 2 years ago - Pushed at: almost 7 years ago - Stars: 12 - Forks: 7

hxwwong/DEBC1-Sprint3-NER-OSM-Airflow-Pipeline

A Named Entity Recognition and OSM data pipeline hosted locally on Apache Airflow in a Docker container. Made with Phoemela Ballaran as the final output of Sprint 3 of the Data Engineering Bootcamp

Language: Python - Size: 29 MB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 2

Mark-McAdam/Data-Engineering-Batch

Takes product reviews and performs natural language processing to provide sentiment analysis. The new insight gets combined with matching product information in the central database to provide a clearer picture of user behavior.

Language: Python - Size: 963 KB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

Mcamin/Disaster-Response-Pipeline

ETL Pipeline / ML Pipeline of Disaster Data provided by figure8

Language: Jupyter Notebook - Size: 14.7 MB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 6

jimitmistry/Stock-Market-Prediction-with-LSTM-and-Data-Pipeline

Language: Jupyter Notebook - Size: 229 KB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 2 - Forks: 0

marianajo/beam-examples

Examples that I use to learn and show Apache Beam

Language: Python - Size: 14.6 KB - Last synced at: almost 2 years ago - Pushed at: over 6 years ago - Stars: 2 - Forks: 0

chrisammon3000/aws-permits-pipeline

ETL pipeline for construction permits data in Los Angeles built on AWS S3, Lambda and RDS PostgreSQL.

Language: Python - Size: 149 KB - Last synced at: 15 days ago - Pushed at: over 4 years ago - Stars: 4 - Forks: 0

chetnachaudhari/DockerisedKafkaToPostgresPipeline

A dockerised application to ETL data from Kafka to Postgres

Language: Python - Size: 10.7 KB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

Faisal-AlDhuwayhi/Disaster-Response-Pipeline

Building Machine Learning and ETL Pipelines to categorize emergency messages based on the needs communicated by the sender

Language: Python - Size: 4.41 MB - Last synced at: almost 2 years ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0