GitHub topics: data-engineering-pipeline
imverma/CineETL_Movie_Insights_Data_Pipeline
A data pipeline that conducts ETL processes to AWS Redshift, utilizing Spark and coordinated by Apache Airflow.
Language: Python - Size: 20.5 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

HelloSongi/Spotify-Data-Pipeline
Routinely collects trending songs worldwide and stores them in a storage pool. Uses various Microsoft Azure services (ADLS, ADF, Synapse Analytics, Azure Functions, Logic Apps) and the Spotify API.
Language: Jupyter Notebook - Size: 374 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 5 - Forks: 1

lgthevinh/ev-stock-etl-pipeline
An ETL pipeline that extracts, transforms, and loads data from various sources related to electric vehicle (EV) stocks.
Language: Python - Size: 14.6 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

sarutlaa/Ride-Hailing-Data-Analytics
An end-to-end data engineering project
Language: Jupyter Notebook - Size: 279 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

borbert/Data_Engineering_Nanodegree
This repository is the collection point for all of the projects completed during the Udacity Data Engineering Nanodegree program.
Language: Jupyter Notebook - Size: 44.8 MB - Last synced at: over 1 year ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 1

NatanDuarte/sega_games_pipeline
Experimenting with Data Pipelines in Python
Language: Python - Size: 6.84 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

datarootsio/notion-dbs-data-quality
Using Great Expectations and Notion's API, this repo aims to provide data quality for our databases in Notion.
Language: Python - Size: 56.3 MB - Last synced at: 3 days ago - Pushed at: over 3 years ago - Stars: 9 - Forks: 0

JiajunSong629/Quick_OCR_with_AWS_Lambda
A quick implementation of an OCR application with AWS Lambda.
Language: Python - Size: 6.84 KB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 0

Savimbi/etl-batchprocess
Data ingestion solution using Spring Batch, with PostgreSQL as the data warehouse.
Language: Java - Size: 177 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 2

zarexalvindaria/data-engineering
This repo contains the data engineering exercises I completed on DataCamp.
Language: Python - Size: 76.7 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 0

dhvani-k/NYC_311_Service_Insights
NYC-311 Service Insights: A data-driven analysis of NYC's non-emergency service requests from 2010 to 2023
Language: Python - Size: 89.8 KB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

dhvani-k/CineETL_Movie_Insights_Data_Pipeline
A data pipeline that conducts ETL processes to AWS Redshift, utilizing Spark and coordinated by Apache Airflow.
Language: Python - Size: 20.5 KB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

andersonesanto/igti-edd-m5-desafio
IGTI Data Engineer - Module 5 Final Challenge
Language: Jupyter Notebook - Size: 38.1 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

ketgo/marshmallow-pyspark
Marshmallow serializer integration with pyspark
Language: Python - Size: 63.5 KB - Last synced at: 29 days ago - Pushed at: over 1 year ago - Stars: 12 - Forks: 4

leosimoes/DataScienceAcademy-EngenhariaDeDados-Fundamentos
Activities from the DataScienceAcademy course "Fundamentos de Engenharia de Dados" (Data Engineering Fundamentals).
Language: Python - Size: 763 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 1

waqarg2001/Formula1-Insights-DE
Formula 1 race data engineering project that uses Azure services and Databricks to ingest and analyse the data.
Language: Python - Size: 2.92 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

AntonioLunardi/Kubernetes_Celery_Airflow_for_stocks_and_cryptocurrencies
Celery and Kubernetes operators are used to manage data engineering pipelines for stock and cryptocurrency prices.
Language: Python - Size: 205 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

immu0001/Udacity-Data-Engineer-nanodegree
Classwork projects and homework completed for the Udacity Data Engineering Nanodegree
Language: Jupyter Notebook - Size: 101 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 64 - Forks: 71

anna-geller/prefect-getting-started
Get started with Prefect by scheduling your Prefect flows with GitHub Actions
Language: Python - Size: 22.5 KB - Last synced at: 12 days ago - Pushed at: over 2 years ago - Stars: 7 - Forks: 1

UmairThakur/Uber-Data-Analysis-ETL-PIPELINE-DATA-ANALYSIS_PROJECT
Uber Data Analysis Project, an End-to-End Data Engineering Project from creating data pipelines to finally creating the dashboard.
Language: Jupyter Notebook - Size: 19 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

CharlieSergeant/airflow-minio-postgres-fastapi
Sample data store project to be hosted on a remote server or cluster. CI/CD using GitHub Actions for SSH deployment of Docker Compose to a remote server.
Language: Python - Size: 26.4 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

kaoutaar/velib_v1
End-to-end data engineering project
Language: Python - Size: 510 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 4 - Forks: 0

Eugeme/salesforce-azure-backup
Backup of all sObject records from Salesforce into Azure SQL database, using Python and SOQL.
Language: Python - Size: 612 KB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

koksang/social-media-analysis
Social media analysis: a scalable, flexibly deployable solution that analyses social media content
Language: Jupyter Notebook - Size: 5.21 MB - Last synced at: 11 months ago - Pushed at: almost 2 years ago - Stars: 10 - Forks: 1

dylanzenner/greenhouse_gas_emissions_de_pipeline
Data Engineering pipeline hosted entirely in the AWS ecosystem utilizing DynamoDB as the database
Language: Python - Size: 2.43 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 5 - Forks: 0

dylanzenner/business_closures_de_pipeline
Data Engineering pipeline hosted entirely in the AWS ecosystem utilizing DocumentDB as the database
Language: Python - Size: 3.95 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 13 - Forks: 6

data-engineering-team4/kpop_dashboard
K-POP popularity exploration and analysis dashboard using the Spotify API
Language: Python - Size: 169 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 5

rivas-j/Big_Data_Marketing_Analysis-AWS-Spark-SQL
Builds a data pipeline with pgAdmin, AWS Cloud and Apache Spark to analyze and determine bias in Amazon Vine reviews
Language: Jupyter Notebook - Size: 305 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 1

bramvdklinkenberg/adf-airflow-data-project
Data engineering project using Azure Data Factory and Apache Airflow
Language: Python - Size: 45.9 KB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 2 - Forks: 0

anna-geller/dataflow-ops-aws-eks
Project demonstrating how to automate Prefect 2.0 deployments to AWS EKS
Language: Python - Size: 78.1 KB - Last synced at: 3 months ago - Pushed at: almost 3 years ago - Stars: 8 - Forks: 1

jmdatasci/user-behavior-spark-pipeline
Streaming Data Pipeline ETL with PySpark, Hadoop, Docker-Compose, Kafka and Redis
Language: Python - Size: 25.4 KB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 0

VeraZab/nyc-stats
Analysis of 311 service requests for New York City (from 2010 to 2023). Tech: Prefect Cloud, dbt Core, BigQuery, Compute Engine, Cloud Run, Artifact Registry, Terraform, Docker
Language: Python - Size: 4.88 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 14 - Forks: 3

pyprogrammerblog/tiny-blocks
Tiny Blocks to build large and complex data pipelines!
Language: Python - Size: 70.8 MB - Last synced at: 9 days ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 0

siddharth271101/Covid-19-and-Aviation-Industry
The goal of this project is to analyse the impact of Covid-19 on the aviation industry through data engineering processes, using technologies such as Apache Airflow, Apache Spark, Tableau and a couple of AWS services.
Language: Python - Size: 15.4 MB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 8 - Forks: 3

tmaferreira/DataEngineeringZoomCampProject
Data Engineering ZoomCamp Course Project
Language: Python - Size: 143 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 1

vspatil/citibike-data-pipeline
Analysis of NYC's Citi Bike data. Technologies: Python, Prefect, dbt, Terraform, Looker Data Studio
Language: Python - Size: 130 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

Alero-Awani/Batch-data-engineering-project
A batch data pipeline that retrieves data from a user purchase table and a movie review table and transforms it into a user behaviour metric table.
Language: HCL - Size: 727 KB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 11 - Forks: 0

AntonioLunardi/Weather-and-diesese-data-frames-cleaning-for-public-health-analysis
Two data frames from different Kaggle datasets covering disease cases and weather in Brazil. The project aims to clean the data frames and build a new one to analyse the correlation between dengue (a serious mosquito-borne disease), rainfall and temperature.
Size: 7.24 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

amva13/monofeed
Cryptocurrency ticker data pipeline
Language: Python - Size: 688 KB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

jackmulligan-ire/ppr-pipeline
Irish Property Price Register transformed into a data warehouse via an EtLT pipeline.
Language: TypeScript - Size: 22.7 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

BetoAvila/crypto_visualizer
The Crypto Visualizer project is an end-to-end application to ingest, process and monitor a stream of crypto prices in real time.
Language: Python - Size: 1.13 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

quocdeptraibodoi19/Data-Pipeline-using-Airflow
This project creates a data pipeline, automated by Apache Airflow, using the Twitter API
Language: Python - Size: 14.6 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

sayantikabanik/presentations_conferences
Presentations/tutorials delivered by me at various conferences 👩🏽💻
Size: 10.9 MB - Last synced at: 5 months ago - Pushed at: almost 3 years ago - Stars: 1 - Forks: 0

ShihWen/tpe-mrt-traffic-etl
A data pipeline from source to data warehouse using Taipei Metro Hourly Traffic data
Language: Jupyter Notebook - Size: 17.5 MB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 3 - Forks: 0

DeleLinus/HFR-Data-Warehousing
End-to-end data engineering processes for the Nigeria Health Facility Registry (HFR). The project leverages Selenium, Pandas, PySpark, PostgreSQL and Airflow.
Language: Python - Size: 1.05 MB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 3 - Forks: 0

org-not-included/simple_analytics_pipeline
Python example using Pandas to load CSV into a local SQLite DB.
Language: Python - Size: 4.62 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

dvu4/udacity-data-engineering
Data Engineering Projects including Data Modeling, Data Warehouse, Data Lake Development
Language: Jupyter Notebook - Size: 2.09 MB - Last synced at: over 2 years ago - Pushed at: about 5 years ago - Stars: 2 - Forks: 2

sanjeevai/disaster-response-pipeline
ETL pipeline combined with supervised learning and grid search to classify text messages sent during a disaster event
Language: Python - Size: 73.9 MB - Last synced at: over 2 years ago - Pushed at: over 6 years ago - Stars: 16 - Forks: 12

desanti/airflow-examples
Airflow pipelines - example code
Language: Python - Size: 17.6 KB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

ZahidGalea/data-engineering-in-gcp-challenge Fork of walmartdigital/de-challenge
Language: Python - Size: 837 KB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

san089/data-engineer-roadmap Fork of boringPpl/data-engineer-roadmap
Learning from multiple companies in Silicon Valley: Netflix, Facebook, Google, startups
Size: 213 KB - Last synced at: over 2 years ago - Pushed at: almost 7 years ago - Stars: 12 - Forks: 7

hxwwong/DEBC1-Sprint3-NER-OSM-Airflow-Pipeline
A Named Entity Recognition (NER) and OSM data pipeline hosted locally on Apache Airflow and a Docker container. Made with Phoemela Ballaran as the final output of Sprint 3 of the Data Engineering Bootcamp.
Language: Python - Size: 29 MB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 2

Mark-McAdam/Data-Engineering-Batch
Takes product reviews and performs natural language processing to provide sentiment analysis. The new insight gets combined with matching product information in the central database to provide a clearer picture of user behavior.
Language: Python - Size: 963 KB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

Mcamin/Disaster-Response-Pipeline
ETL Pipeline / ML Pipeline of Disaster Data provided by figure8
Language: Jupyter Notebook - Size: 14.7 MB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 6

jimitmistry/Stock-Market-Prediction-with-LSTM-and-Data-Pipeline
Language: Jupyter Notebook - Size: 229 KB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 2 - Forks: 0

marianajo/beam-examples
Examples that I use to learn and show Apache Beam
Language: Python - Size: 14.6 KB - Last synced at: almost 2 years ago - Pushed at: over 6 years ago - Stars: 2 - Forks: 0

chrisammon3000/aws-permits-pipeline
ETL pipeline for construction permits data in Los Angeles built on AWS S3, Lambda and RDS PostgreSQL.
Language: Python - Size: 149 KB - Last synced at: 15 days ago - Pushed at: over 4 years ago - Stars: 4 - Forks: 0

chetnachaudhari/DockerisedKafkaToPostgresPipeline
A dockerised application to ETL data from Kafka to Postgres
Language: Python - Size: 10.7 KB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

Faisal-AlDhuwayhi/Disaster-Response-Pipeline
Building Machine Learning and ETL Pipelines to categorize emergency messages based on the needs communicated by the sender
Language: Python - Size: 4.41 MB - Last synced at: almost 2 years ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0
