GitHub topics: data-engineering-pipeline
fazeelibtesam/Scraper
Python script for web scraping
Language: Jupyter Notebook - Size: 5.86 KB - Last synced at: 41 minutes ago - Pushed at: 44 minutes ago - Stars: 0 - Forks: 0

nishthapant/airflow
This project orchestrates an end-to-end data pipeline for an e-commerce dataset using Apache Airflow (in Docker) and a separate dbt (data build tool) project. The pipeline transforms raw source data into structured, analytics-ready datasets.
Language: Python - Size: 1.13 MB - Last synced at: about 5 hours ago - Pushed at: about 6 hours ago - Stars: 0 - Forks: 0

bitoollearner/de-project-BI-Learner
This repository is dedicated to data engineering projects
Size: 12.7 KB - Last synced at: about 24 hours ago - Pushed at: 1 day ago - Stars: 1 - Forks: 0

higorcazuza81/higorcazuza81
A little about me
Size: 3.56 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

h-sutiwas/data-engineering-zoomcamp
This repository contains materials and in-class projects from all lessons in Data Engineering Zoomcamp 2025 by DataTalks.Club
Language: Jupyter Notebook - Size: 51.2 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

billy-moore-98/predictit
Batch ingestion pipeline for Predictit market data
Language: Python - Size: 138 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

carlosgdoss/Data-Professional-Survey
This Power BI dashboard analyzes survey responses from data professionals, covering key aspects such as salary distribution, job satisfaction, and preferred programming languages. The insights help understand trends in the data industry and what matters most to professionals.
Size: 303 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

Omio-saha/Spotify_Data_Pipe_Snowflake
A project built around an ETL (Extract, Transform, Load) pipeline that moves Spotify API data on AWS into a Snowflake data warehouse. It utilizes AWS services such as Lambda, S3, and CloudWatch to orchestrate the process. The transformed data is then loaded into Snowflake using Snowpipe, and finally visualized in Power BI.
Size: 1000 Bytes - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1 - Forks: 0

zaw-may/Fabric-Medallion-Architecture
Language: Jupyter Notebook - Size: 7.81 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0

abhinav-pandey29/eats-data-miner
An end-to-end data pipeline project for DoorDash expense tracking
Language: Python - Size: 198 KB - Last synced at: 4 days ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

KarthikMahalingam8881/Amazon-Fake-Review-Detection-Pipeline
Amazon Fake Review Detection Pipeline
Language: Python - Size: 55.7 KB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 0 - Forks: 0

anna-geller/prefect-deployment-patterns
Code examples showing flow deployment to various types of infrastructure
Language: Python - Size: 249 KB - Last synced at: 1 day ago - Pushed at: over 2 years ago - Stars: 107 - Forks: 10

panthers-labs-pvt-ltd/progressive.mind.framework
Progressive Mind Framework is a metadata-driven data engineering framework that simplifies and automates the design and execution of data pipelines. It helps you build scalable, governed, and observable pipelines using metadata configurations instead of hand-written transformation logic.
Language: Scala - Size: 202 MB - Last synced at: 20 days ago - Pushed at: 21 days ago - Stars: 1 - Forks: 0

DarkStarStrix/DataVolt
Reusable data engineering toolkit. My personal data infrastructure.
Language: Jupyter Notebook - Size: 13.9 MB - Last synced at: 20 days ago - Pushed at: 21 days ago - Stars: 17 - Forks: 2

vmware/versatile-data-kit
One framework to develop, deploy and operate data workflows with Python and SQL.
Language: Python - Size: 110 MB - Last synced at: about 13 hours ago - Pushed at: 5 days ago - Stars: 450 - Forks: 59

san089/Udacity-Data-Engineering-Projects
A few projects related to Data Engineering, including Data Modeling, Infrastructure setup on cloud, Data Warehousing, and Data Lake development.
Language: Python - Size: 2.03 MB - Last synced at: 25 days ago - Pushed at: almost 3 years ago - Stars: 1,620 - Forks: 527

alwi2404/ETL-Pipeline-for-Region-Segmentation-and-Product-Performance-Analysis
An ETL Project using SQL Server Integration Services (SSIS) for Region Segmentation and Sales Performance Analysis with real-world data pipelines and business insights.
Size: 24.4 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

kkrusere/NHANES-pyTOOL-API
The NHANES Data 'API' is a Python tool that simplifies access to the National Health and Nutrition Examination Survey (NHANES) dataset. This project provides an easy-to-use API to retrieve NHANES data, helping researchers, data scientists, health professionals, and other stakeholders access these valuable datasets.
Language: Python - Size: 215 KB - Last synced at: 1 day ago - Pushed at: 4 months ago - Stars: 9 - Forks: 5

Wb-az/MLib-PySpark-SoundLevel-Prediction
Creates an ML pipeline leveraging PySpark SQL and PySpark MLlib to predict sound level
Language: Jupyter Notebook - Size: 972 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

san089/goodreads_etl_pipeline
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Language: Python - Size: 1.31 MB - Last synced at: about 1 month ago - Pushed at: over 5 years ago - Stars: 1,378 - Forks: 227

husskhosravi/cricket-analytics-snowflake-pipeline
End-to-end Snowflake data pipeline for cricket analytics using JSON data from AWS S3, automated ingestion, transformation, and modelling into a scalable star schema
Size: 122 KB - Last synced at: 3 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

JBris/dagster-dbt-openmetadata-docker
Docker deployment of Dagster, DBT, and OpenMetadata
Language: Python - Size: 2.03 MB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 2 - Forks: 0

BayoAdejare/lightning-containers
Docker powered starter for geospatial analysis of lightning atmospheric data.
Language: Python - Size: 159 MB - Last synced at: 4 days ago - Pushed at: about 1 month ago - Stars: 6 - Forks: 2

Data-Projects-AGN/Weather-Data-ETL-using-Kafka
Simulated real-time weather data pipeline using Python, Apache Kafka (multi-node), and PostgreSQL. Weather metrics are published to country-wise Kafka topics and stored in a data warehouse for downstream ETL and analytics.
Language: Python - Size: 33.2 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

indranilekkala/movie_data_pipeline
An end-to-end data pipeline that processes movie data from TMDb API, stores it in PostgreSQL, and visualizes trends using Metabase.
Language: Python - Size: 17.6 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

KhotChaitanya/Customer_Segmentation_ETL_SSIS
An ETL Project using SQL Server Integration Services (SSIS) for Customer Segmentation and Sales Performance Analysis with real-world data pipelines and business insights.
Size: 29.3 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

adark-d/smart-rental-pricing
Repository for the smart rental and recommendation system for listings in Ghana project
Language: HTML - Size: 2.44 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

zaw-may/Fabric-UserDataFunctions-ETL
Using Fabric User Data Functions Within A Data Pipeline
Language: Python - Size: 12.7 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

zaw-may/Fabric-OnPremises-ETL
Moving On-premises Data into Microsoft Fabric Data Stores
Language: TSQL - Size: 24.4 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

Danitilahun/AWS-Data-Engineering-project
In this AWS Data Engineering project, we delve into the intricacies of building a robust real-time data pipeline using DynamoDB, Snowflake, and AWS Lambda.
Language: Python - Size: 3.91 KB - Last synced at: about 1 month ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

zaw-may/Fabric-OneLake-ETL
Crafting data solution with Fabric Analytics Pipeline
Language: TSQL - Size: 339 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

AhmetFurkanDEMIR/Data-Engineering-Project-with-HDFS-and-Kafka
Data Engineering Project with Hadoop HDFS and Kafka
Language: Python - Size: 3.46 MB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 102 - Forks: 25

longNguyen010203/Youtube-Recommend-Master-ETL-Pipeline
A Data Engineering Project that implements an ETL data pipeline using Dagster, Apache Spark, Streamlit, MinIO, Metabase, dbt, Polars, and Docker. Data comes from Kaggle and the YouTube API.
Language: Jupyter Notebook - Size: 701 KB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 21 - Forks: 2

protonic/protonic.github.io
This is going to be my homepage and my profile page
Language: HTML - Size: 9.77 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

eveliinahampus/openweather-datapipe
Data Engineering Project: ETL pipeline that fetches data from the OpenWeather API with batch processing, tidies and transforms it, and loads it into a PostgreSQL database, scheduled with Airflow.
Language: Jupyter Notebook - Size: 1.95 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

mananb77/data101-postgres-spark
Compare the efficiencies of Postgres and Apache Spark.
Language: Jupyter Notebook - Size: 0 Bytes - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

kishlayjeet/Stock-Market-Real-Time-Data-Pipeline-with-Apache-Kafka-and-Cassandra
An end-to-end real-time stock market data pipeline with Python, AWS EC2, Apache Kafka, and Cassandra. Data is processed on AWS EC2 with Apache Kafka and stored in a local Cassandra database.
Language: Python - Size: 2.3 MB - Last synced at: 3 months ago - Pushed at: about 2 years ago - Stars: 25 - Forks: 7

aiden-liu/aiden-liu.github.io
Blog space on data engineering, machine learning, platform engineering.
Size: 90.8 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

tahir007malik/adventureWorksDataAnalytics
This repository showcases an end-to-end ETL pipeline leveraging Azure services, including ADF, ADLS Gen2, Databricks, and Synapse Analytics, to enhance data processing efficiency.
Language: Jupyter Notebook - Size: 3.5 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 1

fonsecagabriella/data_engineering
Building end-to-end data pipelines
Language: Jupyter Notebook - Size: 331 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 2 - Forks: 2

fonsecagabriella/carbonlens
A Climate and Social Indicators Data Pipeline 🌎 🌱📊 | data engineering
Language: Python - Size: 35.2 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

hq969/Youtube-Data-Pipeline-AWS
Leveraging AWS Cloud Services, an ETL pipeline transforms YouTube video statistics data. Data is downloaded from Kaggle, uploaded to an S3 bucket, and cataloged using AWS Glue for querying with Athena. AWS Lambda and Glue convert the data to Parquet format and store it in a cleansed S3 bucket. AWS QuickSight then visualizes the materialised data.
Language: Python - Size: 1.69 MB - Last synced at: 6 days ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

zack0061/End-to-End-Data-Pipeline
📈 A scalable, production-ready data pipeline for real-time streaming & batch processing, integrating Kafka, Spark, Airflow, AWS, Kubernetes, and MLflow. Supports end-to-end data ingestion, transformation, storage, monitoring, and AI/ML serving with CI/CD automation using Terraform & GitHub Actions.
Language: Python - Size: 2.6 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

InosRahul/f1-data-pipeline
F1 Data Pipeline
Language: Python - Size: 401 KB - Last synced at: about 2 months ago - Pushed at: almost 2 years ago - Stars: 23 - Forks: 4

mikeroyal/Apache-Spark-Guide
Apache Spark Guide
Language: Python - Size: 237 KB - Last synced at: 4 days ago - Pushed at: over 3 years ago - Stars: 31 - Forks: 11

yash-chauhan-dev/SPARK_CLUSTER_DOCKER
Set up a local Spark cluster, Hadoop (HDFS), Airflow, and PostgreSQL on Docker with ease, without any local installations
Language: Dockerfile - Size: 1.3 MB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

ShubhamMohanty680/Spotify_Snowflake
A project built around an ETL (Extract, Transform, Load) pipeline that moves Spotify API data on AWS into a Snowflake data warehouse. It utilizes AWS services such as Lambda, S3, and CloudWatch to orchestrate the process. The transformed data is then loaded into Snowflake using Snowpipe, and finally visualized in Power BI.
Language: Python - Size: 1.79 MB - Last synced at: 2 months ago - Pushed at: 5 months ago - Stars: 2 - Forks: 1

tahir007malik/fintechDataMigration
This repository showcases a scalable data pipeline designed for migrating and transforming a fintech company’s data from traditional SQL databases to Azure Data Lake.
Language: Jupyter Notebook - Size: 271 KB - Last synced at: 2 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

Hamzah023/LabattDataPipeline
A data pipeline that models alcohol consumption per country, built to analyze sales predictions for Labatt
Language: Python - Size: 33.7 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 4 - Forks: 0

ShubhamMohanty680/Spotify_end_to_end_data_engineering
A project built around an ETL (Extract, Transform, Load) pipeline using the Spotify API on AWS.
Language: Jupyter Notebook - Size: 1.44 MB - Last synced at: 2 months ago - Pushed at: 5 months ago - Stars: 2 - Forks: 0

anki-code/xontrib-pipeliner
Let your pipe lines flow thru the Python code in xonsh.
Language: Python - Size: 149 KB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 59 - Forks: 4

sanketrs/implementation-of-modern-data-engineering-architecture-with-fabric_analytics
Building a next-generation hybrid data pipeline architecture that combines the power of Microsoft Fabric, Azure Cloud, and Power BI. This pipeline is engineered to tackle the challenges of real-time data ingestion, multi-layered processing, and analytics, delivering business-critical insights.
Language: Python - Size: 32.2 KB - Last synced at: 3 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

elmezianech/AutoInventory
This project is an end-to-end, fully automated warehouse management solution designed to tackle real-world inventory challenges in the FMCG sector. From real-time data ingestion and predictive analytics to interactive dashboards, this project combines cutting-edge technologies and an event-driven architecture to simulate a business-ready system.
Language: Python - Size: 61.5 KB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

benedekrozemberczki/AV_Ultimate_Student_Hunt
Solution for the Ultimate Student Hunt Challenge (1st place).
Language: R - Size: 43 KB - Last synced at: 3 days ago - Pushed at: over 3 years ago - Stars: 9 - Forks: 9

kishlayjeet/Twitter-Data-Pipeline-using-Airflow-and-AWS-S3
An end-to-end Twitter Data Pipeline that extracts data from Twitter and loads it into AWS S3.
Language: Python - Size: 20.5 KB - Last synced at: 3 months ago - Pushed at: almost 2 years ago - Stars: 13 - Forks: 6

iTrauco/pybro
yo, it's ya boy, pybro! 😎 | a personal collection of python hacks for 24.04 debian
Language: Python - Size: 35.2 KB - Last synced at: 4 days ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

Semiu/data-engineering
Introduction to Data Engineering
Language: Jupyter Notebook - Size: 109 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

alanchn31/Movalytics-Data-Warehouse
Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow
Language: Python - Size: 717 KB - Last synced at: 7 months ago - Pushed at: about 5 years ago - Stars: 133 - Forks: 31

anna-geller/prefect-aws-lambda
Deploy a Prefect flow to a serverless AWS Lambda function
Language: Python - Size: 19.5 KB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 36 - Forks: 6

anna-geller/dataflow-ops
Project demonstrating how to automate Prefect 2.0 deployments to AWS ECS Fargate
Language: Python - Size: 1.32 MB - Last synced at: 3 months ago - Pushed at: almost 2 years ago - Stars: 113 - Forks: 24

BayoAdejare/pipeline-ecommerce
E-commerce Data Pipeline
Language: Python - Size: 22.5 MB - Last synced at: 4 days ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

BayoAdejare/pipeline-edtech
Edtech ADF Pipeline Project
Size: 12.7 KB - Last synced at: 4 days ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

BayoAdejare/pipeline-sleep
Sleep Data Pipeline with Azure Data Factory
Size: 21.5 KB - Last synced at: 4 days ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

JessicaHora/JessicaHora
Size: 9.17 MB - Last synced at: 12 days ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0

anastasiamkh/aws-dataflow-simulator
Python package that simplifies the creation of AWS infrastructure for simulating real-time data streaming and batch processing, ideal for integrating into machine learning projects.
Language: Python - Size: 2.68 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

MeganAlbee/TheWedge
This repository will showcase extracting, cleaning, and uploading files to Google BigQuery, based on a project for the MSBA program at the University of Montana.
Language: Jupyter Notebook - Size: 543 KB - Last synced at: 11 months ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 1

kplofts/datasolve_dw_sql
Create a data warehouse in SQL without external tools (conceptual)
Language: Python - Size: 140 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

jacquesbilombe/Payment-Patterns-Brazil
"Payment Patterns Brazil" is a data pipeline project leveraging cloud technologies to analyze consumer trends in Brazilian payment methods. This project involves data collection, modeling, loading, and analysis using Google Console and other cloud platforms. Explore insights on payment behaviors and trends across Brazil.
Language: Jupyter Notebook - Size: 2.9 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

gear5sh/Gear5
A high-performance, better alternative to Airbyte, Singer, and Meltano
Language: Go - Size: 30.6 MB - Last synced at: 11 months ago - Pushed at: about 1 year ago - Stars: 16 - Forks: 4

povoaaires/data_project_model
Data pipeline model repository
Language: Python - Size: 1.95 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

DivineSamOfficial/Banking-Data-Warehouse-Pipeline
Banking Data Warehouse Pipeline
Language: Python - Size: 52.1 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

PATRICIAJUNQUEIRA/Airflow_Pipeline_Gera_Pasta
Automated data pipeline for extracting and storing weather forecasts for the tourism sector.
Language: Python - Size: 104 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

umairkarel/Amazon-Sales-Data-Engineering
Data Engineering Pipeline practice with Amazon Sales Data
Language: Python - Size: 6.35 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

waqarg2001/Youtube-Data-Pipeline-AWS
Leveraging AWS Cloud Services, an ETL pipeline transforms YouTube video statistics data. Data is downloaded from Kaggle, uploaded to an S3 bucket, and cataloged using AWS Glue for querying with Athena. AWS Lambda and Glue convert the data to Parquet format and store it in a cleansed S3 bucket. AWS QuickSight then visualizes the materialised data.
Language: Python - Size: 2.89 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

julian506/openweathermap-etl
A simple ETL for temperature data from the OpenWeatherMap API, storing it in an Azure SQL Database
Language: Python - Size: 18.6 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 0

AyushRaiKhare/Ayush_Khare_Data_Engineering_Portfolio
Ayush @ Data Engineering Portfolio
Size: 18.6 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

Cognizant-Technology-Innovation/lakehouseops-sra-for-databricks (fork of databricks/terraform-databricks-sra)
The Security Reference Architecture (SRA) implements typical security features as Terraform Templates that are deployed by most high-security organizations, and enforces controls for the largest risks that customers ask about most often.
Language: HCL - Size: 1.58 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

brunocampos01/predicting-retail-churn-with-azure-ml-studio
Job application challenge: Data Scientist
Language: Python - Size: 25.4 MB - Last synced at: 4 months ago - Pushed at: over 3 years ago - Stars: 14 - Forks: 1

alfredzou/BoardGameGeek_Pipeline
Pipeline to automate the collection of board game and expansion data from BoardGameGeek's XML API2. Data is stored in Google Cloud Storage and BigQuery. Data is modelled using DBT in a star schema.
Language: Python - Size: 1.16 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

enochiankim/NYC-uber-data-engineering-project-etl-pipeline
The NYC Uber Data Engineering ETL Project is a comprehensive data engineering effort, covering the development of data pipelines through to the creation of a dashboard.
Language: Jupyter Notebook - Size: 5.07 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

jolly-io/Data_Engineering_Notes
Size: 103 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

salimt/Spotify-API-Pipeline
Spotify API, Airflow, Docker, AWS S3, Snowflake, dbt, localstack, Looker Studio
Language: Python - Size: 181 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 5 - Forks: 0

snandasena/disaster-response-pipeline
Disaster Response Pipeline | Data Engineering
Language: Jupyter Notebook - Size: 5.1 MB - Last synced at: 18 days ago - Pushed at: over 5 years ago - Stars: 2 - Forks: 0

dogucanelci/Azure_e2e_data_engineering_project_1
Language: Jupyter Notebook - Size: 11.6 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

dogucanelci/dogucanelci-GCP_Retail_Airflow_Data_Engineering_Project
Language: Python - Size: 8.35 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

dogucanelci/GCP_Uber_Data_Engineering_Project
Language: Python - Size: 4.12 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

prayagnshah/End-to-End-Pipeline
Zillow Data Pipeline: Extracts data from Zillow, transfers it through AWS services, and performs analytics. Utilizes Python scripts, AWS Lambda, S3, Amazon Redshift, and QuickSight. Explore docs/images for architecture visuals.
Language: Python - Size: 727 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

shiv-rna/Youtube-Data-Engineering-Pipeline
This project repo 📺 offers a robust solution meticulously crafted to efficiently manage, process, and analyze YouTube video data leveraging the power of AWS services. Whether you're diving into structured statistics or exploring the nuances of trending key metrics, this pipeline is engineered to handle it all with finesse.
Language: Python - Size: 179 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

markditsworth/TweetAnalyzer
An environment for analyzing Twitter
Language: Python - Size: 1.88 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 6 - Forks: 2

sumitdeole/Data_engineering_project
This project demonstrates a local and cloud execution of automated data collection and cleaning pipelines.
Language: Jupyter Notebook - Size: 335 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

data2al/dbt-tutorial-course (fork of jack-cook-repo/dbt-tutorial-course)
Size: 39.1 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

AlphanAksoyoglu/tweeter-etl-pipeline
A streaming ETL pipeline for Realtime Tweet Collection, Analysis and Reporting
Language: Python - Size: 278 KB - Last synced at: about 1 year ago - Pushed at: almost 4 years ago - Stars: 7 - Forks: 3

dmdequin/de_zoomcamp
Data Engineering Zoomcamp course assignments and notes.
Language: HCL - Size: 33.7 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

igorlangoni/final_project_data_eng_makers (fork of jdench1989/data_eng_final_project)
Final project for the Makers Academy Data Engineering Bootcamp! In this amazing, complex group project we had to analyse a massive dataset and extract insightful data that could be used to improve education world-wide!
Language: Python - Size: 125 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 1

NitinDatta8/realtime-data-streaming
End-to-end data engineering pipeline with various technologies to ingest real time data.
Language: Python - Size: 284 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 1

Susanhuynh/aws_etl_from_s3_to_redshift
Building an ETL pipeline for a database hosted on Redshift. Extracting data from S3 to staging tables on Redshift. Transforming data by executing SQL statements that create the analytics tables from these staging tables in a star schema. Loading the star schema tables to Redshift.
Language: Jupyter Notebook - Size: 472 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

ellwag/CustomerDataETLPipeline
ETL Pipeline for Shopping Data
Language: Python - Size: 47.7 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

AyersAuthentic/Data_Engineering
Projects and Exercises for the Udacity Data Engineering Nanodegree
Language: HTML - Size: 1.63 MB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

Ruth-Mwangi/youtube-data-etl
The purpose of the project is to efficiently collect, process, and store Twitter data using a combination of Apache Airflow, Apache Spark, and Amazon S3.
Language: Python - Size: 15.6 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

prabhath-r/Forecasting-household-energy-consumption
Application to forecast electricity consumption based on 3 years of previous data
Language: Python - Size: 116 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0
