An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: data-engineering-pipeline

fazeelibtesam/Scraper

Python script for web scraping

Language: Jupyter Notebook - Size: 5.86 KB - Last synced at: 41 minutes ago - Pushed at: 44 minutes ago - Stars: 0 - Forks: 0

nishthapant/airflow

This project orchestrates an end-to-end data pipeline for an e-commerce dataset using Apache Airflow (in Docker) and a separate dbt (data build tool) project. The pipeline transforms raw source data into structured, analytics-ready datasets.

Language: Python - Size: 1.13 MB - Last synced at: about 5 hours ago - Pushed at: about 6 hours ago - Stars: 0 - Forks: 0

bitoollearner/de-project-BI-Learner

This repository is dedicated to data engineering projects

Size: 12.7 KB - Last synced at: about 24 hours ago - Pushed at: 1 day ago - Stars: 1 - Forks: 0

higorcazuza81/higorcazuza81

A little about me

Size: 3.56 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

h-sutiwas/data-engineering-zoomcamp

This repository contains materials and in-class projects from all lessons in the Data Engineering Zoomcamp 2025 by DataTalks.Club

Language: Jupyter Notebook - Size: 51.2 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

billy-moore-98/predictit

Batch ingestion pipeline for Predictit market data

Language: Python - Size: 138 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

carlosgdoss/Data-Professional-Survey

This Power BI dashboard analyzes survey responses from data professionals, covering key aspects such as salary distribution, job satisfaction, and preferred programming languages. The insights help understand trends in the data industry and what matters most to professionals.

Size: 303 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

Omio-saha/Spotify_Data_Pipe_Snowflake

A project built around an ETL (Extract, Transform, Load) pipeline that moves Spotify API data on AWS into a Snowflake data warehouse. It uses AWS services such as Lambda, S3, and CloudWatch to orchestrate the process. The transformed data is then loaded into Snowflake using Snowpipe and finally visualized in Power BI.

Size: 1000 Bytes - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1 - Forks: 0

zaw-may/Fabric-Medallion-Architecture

Language: Jupyter Notebook - Size: 7.81 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0

abhinav-pandey29/eats-data-miner

An end-to-end data pipeline project for DoorDash expense tracking

Language: Python - Size: 198 KB - Last synced at: 4 days ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

KarthikMahalingam8881/Amazon-Fake-Review-Detection-Pipeline

Amazon Fake Review Detection Pipeline

Language: Python - Size: 55.7 KB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 0 - Forks: 0

anna-geller/prefect-deployment-patterns

Code examples showing flow deployment to various types of infrastructure

Language: Python - Size: 249 KB - Last synced at: 1 day ago - Pushed at: over 2 years ago - Stars: 107 - Forks: 10

panthers-labs-pvt-ltd/progressive.mind.framework

Progressive Mind Framework is a metadata-driven data engineering framework that simplifies and automates the design and execution of data pipelines. It helps you build scalable, governed, and observable pipelines using metadata configurations instead of hand-written transformation logic.

Language: Scala - Size: 202 MB - Last synced at: 20 days ago - Pushed at: 21 days ago - Stars: 1 - Forks: 0

DarkStarStrix/DataVolt

Reusable data engineering toolkit. My personal data infrastructure.

Language: Jupyter Notebook - Size: 13.9 MB - Last synced at: 20 days ago - Pushed at: 21 days ago - Stars: 17 - Forks: 2

vmware/versatile-data-kit

One framework to develop, deploy and operate data workflows with Python and SQL.

Language: Python - Size: 110 MB - Last synced at: about 13 hours ago - Pushed at: 5 days ago - Stars: 450 - Forks: 59

san089/Udacity-Data-Engineering-Projects

A few projects related to Data Engineering, including Data Modeling, Infrastructure setup on cloud, Data Warehousing and Data Lake development.

Language: Python - Size: 2.03 MB - Last synced at: 25 days ago - Pushed at: almost 3 years ago - Stars: 1,620 - Forks: 527

alwi2404/ETL-Pipeline-for-Region-Segmentation-and-Product-Performance-Analysis

An ETL Project using SQL Server Integration Services (SSIS) for Region Segmentation and Sales Performance Analysis with real-world data pipelines and business insights.

Size: 24.4 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

kkrusere/NHANES-pyTOOL-API

The NHANES Data 'API' is a Python tool that simplifies access to the National Health and Nutrition Examination Survey (NHANES) dataset. This project provides an easy-to-use API to retrieve NHANES data, helping researchers, data scientists, health professionals, and other stakeholders access these valuable datasets.

Language: Python - Size: 215 KB - Last synced at: 1 day ago - Pushed at: 4 months ago - Stars: 9 - Forks: 5

Wb-az/MLib-PySpark-SoundLevel-Prediction

Creates an ML pipeline leveraging PySpark SQL and PySpark MLlib to predict sound level

Language: Jupyter Notebook - Size: 972 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

san089/goodreads_etl_pipeline

An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.

Language: Python - Size: 1.31 MB - Last synced at: about 1 month ago - Pushed at: over 5 years ago - Stars: 1,378 - Forks: 227

husskhosravi/cricket-analytics-snowflake-pipeline

End-to-end Snowflake data pipeline for cricket analytics using JSON data from AWS S3, automated ingestion, transformation, and modelling into a scalable star schema

Size: 122 KB - Last synced at: 3 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

JBris/dagster-dbt-openmetadata-docker

Docker deployment of Dagster, DBT, and OpenMetadata

Language: Python - Size: 2.03 MB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 2 - Forks: 0

BayoAdejare/lightning-containers

Docker powered starter for geospatial analysis of lightning atmospheric data.

Language: Python - Size: 159 MB - Last synced at: 4 days ago - Pushed at: about 1 month ago - Stars: 6 - Forks: 2

Data-Projects-AGN/Weather-Data-ETL-using-Kafka

Simulated real-time weather data pipeline using Python, Apache Kafka (multi-node), and PostgreSQL. Weather metrics are published to country-wise Kafka topics and stored in a data warehouse for downstream ETL and analytics.

Language: Python - Size: 33.2 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

indranilekkala/movie_data_pipeline

An end-to-end data pipeline that processes movie data from TMDb API, stores it in PostgreSQL, and visualizes trends using Metabase.

Language: Python - Size: 17.6 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

KhotChaitanya/Customer_Segmentation_ETL_SSIS

An ETL Project using SQL Server Integration Services (SSIS) for Customer Segmentation and Sales Performance Analysis with real-world data pipelines and business insights.

Size: 29.3 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

adark-d/smart-rental-pricing

Repository for the smart rental and recommendation system for listings in Ghana project

Language: HTML - Size: 2.44 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

zaw-may/Fabric-UserDataFunctions-ETL

Using Fabric User Data Functions Within A Data Pipeline

Language: Python - Size: 12.7 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

zaw-may/Fabric-OnPremises-ETL

Moving On-premises Data into Microsoft Fabric Data Stores

Language: TSQL - Size: 24.4 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

Danitilahun/AWS-Data-Engineering-project

In this AWS Data Engineering project, we delve into the intricacies of building a robust real-time data pipeline using DynamoDB, Snowflake, and AWS Lambda.

Language: Python - Size: 3.91 KB - Last synced at: about 1 month ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

zaw-may/Fabric-OneLake-ETL

Crafting data solution with Fabric Analytics Pipeline

Language: TSQL - Size: 339 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

AhmetFurkanDEMIR/Data-Engineering-Project-with-HDFS-and-Kafka

Data Engineering Project with Hadoop HDFS and Kafka

Language: Python - Size: 3.46 MB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 102 - Forks: 25

longNguyen010203/Youtube-Recommend-Master-ETL-Pipeline

A Data Engineering Project that implements an ETL data pipeline using Dagster, Apache Spark, Streamlit, MinIO, Metabase, dbt, Polars, and Docker. Data comes from Kaggle and the YouTube API.

Language: Jupyter Notebook - Size: 701 KB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 21 - Forks: 2

protonic/protonic.github.io

This is going to be my homepage and my profile page

Language: HTML - Size: 9.77 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

eveliinahampus/openweather-datapipe

Data Engineering Project: ETL pipeline to fetch data from OpenWeather API with batch processing, tidy and transform data, load it to PostgreSQL database -- scheduled with Airflow.

Language: Jupyter Notebook - Size: 1.95 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

mananb77/data101-postgres-spark

Compare the efficiencies of Postgres and Apache Spark.

Language: Jupyter Notebook - Size: 0 Bytes - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

kishlayjeet/Stock-Market-Real-Time-Data-Pipeline-with-Apache-Kafka-and-Cassandra

An end-to-end real-time stock market data pipeline with Python, AWS EC2, Apache Kafka, and Cassandra. Data is processed on AWS EC2 with Apache Kafka and stored in a local Cassandra database.

Language: Python - Size: 2.3 MB - Last synced at: 3 months ago - Pushed at: about 2 years ago - Stars: 25 - Forks: 7

aiden-liu/aiden-liu.github.io

Blog space on data engineering, machine learning, platform engineering.

Size: 90.8 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

tahir007malik/adventureWorksDataAnalytics

This repository showcases an end-to-end ETL pipeline leveraging Azure services, including ADF, ADLS Gen2, Databricks, and Synapse Analytics, to enhance data processing efficiency.

Language: Jupyter Notebook - Size: 3.5 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 1

fonsecagabriella/data_engineering

Building end-to-end data pipelines

Language: Jupyter Notebook - Size: 331 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 2 - Forks: 2

fonsecagabriella/carbonlens

A Climate and Social Indicators Data Pipeline 🌎 🌱📊 | data engineering

Language: Python - Size: 35.2 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

hq969/Youtube-Data-Pipeline-AWS

Leveraging AWS Cloud Services, an ETL pipeline transforms YouTube video statistics data. Data is downloaded from Kaggle, uploaded to an S3 bucket, and cataloged using AWS Glue for querying with Athena. AWS Lambda and Glue convert the data to Parquet format and store it in a cleansed S3 bucket. AWS QuickSight then visualizes the materialised data.

Language: Python - Size: 1.69 MB - Last synced at: 6 days ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

zack0061/End-to-End-Data-Pipeline

📈 A scalable, production-ready data pipeline for real-time streaming & batch processing, integrating Kafka, Spark, Airflow, AWS, Kubernetes, and MLflow. Supports end-to-end data ingestion, transformation, storage, monitoring, and AI/ML serving with CI/CD automation using Terraform & GitHub Actions.

Language: Python - Size: 2.6 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

InosRahul/f1-data-pipeline

F1 Data Pipeline

Language: Python - Size: 401 KB - Last synced at: about 2 months ago - Pushed at: almost 2 years ago - Stars: 23 - Forks: 4

mikeroyal/Apache-Spark-Guide

Apache Spark Guide

Language: Python - Size: 237 KB - Last synced at: 4 days ago - Pushed at: over 3 years ago - Stars: 31 - Forks: 11

yash-chauhan-dev/SPARK_CLUSTER_DOCKER

Set up a local Spark cluster, Hadoop (HDFS), Airflow, and PostgreSQL on Docker with ease, without any local installations

Language: Dockerfile - Size: 1.3 MB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

ShubhamMohanty680/Spotify_Snowflake

A project built around an ETL (Extract, Transform, Load) pipeline that moves Spotify API data on AWS into a Snowflake data warehouse. It uses AWS services such as Lambda, S3, and CloudWatch to orchestrate the process. The transformed data is then loaded into Snowflake using Snowpipe and finally visualized in Power BI.

Language: Python - Size: 1.79 MB - Last synced at: 2 months ago - Pushed at: 5 months ago - Stars: 2 - Forks: 1

tahir007malik/fintechDataMigration

This repository showcases a scalable data pipeline designed for migrating and transforming a fintech company’s data from traditional SQL databases to Azure Data Lake.

Language: Jupyter Notebook - Size: 271 KB - Last synced at: 2 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

Hamzah023/LabattDataPipeline

This is a data pipeline that represents alcohol consumption per country made to analyze sales predictions for Labatt

Language: Python - Size: 33.7 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 4 - Forks: 0

ShubhamMohanty680/Spotify_end_to_end_data_engineering

A project built around an ETL (Extract, Transform, Load) pipeline using the Spotify API on AWS.

Language: Jupyter Notebook - Size: 1.44 MB - Last synced at: 2 months ago - Pushed at: 5 months ago - Stars: 2 - Forks: 0

anki-code/xontrib-pipeliner

Let your pipe lines flow thru the Python code in xonsh.

Language: Python - Size: 149 KB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 59 - Forks: 4

sanketrs/implementation-of-modern-data-engineering-architecture-with-fabric_analytics

Building a next-generation hybrid data pipeline architecture that combines the power of Microsoft Fabric, Azure Cloud, and Power BI. This pipeline is engineered to tackle the challenges of real-time data ingestion, multi-layered processing, and analytics, delivering business-critical insights.

Language: Python - Size: 32.2 KB - Last synced at: 3 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

elmezianech/AutoInventory

This project is an end-to-end, fully automated warehouse management solution designed to tackle real-world inventory challenges in the FMCG sector. From real-time data ingestion and predictive analytics to interactive dashboards, this project combines cutting-edge technologies and an event-driven architecture to simulate a business-ready system.

Language: Python - Size: 61.5 KB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

benedekrozemberczki/AV_Ultimate_Student_Hunt

Solution for the Ultimate Student Hunt Challenge (1st place).

Language: R - Size: 43 KB - Last synced at: 3 days ago - Pushed at: over 3 years ago - Stars: 9 - Forks: 9

kishlayjeet/Twitter-Data-Pipeline-using-Airflow-and-AWS-S3

An end-to-end Twitter Data Pipeline that extracts data from Twitter and loads it into AWS S3.

Language: Python - Size: 20.5 KB - Last synced at: 3 months ago - Pushed at: almost 2 years ago - Stars: 13 - Forks: 6

iTrauco/pybro

yo, it's ya boy, pybro! 😎 | a personal collection of python hacks for 24.04 debian

Language: Python - Size: 35.2 KB - Last synced at: 4 days ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

Semiu/data-engineering

Introduction to Data Engineering

Language: Jupyter Notebook - Size: 109 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

alanchn31/Movalytics-Data-Warehouse

Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow

Language: Python - Size: 717 KB - Last synced at: 7 months ago - Pushed at: about 5 years ago - Stars: 133 - Forks: 31

anna-geller/prefect-aws-lambda

Deploy a Prefect flow to serverless AWS Lambda function

Language: Python - Size: 19.5 KB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 36 - Forks: 6

anna-geller/dataflow-ops

Project demonstrating how to automate Prefect 2.0 deployments to AWS ECS Fargate

Language: Python - Size: 1.32 MB - Last synced at: 3 months ago - Pushed at: almost 2 years ago - Stars: 113 - Forks: 24

BayoAdejare/pipeline-ecommerce

E-commerce Data Pipeline

Language: Python - Size: 22.5 MB - Last synced at: 4 days ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

BayoAdejare/pipeline-edtech

Edtech ADF Pipeline Project

Size: 12.7 KB - Last synced at: 4 days ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

BayoAdejare/pipeline-sleep

Sleep Data Pipeline with Azure Data Factory

Size: 21.5 KB - Last synced at: 4 days ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

JessicaHora/JessicaHora

Size: 9.17 MB - Last synced at: 12 days ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0

anastasiamkh/aws-dataflow-simulator

Python package that simplifies the creation of AWS infrastructure for simulating real-time data streaming and batch processing, ideal for integrating into machine learning projects.

Language: Python - Size: 2.68 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

MeganAlbee/TheWedge

This repository will showcase extracting, cleaning and uploading files onto Google BigQuery, based on a project for the MSBA program at the University of Montana.

Language: Jupyter Notebook - Size: 543 KB - Last synced at: 11 months ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 1

kplofts/datasolve_dw_sql

Create a data warehouse in SQL without external tools (conceptual)

Language: Python - Size: 140 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

jacquesbilombe/Payment-Patterns-Brazil

"Payment Patterns Brazil" is a data pipeline project leveraging cloud technologies to analyze consumer trends in Brazilian payment methods. This project involves data collection, modeling, loading, and analysis using Google Console and other cloud platforms. Explore insights on payment behaviors and trends across Brazil.

Language: Jupyter Notebook - Size: 2.9 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

gear5sh/Gear5

A high-performance alternative to Airbyte, Singer, and Meltano

Language: Go - Size: 30.6 MB - Last synced at: 11 months ago - Pushed at: about 1 year ago - Stars: 16 - Forks: 4

povoaaires/data_project_model

Data pipeline model repository

Language: Python - Size: 1.95 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

DivineSamOfficial/Banking-Data-Warehouse-Pipeline

Banking Data Warehouse Pipeline

Language: Python - Size: 52.1 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

PATRICIAJUNQUEIRA/Airflow_Pipeline_Gera_Pasta

Automated data pipeline for extracting and storing weather forecasts for the tourism sector.

Language: Python - Size: 104 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

umairkarel/Amazon-Sales-Data-Engineering

Data Engineering Pipeline practice with Amazon Sales Data

Language: Python - Size: 6.35 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

waqarg2001/Youtube-Data-Pipeline-AWS

Leveraging AWS Cloud Services, an ETL pipeline transforms YouTube video statistics data. Data is downloaded from Kaggle, uploaded to an S3 bucket, and cataloged using AWS Glue for querying with Athena. AWS Lambda and Glue convert the data to Parquet format and store it in a cleansed S3 bucket. AWS QuickSight then visualizes the materialised data.

Language: Python - Size: 2.89 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

julian506/openweathermap-etl

A simple ETL for temperature data from the Openweathermap API, storing it into an Azure SQL Database

Language: Python - Size: 18.6 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 0

AyushRaiKhare/Ayush_Khare_Data_Engineering_Portfolio

Ayush @ Data Engineering Portfolio

Size: 18.6 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

Cognizant-Technology-Innovation/lakehouseops-sra-for-databricks Fork of databricks/terraform-databricks-sra

The Security Reference Architecture (SRA) implements typical security features as Terraform Templates that are deployed by most high-security organizations, and enforces controls for the largest risks that customers ask about most often.

Language: HCL - Size: 1.58 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

brunocampos01/predicting-retail-churn-with-azure-ml-studio

Job application challenge: Data Scientist

Language: Python - Size: 25.4 MB - Last synced at: 4 months ago - Pushed at: over 3 years ago - Stars: 14 - Forks: 1

alfredzou/BoardGameGeek_Pipeline

Pipeline to automate the collection of board game and expansion data from BoardGameGeek's XML API2. Data is stored in Google Cloud Storage and BigQuery. Data is modelled using DBT in a star schema.

Language: Python - Size: 1.16 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

enochiankim/NYC-uber-data-engineering-project-etl-pipeline

The NYC Uber Data Engineering ETL Project is a comprehensive data engineering effort, covering the development of data pipelines through to the creation of a dashboard.

Language: Jupyter Notebook - Size: 5.07 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

jolly-io/Data_Engineering_Notes

Size: 103 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

salimt/Spotify-API-Pipeline

Spotify API, Airflow, Docker, AWS S3, Snowflake, dbt, localstack, Looker Studio

Language: Python - Size: 181 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 5 - Forks: 0

snandasena/disaster-response-pipeline

Disaster Response Pipeline | Data Engineering

Language: Jupyter Notebook - Size: 5.1 MB - Last synced at: 18 days ago - Pushed at: over 5 years ago - Stars: 2 - Forks: 0

dogucanelci/Azure_e2e_data_engineering_project_1

Language: Jupyter Notebook - Size: 11.6 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

dogucanelci/dogucanelci-GCP_Retail_Airflow_Data_Engineering_Project

Language: Python - Size: 8.35 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

dogucanelci/GCP_Uber_Data_Engineering_Project

Language: Python - Size: 4.12 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

prayagnshah/End-to-End-Pipeline

Zillow Data Pipeline: Extracts data from Zillow, transfers it through AWS services, and performs analytics. Utilizes Python scripts, AWS Lambda, S3, Amazon Redshift, and QuickSight. Explore docs/images for architecture visuals.

Language: Python - Size: 727 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

shiv-rna/Youtube-Data-Engineering-Pipeline

This project repo 📺 offers a robust solution meticulously crafted to efficiently manage, process, and analyze YouTube video data leveraging the power of AWS services. Whether you're diving into structured statistics or exploring the nuances of trending key metrics, this pipeline is engineered to handle it all with finesse.

Language: Python - Size: 179 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

markditsworth/TweetAnalyzer

An environment for analyzing Twitter

Language: Python - Size: 1.88 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 6 - Forks: 2

sumitdeole/Data_engineering_project

This project demonstrates a local and cloud execution of automated data collection and cleaning pipelines.

Language: Jupyter Notebook - Size: 335 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

data2al/dbt-tutorial-course Fork of jack-cook-repo/dbt-tutorial-course

Size: 39.1 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

AlphanAksoyoglu/tweeter-etl-pipeline

A streaming ETL pipeline for Realtime Tweet Collection, Analysis and Reporting

Language: Python - Size: 278 KB - Last synced at: about 1 year ago - Pushed at: almost 4 years ago - Stars: 7 - Forks: 3

dmdequin/de_zoomcamp

Data Engineering Zoomcamp course assignments and notes.

Language: HCL - Size: 33.7 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

igorlangoni/final_project_data_eng_makers Fork of jdench1989/data_eng_final_project

Final project for the Makers Academy Data Engineering Bootcamp! In this amazing, complex group project we had to analyse a massive dataset and extract insightful data that could be used to improve education world-wide!

Language: Python - Size: 125 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 1

NitinDatta8/realtime-data-streaming

End-to-end data engineering pipeline with various technologies to ingest real time data.

Language: Python - Size: 284 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 1

Susanhuynh/aws_etl_from_s3_to_redshift

Building an ETL pipeline for a database hosted on Redshift. Extracting data from S3 to staging tables on Redshift. Transforming data by executing SQL statements that create the analytics tables from these staging tables in a star schema. Loading star schema tables to Redshift.

Language: Jupyter Notebook - Size: 472 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

ellwag/CustomerDataETLPipeline

ETL Pipeline for Shopping Data

Language: Python - Size: 47.7 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

AyersAuthentic/Data_Engineering

Projects and Exercises for Udacity Data Engineering Nano Degree

Language: HTML - Size: 1.63 MB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

Ruth-Mwangi/youtube-data-etl

The purpose of the project is to efficiently collect, process, and store Twitter data using a combination of Apache Airflow, Apache Spark, and Amazon S3.

Language: Python - Size: 15.6 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

prabhath-r/Forecasting-household-energy-consumption

Application to forecast electricity consumption based on 3 years of previous data

Language: Python - Size: 116 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0