Topic: "data-engineering-pipeline"
san089/Udacity-Data-Engineering-Projects
A few projects related to Data Engineering, including Data Modeling, Infrastructure setup on cloud, Data Warehousing and Data Lake development.
Language: Python - Size: 2.03 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 1,596 - Forks: 510

san089/goodreads_etl_pipeline
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Language: Python - Size: 1.31 MB - Last synced at: 25 days ago - Pushed at: about 5 years ago - Stars: 1,365 - Forks: 224

vmware/versatile-data-kit
One framework to develop, deploy and operate data workflows with Python and SQL.
Language: Python - Size: 110 MB - Last synced at: 7 days ago - Pushed at: about 1 month ago - Stars: 449 - Forks: 59

alanchn31/Movalytics-Data-Warehouse
Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow
Language: Python - Size: 717 KB - Last synced at: 5 months ago - Pushed at: almost 5 years ago - Stars: 133 - Forks: 31

anna-geller/dataflow-ops
Project demonstrating how to automate Prefect 2.0 deployments to AWS ECS Fargate
Language: Python - Size: 1.32 MB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 113 - Forks: 24

anna-geller/prefect-deployment-patterns
Code examples showing flow deployment to various types of infrastructure
Language: Python - Size: 249 KB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 105 - Forks: 10

AhmetFurkanDEMIR/Data-Engineering-Project-with-HDFS-and-Kafka
Data Engineering Project with Hadoop HDFS and Kafka
Language: Python - Size: 3.46 MB - Last synced at: 29 days ago - Pushed at: over 1 year ago - Stars: 102 - Forks: 25

immu0001/Udacity-Data-Engineer-nanodegree
Classwork projects and homework done through the Udacity Data Engineering Nanodegree
Language: Jupyter Notebook - Size: 101 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 64 - Forks: 71

anki-code/xontrib-pipeliner
Let your pipe lines flow thru the Python code in xonsh.
Language: Python - Size: 149 KB - Last synced at: 10 days ago - Pushed at: 11 months ago - Stars: 59 - Forks: 4

anna-geller/prefect-aws-lambda
Deploy a Prefect flow to a serverless AWS Lambda function
Language: Python - Size: 19.5 KB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 36 - Forks: 6

mikeroyal/Apache-Spark-Guide
Apache Spark Guide
Language: Python - Size: 237 KB - Last synced at: 8 days ago - Pushed at: over 3 years ago - Stars: 31 - Forks: 11

kishlayjeet/Stock-Market-Real-Time-Data-Pipeline-with-Apache-Kafka-and-Cassandra
An end-to-end real-time stock market data pipeline with Python, AWS EC2, Apache Kafka, and Cassandra. Data is processed on AWS EC2 with Apache Kafka and stored in a local Cassandra database.
Language: Python - Size: 2.3 MB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 25 - Forks: 7

InosRahul/f1-data-pipeline
F1 Data Pipeline
Language: Python - Size: 401 KB - Last synced at: 10 days ago - Pushed at: almost 2 years ago - Stars: 23 - Forks: 4

longNguyen010203/Youtube-Recommend-Master-ETL-Pipeline
A Data Engineering Project that implements an ETL data pipeline using Dagster, Apache Spark, Streamlit, MinIO, Metabase, dbt, Polars, and Docker. Data comes from Kaggle and the YouTube API.
Language: Jupyter Notebook - Size: 701 KB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 21 - Forks: 2

gear5sh/Gear5
High-performance, better alternative to Airbyte, Singer, and Meltano
Language: Go - Size: 30.6 MB - Last synced at: 10 months ago - Pushed at: 11 months ago - Stars: 16 - Forks: 4

sanjeevai/disaster-response-pipeline
ETL pipeline combined with supervised learning and grid search to classify text messages sent during a disaster event
Language: Python - Size: 73.9 MB - Last synced at: about 2 years ago - Pushed at: about 6 years ago - Stars: 16 - Forks: 12

VeraZab/nyc-stats
Analysis of 311 Service Requests for the City of NYC (from 2010 to 2023). Tech: Prefect Cloud, dbt Core, BigQuery, Compute Engine, Cloud Run, Artifact Registry, Terraform, Docker
Language: Python - Size: 4.88 MB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 14 - Forks: 3

brunocampos01/predicting-retail-churn-with-azure-ml-studio
Job application challenge: Data Scientist
Language: Python - Size: 25.4 MB - Last synced at: 2 months ago - Pushed at: over 3 years ago - Stars: 14 - Forks: 1

DarkStarStrix/DataVolt
Reusable data engineering toolkit; my personal data infrastructure
Language: Jupyter Notebook - Size: 13.4 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 13 - Forks: 2

kishlayjeet/Twitter-Data-Pipeline-using-Airflow-and-AWS-S3
An end-to-end Twitter Data Pipeline that extracts data from Twitter and loads it into AWS S3.
Language: Python - Size: 20.5 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 13 - Forks: 6

dylanzenner/business_closures_de_pipeline
Data Engineering pipeline hosted entirely in the AWS ecosystem utilizing DocumentDB as the database
Language: Python - Size: 3.95 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 13 - Forks: 6

ketgo/marshmallow-pyspark
Marshmallow serializer integration with pyspark
Language: Python - Size: 63.5 KB - Last synced at: 19 days ago - Pushed at: over 1 year ago - Stars: 12 - Forks: 4

san089/data-engineer-roadmap Fork of boringPpl/data-engineer-roadmap
Learning from multiple companies in Silicon Valley. Netflix, Facebook, Google, Startups
Size: 213 KB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 12 - Forks: 7

Alero-Awani/Batch-data-engineering-project
A batch data pipeline that retrieves data from a user purchase table and a movie review table and transforms it into a user behaviour metric table.
Language: HCL - Size: 727 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 11 - Forks: 0

koksang/social-media-analysis
Social media analysis: a scalable, flexibly deployable solution that analyses social media content
Language: Jupyter Notebook - Size: 5.21 MB - Last synced at: 10 months ago - Pushed at: almost 2 years ago - Stars: 10 - Forks: 1

benedekrozemberczki/AV_Ultimate_Student_Hunt
Solution for the Ultimate Student Hunt Challenge (1st place).
Language: R - Size: 43 KB - Last synced at: about 1 month ago - Pushed at: about 3 years ago - Stars: 9 - Forks: 9

datarootsio/notion-dbs-data-quality
Using Great Expectations and Notion's API, this repo aims to provide data quality for our databases in Notion.
Language: Python - Size: 56.3 MB - Last synced at: 7 days ago - Pushed at: over 3 years ago - Stars: 9 - Forks: 0

anna-geller/dataflow-ops-aws-eks
Project demonstrating how to automate Prefect 2.0 deployments to AWS EKS
Language: Python - Size: 78.1 KB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 8 - Forks: 1

siddharth271101/Covid-19-and-Aviation-Industry
The goal of this project is to analyse the impact of Covid-19 on the Aviation industry through data engineering processes using technologies such as Apache Airflow, Apache Spark, Tableau and a couple of AWS services
Language: Python - Size: 15.4 MB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 8 - Forks: 3

kkrusere/NHANES-pyTOOL-API
The NHANES Data 'API' is a Python tool that simplifies access to the National Health and Nutrition Examination Survey (NHANES) dataset. This project provides an easy-to-use API to retrieve NHANES data, helping researchers, data scientists, health professionals, and other stakeholders access these valuable datasets.
Language: Python - Size: 215 KB - Last synced at: 25 days ago - Pushed at: 3 months ago - Stars: 7 - Forks: 5

anna-geller/prefect-getting-started
Get started with Prefect by scheduling your Prefect flows with GitHub Actions
Language: Python - Size: 22.5 KB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 7 - Forks: 1

AlphanAksoyoglu/tweeter-etl-pipeline
A streaming ETL pipeline for Realtime Tweet Collection, Analysis and Reporting
Language: Python - Size: 278 KB - Last synced at: 12 months ago - Pushed at: over 3 years ago - Stars: 7 - Forks: 3

BayoAdejare/lightning-containers
Docker powered starter for geospatial analysis of lightning atmospheric data.
Language: Python - Size: 159 MB - Last synced at: 3 days ago - Pushed at: 7 days ago - Stars: 6 - Forks: 2

markditsworth/TweetAnalyzer
An environment for analyzing Twitter
Language: Python - Size: 1.88 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 6 - Forks: 2

HelloSongi/Spotify-Data-Pipeline
Routinely collects trending songs worldwide and stores them in a storage pool. Utilizes various Microsoft Azure services (ADLS, ADF, Synapse Analytics, Azure Functions, Logic Apps) and the Spotify API
Language: Jupyter Notebook - Size: 374 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 5 - Forks: 1

salimt/Spotify-API-Pipeline
Spotify API, Airflow, Docker, AWS S3, Snowflake, dbt, localstack, Looker Studio
Language: Python - Size: 181 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 5 - Forks: 0

dylanzenner/greenhouse_gas_emissions_de_pipeline
Data Engineering pipeline hosted entirely in the AWS ecosystem utilizing DynamoDB as the database
Language: Python - Size: 2.43 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 5 - Forks: 0

Hamzah023/LabattDataPipeline
A data pipeline that represents alcohol consumption per country, built to analyze sales predictions for Labatt
Language: Python - Size: 33.7 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 4 - Forks: 0

kaoutaar/velib_v1
End-to-end data engineering project
Language: Python - Size: 510 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 4 - Forks: 0

chrisammon3000/aws-permits-pipeline
ETL pipeline for construction permits data in Los Angeles built on AWS S3, Lambda and RDS PostgreSQL.
Language: Python - Size: 149 KB - Last synced at: about 2 months ago - Pushed at: over 4 years ago - Stars: 4 - Forks: 0

pyprogrammerblog/tiny-blocks
Tiny Blocks to build large and complex data pipelines!
Language: Python - Size: 70.8 MB - Last synced at: 1 day ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 0

zarexalvindaria/data-engineering
This repo contains the Data Engineering exercises I completed on DataCamp.
Language: Python - Size: 76.7 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 0

ShihWen/tpe-mrt-traffic-etl
A data pipeline from source to data warehouse using Taipei Metro Hourly Traffic data
Language: Jupyter Notebook - Size: 17.5 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 0

DeleLinus/HFR-Data-Warehousing
End-to-end data engineering processes for the Nigeria Health Facility Registry (HFR). The project leveraged Selenium, Pandas, PySpark, PostgreSQL and Airflow.
Language: Python - Size: 1.05 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 0

fonsecagabriella/data_engineering
Building end-to-end data pipelines
Language: Jupyter Notebook - Size: 331 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 2 - Forks: 2

hq969/Youtube-Data-Pipeline-AWS
Leveraging AWS Cloud Services, an ETL pipeline transforms YouTube video statistics data. Data is downloaded from Kaggle, uploaded to an S3 bucket, and cataloged using AWS Glue for querying with Athena. AWS Lambda and Glue convert it to Parquet format and store it in a cleansed S3 bucket. AWS QuickSight then visualizes the materialised data.
Language: Python - Size: 1.69 MB - Last synced at: 4 days ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

ShubhamMohanty680/Spotify_Snowflake
A project that builds an ETL (Extract, Transform, Load) pipeline using the Spotify API on AWS, loading into a Snowflake data warehouse. It utilizes AWS services such as Lambda, S3, and CloudWatch to orchestrate the process. The transformed data is then loaded into Snowflake using Snowpipe, and finally visualized in Power BI.
Language: Python - Size: 1.79 MB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 2 - Forks: 1

ShubhamMohanty680/Spotify_end_to_end_data_engineering
A project that builds an ETL (Extract, Transform, Load) pipeline using the Spotify API on AWS.
Language: Jupyter Notebook - Size: 1.44 MB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

julian506/openweathermap-etl
A simple ETL for temperature data from the Openweathermap API, storing it into an Azure SQL Database
Language: Python - Size: 18.6 KB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 2 - Forks: 0

dogucanelci/Azure_e2e_data_engineering_project_1
Language: Jupyter Notebook - Size: 11.6 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

NitinDatta8/realtime-data-streaming
End-to-end data engineering pipeline with various technologies to ingest real time data.
Language: Python - Size: 284 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 1

bramvdklinkenberg/adf-airflow-data-project
Data engineering project using Azure Data Factory and Apache Airflow
Language: Python - Size: 45.9 KB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 2 - Forks: 0

jackmulligan-ire/ppr-pipeline
Irish Property Price Register transformed into a data warehouse via an EtLT pipeline.
Language: TypeScript - Size: 22.7 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 2 - Forks: 0

MeganAlbee/TheWedge
This repository will showcase extracting, cleaning and uploading files onto Google Bigquery, based on a project for the MSBA program at the University of Montana.
Language: Jupyter Notebook - Size: 543 KB - Last synced at: 9 months ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 1

jmdatasci/user-behavior-spark-pipeline
Streaming Data Pipeline ETL with PySpark, Hadoop, Docker-Compose, Kafka and Redis
Language: Python - Size: 25.4 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

dvu4/udacity-data-engineering
Data Engineering Projects including Data Modeling, Data Warehouse, Data Lake Development
Language: Jupyter Notebook - Size: 2.09 MB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 2 - Forks: 2

snandasena/disaster-response-pipeline
Disaster Response Pipeline | Data Engineering
Language: Jupyter Notebook - Size: 5.1 MB - Last synced at: about 2 months ago - Pushed at: over 5 years ago - Stars: 2 - Forks: 0

jimitmistry/Stock-Market-Prediction-with-LSTM-and-Data-Pipeline
Language: Jupyter Notebook - Size: 229 KB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 2 - Forks: 0

marianajo/beam-examples
Examples that I use to learn and show Apache Beam
Language: Python - Size: 14.6 KB - Last synced at: almost 2 years ago - Pushed at: over 6 years ago - Stars: 2 - Forks: 0

sanketrs/implementation-of-modern-data-engineering-architecture-with-fabric_analytics
Building a next-generation hybrid data pipeline architecture that combines the power of Microsoft Fabric, Azure Cloud, and Power BI. This pipeline is engineered to tackle the challenges of real-time data ingestion, multi-layered processing, and analytics, delivering business-critical insights.
Language: Python - Size: 32.2 KB - Last synced at: about 2 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

JBris/dagster-dbt-openmetadata-docker
Docker deployment of Dagster, DBT, and OpenMetadata
Language: Python - Size: 2.03 MB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

JessicaHora/JessicaHora
Size: 9.17 MB - Last synced at: 3 months ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

PATRICIAJUNQUEIRA/Airflow_Pipeline_Gera_Pasta
Automated data pipeline for extracting and storing weather forecasts for the tourism sector.
Language: Python - Size: 104 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 0

alfredzou/BoardGameGeek_Pipeline
Pipeline to automate the collection of board game and expansion data from BoardGameGeek's XML API2. Data is stored in Google Cloud Storage and BigQuery. Data is modelled using DBT in a star schema.
Language: Python - Size: 1.16 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

dogucanelci/dogucanelci-GCP_Retail_Airflow_Data_Engineering_Project
Language: Python - Size: 8.35 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

dogucanelci/GCP_Uber_Data_Engineering_Project
Language: Python - Size: 4.12 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

sumitdeole/Data_engineering_project
This project demonstrates a local and cloud execution of automated data collection and cleaning pipelines.
Language: Jupyter Notebook - Size: 335 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

dmdequin/de_zoomcamp
Data Engineering Zoomcamp course assignments and notes.
Language: HCL - Size: 33.7 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

rivas-j/Big_Data_Marketing_Analysis-AWS-Spark-SQL
Builds a data pipeline with pgAdmin, AWS Cloud and Apache Spark to analyze and determine bias in Amazon Vine reviews
Language: Jupyter Notebook - Size: 305 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 1

tmaferreira/DataEngineeringZoomCampProject
Data Engineering ZoomCamp Course Project
Language: Python - Size: 143 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 1

sayantikabanik/presentations_conferences
Presentations/tutorials delivered by me at various conferences 👩🏽‍💻
Size: 10.9 MB - Last synced at: 4 months ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

hxwwong/DEBC1-Sprint3-NER-OSM-Airflow-Pipeline
A Named Entity Recognition and OSM Data Pipeline hosted locally on Apache Airflow and a Docker container. Made with Phoemela Ballaran as the final output of Sprint 3 of the Data Engineering Bootcamp
Language: Python - Size: 29 MB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 1 - Forks: 2

ZahidGalea/data-engineering-in-gcp-challenge Fork of walmartdigital/de-challenge
Language: Python - Size: 837 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

Mark-McAdam/Data-Engineering-Batch
Takes product reviews and performs natural language processing to provide sentiment analysis. The new insight gets combined with matching product information in the central database to provide a clearer picture of user behavior.
Language: Python - Size: 963 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

JiajunSong629/Quick_OCR_with_AWS_Lambda
A quick implementation of OCR Application with AWS Lambda.
Language: Python - Size: 6.84 KB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 0

Mcamin/Disaster-Response-Pipeline
ETL Pipeline / ML Pipeline of Disaster Data provided by figure8
Language: Jupyter Notebook - Size: 14.7 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 6

indranilekkala/movie_data_pipeline
An end-to-end data pipeline that processes movie data from TMDb API, stores it in PostgreSQL, and visualizes trends using Metabase.
Language: Python - Size: 17.6 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

Omio-saha/Spotify_Data_Pipe_Snowflake
A project that builds an ETL (Extract, Transform, Load) pipeline using the Spotify API on AWS, loading into a Snowflake data warehouse. It utilizes AWS services such as Lambda, S3, and CloudWatch to orchestrate the process. The transformed data is then loaded into Snowflake using Snowpipe, and finally visualized in Power BI.
Size: 1000 Bytes - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

carlosgdoss/Data-Professional-Survey
This Power BI dashboard analyzes survey responses from data professionals, covering key aspects such as salary distribution, job satisfaction, and preferred programming languages. The insights help understand trends in the data industry and what matters most to professionals.
Size: 303 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

alwi2404/ETL-Pipeline-for-Region-Segmentation-and-Product-Performance-Analysis
An ETL Project using SQL Server Integration Services (SSIS) for Region Segmentation and Sales Performance Analysis with real-world data pipelines and business insights.
Size: 24.4 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

KhotChaitanya/Customer_Segmentation_ETL_SSIS
An ETL Project using SQL Server Integration Services (SSIS) for Customer Segmentation and Sales Performance Analysis with real-world data pipelines and business insights.
Size: 29.3 KB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 0 - Forks: 0

abhinav-pandey29/eats-data-miner
An end-to-end data pipeline project for DoorDash expense tracking
Language: Python - Size: 198 KB - Last synced at: 3 days ago - Pushed at: 17 days ago - Stars: 0 - Forks: 0

adark-d/smart-rental-pricing
Repository for the smart rental and recommendation system for listings in Ghana project
Language: HTML - Size: 2.22 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 0 - Forks: 0

zaw-may/Fabric-UserDataFunctions-ETL
Using Fabric User Data Functions Within A Data Pipeline
Language: Python - Size: 12.7 KB - Last synced at: 27 days ago - Pushed at: 28 days ago - Stars: 0 - Forks: 0

zaw-may/Fabric-OnPremises-ETL
Moving On-premises Data into Microsoft Fabric Data Stores
Language: TSQL - Size: 24.4 KB - Last synced at: 27 days ago - Pushed at: 28 days ago - Stars: 0 - Forks: 0

Danitilahun/AWS-Data-Engineering-project
In this AWS Data Engineering project, we delve into the intricacies of building a robust real-time data pipeline using DynamoDB, Snowflake, and AWS Lambda.
Language: Python - Size: 3.91 KB - Last synced at: about 5 hours ago - Pushed at: 29 days ago - Stars: 0 - Forks: 0

zaw-may/Fabric-OneLake-ETL
Crafting a data solution with Fabric Analytics Pipeline
Language: TSQL - Size: 339 KB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 0 - Forks: 0

protonic/protonic.github.io
This is going to be my homepage and my profile page
Language: HTML - Size: 9.77 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

higorcazuza81/higorcazuza81
A little about me
Size: 3.56 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

eveliinahampus/openweather-datapipe
Data Engineering Project: ETL pipeline to fetch data from OpenWeather API with batch processing, tidy and transform data, load it to PostgreSQL database -- scheduled with Airflow.
Language: Jupyter Notebook - Size: 1.95 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

mananb77/data101-postgres-spark
Compare the efficiencies of Postgres and Apache Spark.
Language: Jupyter Notebook - Size: 0 Bytes - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

aiden-liu/aiden-liu.github.io
Blog space on data engineering, machine learning, platform engineering.
Size: 90.8 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

tahir007malik/adventureWorksDataAnalytics
This repository showcases an end-to-end ETL pipeline leveraging Azure services, including ADF, ADLS Gen2, Databricks, and Synapse Analytics, to enhance data processing efficiency.
Language: Jupyter Notebook - Size: 3.5 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 1

fonsecagabriella/carbonlens
A Climate and Social Indicators Data Pipeline | data engineering
Language: Python - Size: 35.2 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

zack0061/End-to-End-Data-Pipeline
A scalable, production-ready data pipeline for real-time streaming & batch processing, integrating Kafka, Spark, Airflow, AWS, Kubernetes, and MLflow. Supports end-to-end data ingestion, transformation, storage, monitoring, and AI/ML serving with CI/CD automation using Terraform & GitHub Actions.
Language: Python - Size: 2.6 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

yash-chauhan-dev/SPARK_CLUSTER_DOCKER
Set up a local Spark cluster, Hadoop (HDFS), Airflow, and PostgreSQL on Docker with ease, without any local installations
Language: Dockerfile - Size: 1.3 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

Wb-az/MLib-PySpark-SoundLevel-Prediction
Creates an ML pipeline leveraging PySpark SQL and PySpark MLlib to predict sound level
Language: Jupyter Notebook - Size: 972 KB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

tahir007malik/fintechDataMigration
This repository showcases a scalable data pipeline designed for migrating and transforming a fintech company's data from traditional SQL databases to Azure Data Lake.
Language: Jupyter Notebook - Size: 271 KB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

elmezianech/AutoInventory
This project is an end-to-end, fully automated warehouse management solution designed to tackle real-world inventory challenges in the FMCG sector. From real-time data ingestion and predictive analytics to interactive dashboards, this project combines cutting-edge technologies and an event-driven architecture to simulate a business-ready system.
Language: Python - Size: 61.5 KB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

iTrauco/pybro
yo, it's ya boy, pybro! | a personal collection of python hacks for 24.04 debian
Language: Python - Size: 35.2 KB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0
