data-engineering-pipeline | Topic

san089/Udacity-Data-Engineering-Projects

A few projects related to Data Engineering, including Data Modeling, Infrastructure setup on cloud, Data Warehousing and Data Lake development.

Language: Python - Size: 2.03 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 1,596 - Forks: 510

san089/goodreads_etl_pipeline

An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.

Language: Python - Size: 1.31 MB - Last synced at: 25 days ago - Pushed at: about 5 years ago - Stars: 1,365 - Forks: 224

vmware/versatile-data-kit

One framework to develop, deploy and operate data workflows with Python and SQL.

Language: Python - Size: 110 MB - Last synced at: 7 days ago - Pushed at: about 1 month ago - Stars: 449 - Forks: 59

alanchn31/Movalytics-Data-Warehouse

Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow

Language: Python - Size: 717 KB - Last synced at: 5 months ago - Pushed at: almost 5 years ago - Stars: 133 - Forks: 31

anna-geller/dataflow-ops

Project demonstrating how to automate Prefect 2.0 deployments to AWS ECS Fargate

Language: Python - Size: 1.32 MB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 113 - Forks: 24

anna-geller/prefect-deployment-patterns

Code examples showing flow deployment to various types of infrastructure

Language: Python - Size: 249 KB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 105 - Forks: 10

AhmetFurkanDEMIR/Data-Engineering-Project-with-HDFS-and-Kafka

Data Engineering Project with Hadoop HDFS and Kafka

Language: Python - Size: 3.46 MB - Last synced at: 29 days ago - Pushed at: over 1 year ago - Stars: 102 - Forks: 25

immu0001/Udacity-Data-Engineer-nanodegree

Classwork projects and homework completed through the Udacity Data Engineering Nanodegree.

Language: Jupyter Notebook - Size: 101 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 64 - Forks: 71

anki-code/xontrib-pipeliner

Let your pipe lines flow thru the Python code in xonsh.

Language: Python - Size: 149 KB - Last synced at: 10 days ago - Pushed at: 11 months ago - Stars: 59 - Forks: 4

anna-geller/prefect-aws-lambda

Deploy a Prefect flow to a serverless AWS Lambda function

Language: Python - Size: 19.5 KB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 36 - Forks: 6

mikeroyal/Apache-Spark-Guide

Apache Spark Guide

Language: Python - Size: 237 KB - Last synced at: 8 days ago - Pushed at: over 3 years ago - Stars: 31 - Forks: 11

kishlayjeet/Stock-Market-Real-Time-Data-Pipeline-with-Apache-Kafka-and-Cassandra

An end-to-end real-time stock market data pipeline with Python, AWS EC2, Apache Kafka, and Cassandra. Data is processed on AWS EC2 with Apache Kafka and stored in a local Cassandra database.

Language: Python - Size: 2.3 MB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 25 - Forks: 7

InosRahul/f1-data-pipeline

F1 Data Pipeline

Language: Python - Size: 401 KB - Last synced at: 10 days ago - Pushed at: almost 2 years ago - Stars: 23 - Forks: 4

longNguyen010203/Youtube-Recommend-Master-ETL-Pipeline

A Data Engineering Project that implements an ETL data pipeline using Dagster, Apache Spark, Streamlit, MinIO, Metabase, dbt, Polars, and Docker. Data comes from Kaggle and the YouTube API.

Language: Jupyter Notebook - Size: 701 KB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 21 - Forks: 2

gear5sh/Gear5

A high-performance, better alternative to Airbyte, Singer, and Meltano.

Language: Go - Size: 30.6 MB - Last synced at: 10 months ago - Pushed at: 11 months ago - Stars: 16 - Forks: 4

sanjeevai/disaster-response-pipeline

ETL pipeline combined with supervised learning and grid search to classify text messages sent during a disaster event

Language: Python - Size: 73.9 MB - Last synced at: about 2 years ago - Pushed at: about 6 years ago - Stars: 16 - Forks: 12

VeraZab/nyc-stats

Analysis of 311 Service Requests for the City of New York (2010 to 2023). Tech: Prefect Cloud, dbt Core, BigQuery, Compute Engine, Cloud Run, Artifact Registry, Terraform, Docker.

Language: Python - Size: 4.88 MB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 14 - Forks: 3

brunocampos01/predicting-retail-churn-with-azure-ml-studio

Hiring challenge for a Data Scientist position.

Language: Python - Size: 25.4 MB - Last synced at: 2 months ago - Pushed at: over 3 years ago - Stars: 14 - Forks: 1

DarkStarStrix/DataVolt

A reusable data engineering toolkit: my personal data infrastructure.

Language: Jupyter Notebook - Size: 13.4 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 13 - Forks: 2

kishlayjeet/Twitter-Data-Pipeline-using-Airflow-and-AWS-S3

An end-to-end Twitter Data Pipeline that extracts data from Twitter and loads it into AWS S3.

Language: Python - Size: 20.5 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 13 - Forks: 6

dylanzenner/business_closures_de_pipeline

Data Engineering pipeline hosted entirely in the AWS ecosystem utilizing DocumentDB as the database

Language: Python - Size: 3.95 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 13 - Forks: 6

ketgo/marshmallow-pyspark

Marshmallow serializer integration with pyspark

Language: Python - Size: 63.5 KB - Last synced at: 19 days ago - Pushed at: over 1 year ago - Stars: 12 - Forks: 4

san089/data-engineer-roadmap Fork of boringPpl/data-engineer-roadmap

Learning from multiple Silicon Valley companies: Netflix, Facebook, Google, and startups.

Size: 213 KB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 12 - Forks: 7

Alero-Awani/Batch-data-engineering-project

A batch data pipeline that retrieves data from a user purchase table and a movie review table and transforms it into a user behaviour metric table.

Language: HCL - Size: 727 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 11 - Forks: 0

koksang/social-media-analysis

Social Media Analysis: a scalable, flexibly deployable solution that analyses social media content.

Language: Jupyter Notebook - Size: 5.21 MB - Last synced at: 10 months ago - Pushed at: almost 2 years ago - Stars: 10 - Forks: 1

benedekrozemberczki/AV_Ultimate_Student_Hunt

Solution for the Ultimate Student Hunt Challenge (1st place).

Language: R - Size: 43 KB - Last synced at: about 1 month ago - Pushed at: about 3 years ago - Stars: 9 - Forks: 9

datarootsio/notion-dbs-data-quality

Using Great Expectations and Notion's API, this repo aims to provide data quality for our databases in Notion.

Language: Python - Size: 56.3 MB - Last synced at: 7 days ago - Pushed at: over 3 years ago - Stars: 9 - Forks: 0

anna-geller/dataflow-ops-aws-eks

Project demonstrating how to automate Prefect 2.0 deployments to AWS EKS

Language: Python - Size: 78.1 KB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 8 - Forks: 1

siddharth271101/Covid-19-and-Aviation-Industry

The goal of this project is to analyse the impact of Covid-19 on the aviation industry through data engineering processes, using technologies such as Apache Airflow, Apache Spark, Tableau and a couple of AWS services.

Language: Python - Size: 15.4 MB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 8 - Forks: 3

kkrusere/NHANES-pyTOOL-API

The NHANES Data 'API' is a Python tool that simplifies access to the National Health and Nutrition Examination Survey (NHANES) dataset. This project provides an easy-to-use API to retrieve NHANES data, helping researchers, data scientists, health professionals, and other stakeholders access these valuable datasets.

Language: Python - Size: 215 KB - Last synced at: 25 days ago - Pushed at: 3 months ago - Stars: 7 - Forks: 5

anna-geller/prefect-getting-started

Get started with Prefect by scheduling your Prefect flows with GitHub Actions

Language: Python - Size: 22.5 KB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 7 - Forks: 1

AlphanAksoyoglu/tweeter-etl-pipeline

A streaming ETL pipeline for real-time tweet collection, analysis and reporting

Language: Python - Size: 278 KB - Last synced at: 12 months ago - Pushed at: over 3 years ago - Stars: 7 - Forks: 3

BayoAdejare/lightning-containers

Docker powered starter for geospatial analysis of lightning atmospheric data.

Language: Python - Size: 159 MB - Last synced at: 3 days ago - Pushed at: 7 days ago - Stars: 6 - Forks: 2

markditsworth/TweetAnalyzer

An environment for analyzing Twitter data

Language: Python - Size: 1.88 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 6 - Forks: 2

HelloSongi/Spotify-Data-Pipeline

Routinely collects trending songs worldwide and stores them in a storage pool. Utilizes various Microsoft Azure services (ADLS, ADF, Synapse Analytics, Azure Functions, Logic Apps) and the Spotify API.

Language: Jupyter Notebook - Size: 374 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 5 - Forks: 1

salimt/Spotify-API-Pipeline

Spotify API, Airflow, Docker, AWS S3, Snowflake, dbt, localstack, Looker Studio

Language: Python - Size: 181 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 5 - Forks: 0

dylanzenner/greenhouse_gas_emissions_de_pipeline

Data Engineering pipeline hosted entirely in the AWS ecosystem utilizing DynamoDB as the database

Language: Python - Size: 2.43 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 5 - Forks: 0

Hamzah023/LabattDataPipeline

A data pipeline that represents alcohol consumption per country, built to analyze sales predictions for Labatt.

Language: Python - Size: 33.7 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 4 - Forks: 0

kaoutaar/velib_v1

End-to-end data engineering project

Language: Python - Size: 510 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 4 - Forks: 0

chrisammon3000/aws-permits-pipeline

ETL pipeline for construction permits data in Los Angeles built on AWS S3, Lambda and RDS PostgreSQL.

Language: Python - Size: 149 KB - Last synced at: about 2 months ago - Pushed at: over 4 years ago - Stars: 4 - Forks: 0

pyprogrammerblog/tiny-blocks

Tiny Blocks to build large and complex data pipelines!

Language: Python - Size: 70.8 MB - Last synced at: 1 day ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 0

zarexalvindaria/data-engineering

This repo contains the Data Engineering exercises I completed on DataCamp.

Language: Python - Size: 76.7 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 0

ShihWen/tpe-mrt-traffic-etl

A data pipeline from source to data warehouse using Taipei Metro Hourly Traffic data

Language: Jupyter Notebook - Size: 17.5 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 0

DeleLinus/HFR-Data-Warehousing

End-to-end data engineering processes for the Nigeria Health Facility Registry (HFR). The project leveraged Selenium, Pandas, PySpark, PostgreSQL and Airflow.

Language: Python - Size: 1.05 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 0

fonsecagabriella/data_engineering

Building end-to-end data pipelines

Language: Jupyter Notebook - Size: 331 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 2 - Forks: 2

hq969/Youtube-Data-Pipeline-AWS

Leveraging AWS Cloud Services, an ETL pipeline transforms YouTube video statistics data. Data is downloaded from Kaggle, uploaded to an S3 bucket, and cataloged using AWS Glue for querying with Athena. AWS Lambda and Glue convert it to Parquet format and store it in a cleansed S3 bucket. AWS QuickSight then visualizes the materialised data.

Language: Python - Size: 1.69 MB - Last synced at: 4 days ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

ShubhamMohanty680/Spotify_Snowflake

A project that builds an ETL (Extract, Transform, Load) pipeline from the Spotify API on AWS into a Snowflake data warehouse. It uses AWS services such as Lambda, S3, and CloudWatch to orchestrate the process. The transformed data is then loaded into Snowflake using Snowpipe and finally visualized in Power BI.

Language: Python - Size: 1.79 MB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 2 - Forks: 1

ShubhamMohanty680/Spotify_end_to_end_data_engineering

A project that builds an ETL (Extract, Transform, Load) pipeline from the Spotify API on AWS.

Language: Jupyter Notebook - Size: 1.44 MB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

julian506/openweathermap-etl

A simple ETL for temperature data from the Openweathermap API, storing it into an Azure SQL Database

Language: Python - Size: 18.6 KB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 2 - Forks: 0

dogucanelci/Azure_e2e_data_engineering_project_1

Language: Jupyter Notebook - Size: 11.6 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

NitinDatta8/realtime-data-streaming

End-to-end data engineering pipeline with various technologies to ingest real time data.

Language: Python - Size: 284 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 1

bramvdklinkenberg/adf-airflow-data-project

Data engineering project using Azure Data Factory and Apache Airflow

Language: Python - Size: 45.9 KB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 2 - Forks: 0

jackmulligan-ire/ppr-pipeline

Irish Property Price Register transformed into a data warehouse via an EtLT pipeline.

Language: TypeScript - Size: 22.7 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 2 - Forks: 0

MeganAlbee/TheWedge

This repository showcases extracting, cleaning and uploading files to Google BigQuery, based on a project for the MSBA program at the University of Montana.

Language: Jupyter Notebook - Size: 543 KB - Last synced at: 9 months ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 1

jmdatasci/user-behavior-spark-pipeline

Streaming Data Pipeline ETL with PySpark, Hadoop, Docker-Compose, Kafka and Redis

Language: Python - Size: 25.4 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

dvu4/udacity-data-engineering

Data Engineering Projects including Data Modeling, Data Warehouse, Data Lake Development

Language: Jupyter Notebook - Size: 2.09 MB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 2 - Forks: 2

snandasena/disaster-response-pipeline

Disaster Response Pipeline | Data Engineering

Language: Jupyter Notebook - Size: 5.1 MB - Last synced at: about 2 months ago - Pushed at: over 5 years ago - Stars: 2 - Forks: 0

jimitmistry/Stock-Market-Prediction-with-LSTM-and-Data-Pipeline

Language: Jupyter Notebook - Size: 229 KB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 2 - Forks: 0

marianajo/beam-examples

Examples that I use to learn and show Apache Beam

Language: Python - Size: 14.6 KB - Last synced at: almost 2 years ago - Pushed at: over 6 years ago - Stars: 2 - Forks: 0

sanketrs/implementation-of-modern-data-engineering-architecture-with-fabric_analytics

Building a next-generation hybrid data pipeline architecture that combines the power of Microsoft Fabric, Azure Cloud, and Power BI. This pipeline is engineered to tackle the challenges of real-time data ingestion, multi-layered processing, and analytics, delivering business-critical insights.

Language: Python - Size: 32.2 KB - Last synced at: about 2 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

JBris/dagster-dbt-openmetadata-docker

Docker deployment of Dagster, DBT, and OpenMetadata

Language: Python - Size: 2.03 MB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

JessicaHora/JessicaHora

Size: 9.17 MB - Last synced at: 3 months ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

PATRICIAJUNQUEIRA/Airflow_Pipeline_Gera_Pasta

Automated data pipeline for extracting and storing weather forecasts for the tourism sector.

Language: Python - Size: 104 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 0

alfredzou/BoardGameGeek_Pipeline

Pipeline to automate the collection of board game and expansion data from BoardGameGeek's XML API2. Data is stored in Google Cloud Storage and BigQuery. Data is modelled using DBT in a star schema.

Language: Python - Size: 1.16 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

dogucanelci/dogucanelci-GCP_Retail_Airflow_Data_Engineering_Project

Language: Python - Size: 8.35 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

dogucanelci/GCP_Uber_Data_Engineering_Project

Language: Python - Size: 4.12 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

sumitdeole/Data_engineering_project

This project demonstrates a local and cloud execution of automated data collection and cleaning pipelines.

Language: Jupyter Notebook - Size: 335 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

dmdequin/de_zoomcamp

Data Engineering Zoomcamp course assignments and notes.

Language: HCL - Size: 33.7 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

rivas-j/Big_Data_Marketing_Analysis-AWS-Spark-SQL

Building a data pipeline with pgAdmin, AWS Cloud and Apache Spark to analyze and determine bias in Amazon Vine reviews

Language: Jupyter Notebook - Size: 305 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 1

tmaferreira/DataEngineeringZoomCampProject

Data Engineering ZoomCamp Course Project

Language: Python - Size: 143 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 1

sayantikabanik/presentations_conferences

Presentations/tutorials delivered by me at various conferences 👩🏽‍💻

Size: 10.9 MB - Last synced at: 4 months ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

hxwwong/DEBC1-Sprint3-NER-OSM-Airflow-Pipeline

A Named Entity Recognition (NER) and OSM data pipeline hosted locally on Apache Airflow and a Docker container. Made with Phoemela Ballaran as the final output of Sprint 3 of the Data Engineering Bootcamp.

Language: Python - Size: 29 MB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 1 - Forks: 2

ZahidGalea/data-engineering-in-gcp-challenge Fork of walmartdigital/de-challenge

Language: Python - Size: 837 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

Mark-McAdam/Data-Engineering-Batch

Takes product reviews and performs natural language processing to provide sentiment analysis. The new insight gets combined with matching product information in the central database to provide a clearer picture of user behavior.

Language: Python - Size: 963 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

JiajunSong629/Quick_OCR_with_AWS_Lambda

A quick implementation of OCR Application with AWS Lambda.

Language: Python - Size: 6.84 KB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 0

Mcamin/Disaster-Response-Pipeline

ETL Pipeline / ML Pipeline of Disaster Data provided by figure8

Language: Jupyter Notebook - Size: 14.7 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 6

indranilekkala/movie_data_pipeline

An end-to-end data pipeline that processes movie data from TMDb API, stores it in PostgreSQL, and visualizes trends using Metabase.

Language: Python - Size: 17.6 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

Omio-saha/Spotify_Data_Pipe_Snowflake

A project that builds an ETL (Extract, Transform, Load) pipeline from the Spotify API on AWS into a Snowflake data warehouse. It uses AWS services such as Lambda, S3, and CloudWatch to orchestrate the process. The transformed data is then loaded into Snowflake using Snowpipe and finally visualized in Power BI.

Size: 1000 Bytes - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

carlosgdoss/Data-Professional-Survey

This Power BI dashboard analyzes survey responses from data professionals, covering key aspects such as salary distribution, job satisfaction, and preferred programming languages. The insights help understand trends in the data industry and what matters most to professionals.

Size: 303 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

alwi2404/ETL-Pipeline-for-Region-Segmentation-and-Product-Performance-Analysis

An ETL Project using SQL Server Integration Services (SSIS) for Region Segmentation and Sales Performance Analysis with real-world data pipelines and business insights.

Size: 24.4 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

KhotChaitanya/Customer_Segmentation_ETL_SSIS

An ETL Project using SQL Server Integration Services (SSIS) for Customer Segmentation and Sales Performance Analysis with real-world data pipelines and business insights.

Size: 29.3 KB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 0 - Forks: 0

abhinav-pandey29/eats-data-miner

An end-to-end data pipeline project for DoorDash expense tracking

Language: Python - Size: 198 KB - Last synced at: 3 days ago - Pushed at: 17 days ago - Stars: 0 - Forks: 0

adark-d/smart-rental-pricing

Repository for the smart rental pricing and recommendation system project for listings in Ghana

Language: HTML - Size: 2.22 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 0 - Forks: 0

zaw-may/Fabric-UserDataFunctions-ETL

Using Fabric User Data Functions Within A Data Pipeline

Language: Python - Size: 12.7 KB - Last synced at: 27 days ago - Pushed at: 28 days ago - Stars: 0 - Forks: 0

zaw-may/Fabric-OnPremises-ETL

Moving On-premises Data into Microsoft Fabric Data Stores

Language: TSQL - Size: 24.4 KB - Last synced at: 27 days ago - Pushed at: 28 days ago - Stars: 0 - Forks: 0

Danitilahun/AWS-Data-Engineering-project

In this AWS Data Engineering project, we build a real-time data pipeline using DynamoDB, Snowflake, and AWS Lambda.

Language: Python - Size: 3.91 KB - Last synced at: about 5 hours ago - Pushed at: 29 days ago - Stars: 0 - Forks: 0

zaw-may/Fabric-OneLake-ETL

Crafting a data solution with a Fabric analytics pipeline

Language: TSQL - Size: 339 KB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 0 - Forks: 0

protonic/protonic.github.io

This is going to be my homepage and my profile page

Language: HTML - Size: 9.77 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

higorcazuza81/higorcazuza81

A little about me

Size: 3.56 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

eveliinahampus/openweather-datapipe

Data Engineering Project: ETL pipeline to fetch data from OpenWeather API with batch processing, tidy and transform data, load it to PostgreSQL database -- scheduled with Airflow.

Language: Jupyter Notebook - Size: 1.95 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

mananb77/data101-postgres-spark

Compare the efficiencies of Postgres and Apache Spark.

Language: Jupyter Notebook - Size: 0 Bytes - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

aiden-liu/aiden-liu.github.io

Blog space on data engineering, machine learning, platform engineering.

Size: 90.8 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

tahir007malik/adventureWorksDataAnalytics

This repository showcases an end-to-end ETL pipeline leveraging Azure services, including ADF, ADLS Gen2, Databricks, and Synapse Analytics, to enhance data processing efficiency.

Language: Jupyter Notebook - Size: 3.5 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 1

fonsecagabriella/carbonlens

A Climate and Social Indicators Data Pipeline 🌎 🌱📊 | data engineering

Language: Python - Size: 35.2 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

zack0061/End-to-End-Data-Pipeline

📈 A scalable, production-ready data pipeline for real-time streaming & batch processing, integrating Kafka, Spark, Airflow, AWS, Kubernetes, and MLflow. Supports end-to-end data ingestion, transformation, storage, monitoring, and AI/ML serving with CI/CD automation using Terraform & GitHub Actions.

Language: Python - Size: 2.6 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

yash-chauhan-dev/SPARK_CLUSTER_DOCKER

Set up a local Spark cluster, Hadoop (HDFS), Airflow and PostgreSQL on Docker with ease, without any local installations.

Language: Dockerfile - Size: 1.3 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

Wb-az/MLib-PySpark-SoundLevel-Prediction

Creates an ML pipeline leveraging PySpark SQL and PySpark MLlib to predict sound level

Language: Jupyter Notebook - Size: 972 KB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

tahir007malik/fintechDataMigration

This repository showcases a scalable data pipeline designed for migrating and transforming a fintech company’s data from traditional SQL databases to Azure Data Lake.

Language: Jupyter Notebook - Size: 271 KB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

elmezianech/AutoInventory

This project is an end-to-end, fully automated warehouse management solution designed to tackle real-world inventory challenges in the FMCG sector. From real-time data ingestion and predictive analytics to interactive dashboards, this project combines cutting-edge technologies and an event-driven architecture to simulate a business-ready system.

Language: Python - Size: 61.5 KB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

iTrauco/pybro

yo, it's ya boy, pybro! 😎 | a personal collection of python hacks for 24.04 debian

Language: Python - Size: 35.2 KB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0
