GitHub topics: etl-pipeline
Andy-Aranda/spotify-pipeline
Extract, analyze, and visualize your most-streamed songs on Spotify using Python, PostgreSQL, and Power BI. ETL project + interactive dashboard.
Language: Python - Size: 35.6 MB - Last synced at: about 8 hours ago - Pushed at: about 8 hours ago - Stars: 0 - Forks: 0

macielmk7/IMDB-Movie-Analysis
Analyze IMDB movie data with Python and pandas. Discover trends in ratings and genre popularity over time. 📊📈 Explore insights with ease.
Language: Jupyter Notebook - Size: 1.87 MB - Last synced at: about 11 hours ago - Pushed at: about 12 hours ago - Stars: 0 - Forks: 0

TriplyDB/Documentation
Documentation for the TriplyDB and TriplyETL products
Language: HTML - Size: 14.5 MB - Last synced at: about 19 hours ago - Pushed at: about 20 hours ago - Stars: 9 - Forks: 5

Zipstack/unstract
No-code LLM Platform to launch APIs and ETL Pipelines to structure unstructured documents
Language: Python - Size: 32.9 MB - Last synced at: about 23 hours ago - Pushed at: about 24 hours ago - Stars: 5,388 - Forks: 506

JoseYahirHernandezCasanova/DIO_Santander_DataScience
Repository documenting my journey through the Santander Bootcamp 2023 - Data Science with Python. Contains projects, exercises, and materials covering Python, data visualization, machine learning models, and ETL pipelines developed during this comprehensive DIO educational program.
Size: 1000 Bytes - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

Hippaho/Sparkify
A music streaming company, Sparkify, has decided that it is time to introduce more automation and monitoring to their data warehouse ETL pipelines and has come to the conclusion that the best tool to achieve this is Apache Airflow.
Language: Python - Size: 17.6 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

stellar/stellar-etl-airflow
Airflow DAGs for the Stellar ETL project
Language: Python - Size: 3.45 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 38 - Forks: 19

prosowiec/ETLsec
Containerized ETL pipeline orchestrated with Apache Airflow for extracting, transforming, and loading financial data from SEC EDGAR, earnings transcripts, and stock prices into a PostgreSQL warehouse.
Language: Python - Size: 7.15 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

MobileTeleSystems/onetl
One ETL tool to rule them all
Language: Python - Size: 8.79 MB - Last synced at: about 20 hours ago - Pushed at: about 21 hours ago - Stars: 80 - Forks: 6

yaoguangluo/ChromosomeDNA
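"DNA Element-Base Catalysis and Peptide Computing": in evolutionary computing, the PDE metabolic optimization method, which encodes software function files with a DNA semantic element-base index, is an effective approach to evolution.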
Language: Java - Size: 678 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 7 - Forks: 2

s-yazhini/Hexa-DE-Main-Project
Data engineering main project 1
Language: Jupyter Notebook - Size: 15.5 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

chiragjaiswar/AWS-S3-to-Redshift-ETL-Pipeline
Automate your data flow with the AWS S3 to Redshift ETL Pipeline. This serverless solution uses AWS Lambda and Glue to transform CSV data seamlessly. 🐙🚀
Size: 3.91 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

AhmadNader319/EGP-Converter
Currency Exchange Rate ETL & Conversion Toolkit: a Python toolkit for fetching, storing, and converting currency exchange rates using exchangeratesapi.io and an IBM DB2 backend. Supports real-time and historical data, automated ETL, and currency conversion for USD, EGP, and DZD. Perfect for financial apps, data pipelines, and currency analytics.
Language: Python - Size: 26.4 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1 - Forks: 0

apache/streampark
Make stream processing easier! Easy-to-use streaming application development framework and operation platform.
Language: Java - Size: 59.3 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 4,091 - Forks: 1,034

Chemtor/Reddit-ETL-Pipeline
A simple ETL pipeline to extract post and comment data from Reddit
Language: Python - Size: 39.7 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

christianebacani/Roadmap
This repository serves as a temporary portfolio showcasing SQL projects and Python scripts related to data engineering, highlighting key accomplishments and implementations.
Language: Python - Size: 971 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

Allan-Cao/pygrid
Python client for the GRID Esports API and ETL helper functions for game data processing
Language: Python - Size: 81.1 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

JSv4/OpenContracts
Enterprise-grade and API-first LLM workspace for unstructured documents, including data extraction, redaction, rights management, prompt playground, and more!
Language: Python - Size: 130 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 886 - Forks: 87

kalyani-ks/AWS-S3-to-Redshift-ETL-Pipeline
Automated serverless ETL pipeline built on AWS. In this project, CSV data ingested into S3 is transformed using AWS Lambda to trigger an AWS Glue workflow containing crawlers and Spark ETL jobs. The processed data is then loaded into Redshift, with real-time notifications via Amazon SNS through EventBridge.
Size: 1.28 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

brageon/biwa
PR campaigns with memes.
Language: Python - Size: 2.88 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 0

Rahulchouhan1/sql-data-warehouse-project
Building a modern data warehouse with SQL Server, including ETL Processes, data modeling, and analytics.
Language: TSQL - Size: 4.35 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

muhammadmutahir/CreditRiskModel_CRA_using_XGBoost_Neural_Network_Random_Forest_Regression_Sourav_Basu
This repository contains a credit risk analytics project that uses logistic regression, decision trees, and various data analysis techniques. Explore the code and resources in Jupyter Notebook format to understand the model's performance and insights. 🐱💻📊
Language: Jupyter Notebook - Size: 522 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

keboola/mcp-server
Model Context Protocol (MCP) Server for the Keboola Platform
Language: Python - Size: 1.97 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 65 - Forks: 14

BenGJ10/Network-Security-System
End-to-end MLOps Network Security System project with ETL pipelines
Language: Python - Size: 1.08 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

netwerk-digitaal-erfgoed/ld-workbench
A CLI tool for transforming large RDF datasets using pure SPARQL
Language: TypeScript - Size: 1.38 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 7 - Forks: 1

logan-taggart/Spotify-Artists-Spark-ETL
ETL with Spark for finding the most consistently popular artists
Language: Python - Size: 15 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

logan-taggart/Movies-Spark-ETL
ETL with Spark for finding highly rated movies
Language: Python - Size: 974 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

moemen-2003/banned-books-pipeline
This repository contains an ETL pipeline for scraping banned books data from the PEN America website. It features data cleaning, transformation, and visualization using Python, Pandas, and Streamlit. 🐙📦
Language: Python - Size: 609 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

EngIbrahim1/Data-Warehousing-and-Advanced-Data-Analytics
Data Analytics Project: Analyzed Promotions and Provided Tangible Insights to Sales Director
Language: TSQL - Size: 2.07 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

techascent/tech.ml.dataset
A Clojure high performance data processing system
Language: Clojure - Size: 9.59 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 706 - Forks: 34

Sabal999/end-to-end-data-pipeline-acs
This repository showcases a robust end-to-end data pipeline for the American Community Survey dataset, utilizing tools like Python, SparkSQL, and Docker. 🚀 Explore the architecture that transforms raw data into valuable insights through a Bronze / Silver / Gold framework. 🐙
Language: Python - Size: 1.17 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

Akshay1010567/tp_final_pulseras_inteligentes
Final practical project for the "Base de Datos" (Databases) course of the Licenciatura en Ciencia de Datos (Data Science degree) at UNSAM. 1C-2025
Language: Python - Size: 43.9 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

chrisliatas/dsnd-ml-pipeline
ML pipeline to categorize emergency messages based on the needs communicated by the sender.
Language: Jupyter Notebook - Size: 2.98 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 2 - Forks: 0

caesarmario/weather-data-engineering-pipeline
This repository showcases a complete Python-based ETL (Extract, Transform, Load) data pipeline designed to process, validate, and analyze weather data for multiple cities. The project demonstrates a structured approach to handling weather data, focusing on data accuracy, transformation, and insights generation.
Language: Python - Size: 1.74 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 1 - Forks: 0

chriskenndy/AI-Job-Risk-Pipeline-Project
An end-to-end data pipeline and dashboard assessing the potential risk of AI replacement in various job roles and industries.
Language: Jupyter Notebook - Size: 3.27 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

nxoti/cnpj-data-pipeline
🇧🇷 CNPJ Data Pipeline: a modular, configurable script for processing CNPJ files from Brazil's Receita Federal. 🐙 The project supports multiple databases and enables intelligent processing of more than 50 million companies.
Language: Python - Size: 384 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

fran-cornachione/Spotify-ETL-Power-Bi-Dashboard
An ETL project that extracts information from the Spotify API using Python, processes and cleans the data, and visualizes key insights from a Spotify playlist through an interactive dashboard in Power BI.
Language: Jupyter Notebook - Size: 206 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

stellar/stellar-etl
Stellar ETL will enable real-time analytics on the Stellar network
Language: Go - Size: 24.3 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 35 - Forks: 15

uw-it-aca/canvas-analytics
ETL workflow for extracting analytics from Canvas
Language: Python - Size: 1.02 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

nshaibu/Pointy-lang
This repo defines the specs for the Pointy-lang
Size: 32.2 KB - Last synced at: 7 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0

HangOn6/CreditRiskModel_CRA_using_XGBoost_Neural_Network_Random_Forest_Regression_Sourav_Basu
Improving a credit risk model using machine learning techniques. We use a host of ML models and a neural network to solve the issue.
Language: Jupyter Notebook - Size: 533 KB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0

HangOn6/data-storage-project
This project is a comprehensive proof-of-concept (PoC) for designing and implementing a data warehouse using a real-world Product Sales and Returns dataset. It demonstrates dimensional modeling, SQL-based ETL, data normalization, Tableau visualization, and a performance comparison between relational databases (SQL) and graph databases (Neo4j).
Language: TSQL - Size: 24.5 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0

Edwardvaneechoud/Flowfile
Flowfile is a visual ETL tool and Python library combining drag-and-drop workflows with Polars dataframes. Build data pipelines visually, define flows programmatically with a Polars-like API, and export to standalone Python code. Perfect for fast, intuitive data processing from development to production.
Language: Python - Size: 23.8 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 53 - Forks: 2

lanafrenzel/aws-etl-pipeline
ETL pipeline on AWS with Lambda, Glue, and S3 for data ingestion and processing - in progress
Language: Python - Size: 82 KB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0

Zipstack/unstract-sdk
A framework for writing Unstract Tools/Apps
Language: Python - Size: 3.64 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 20 - Forks: 1

aws-samples/aws-data-pipelines-for-azure-storage
Copy data from Azure Blob Storage to Amazon S3 using code. View Azure costs using Amazon QuickSight
Language: HCL - Size: 9.52 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 14 - Forks: 6

apache/hamilton
Apache Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.
Language: Jupyter Notebook - Size: 98.6 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 2,160 - Forks: 149

mseijse01/finance-integration
Flask app for financial data analysis with intelligent ETL pipelines, multi-source fallbacks, and real-time visualizations. Tracks coffee/beverage stocks with sentiment analysis, earnings data, and interactive Plotly charts. PostgreSQL backend.
Language: Python - Size: 174 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

harehimself/duxsoup-etl
ETL system utilizing the DuxSoup API for programmatic LinkedIn extraction. The project is a data extraction pipeline that automatically retrieves extensive LinkedIn profile data from first-degree connections for network analysis and relationship intelligence applications.
Language: JavaScript - Size: 355 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 1 - Forks: 0

gloryodeyemi/SQL-Data-Warehouse
A comprehensive SQL Data Warehouse built from scratch using Azure Data Studio and SQL Server Express. It simulates an enterprise data pipeline using the Medallion Architecture and reflects industry best practices in Data Engineering, ETL design, and SQL-based data modeling.
Language: TSQL - Size: 12.4 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

jitsucom/bulker
Service for bulk-loading data to databases with automatic schema management (Redshift, Snowflake, BigQuery, ClickHouse, Postgres, MySQL)
Language: Go - Size: 5.78 MB - Last synced at: 9 days ago - Pushed at: 10 days ago - Stars: 178 - Forks: 30

abhishekk-16/aws-etl-pipeline
An automated serverless ETL pipeline built on AWS. In this project, CSV data ingested into S3 is transformed using AWS Lambda to trigger an AWS Glue workflow containing crawlers and Spark ETL jobs. The processed data is then loaded into Amazon Redshift, with real-time notifications delivered via Amazon SNS through EventBridge.
Size: 3.91 KB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 0 - Forks: 0

fmelihh/recommendation-engine
Generative AI & Recommendation Engine --- Firat University / Faculty of Technology / Software Engineering / Final Project
Language: Python - Size: 3.69 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 6 - Forks: 0

ebonnal/streamable
concurrent & fluent interface for (async) iterables
Language: Python - Size: 4.08 MB - Last synced at: 3 days ago - Pushed at: 11 days ago - Stars: 271 - Forks: 4

PaBHavik2002/Kaggle-Projects
This is a Kaggle project repository; all the projects that I have done so far will be in this repo. I will keep uploading different projects two or three times a month.
Size: 1.1 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 1 - Forks: 0

ArneDePeuter/deppy
A Python dependency executor that builds and executes DAGs efficiently, optimizing workflows with concurrency and flexibility. Perfect for managing complex dependent tasks effortlessly.
Language: Python - Size: 372 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0

RLado/Canonada
Canonada is a data science framework that helps you build production-ready streaming pipelines for data processing in Python
Language: Python - Size: 9.93 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 1 - Forks: 1

WhiskeyTangoFoxtro/Data-Warehousing-and-BI
End-to-end Data Warehousing and Business Intelligence project using SQL, SSMS, Power BI, and Medallion Architecture. Features ETL workflows, dimensional modeling, and interactive dashboards for business insights.
Language: TSQL - Size: 2.33 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 0 - Forks: 0

danhphan/trusted-data-pipeline
Building 3D Trusted Data Pipelines With Dagster, dbt, and DuckDB
Language: Python - Size: 5.25 MB - Last synced at: 3 days ago - Pushed at: almost 2 years ago - Stars: 21 - Forks: 1

aaronlmathis/GoETL
A high-performance Go library for building complete ETL (Extract, Transform, Load) directed-acyclic graph pipelines. GoETL provides streaming data readers, configurable transformations, and efficient writers with a fluent API for complex data workflows.
Language: Go - Size: 26.1 MB - Last synced at: 3 days ago - Pushed at: 13 days ago - Stars: 0 - Forks: 0

LostMa-ERC/heurist-api
API wrapper and CLI for exporting data from a Heurist database server.
Language: Python - Size: 10.4 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 1 - Forks: 0

simon-bronnikov/ETL-Airflow-Hive-Spark-Postres-Docker-
This project implements an extract, transform, and load (ETL) process.
Language: Shell - Size: 1.13 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 0 - Forks: 0

tnrjr/BI-Amazon-Latam
BI solution for Latin American retail using Brazilian e-commerce data, with DW, ETL, analysis, and dashboards.
Language: Jupyter Notebook - Size: 62.7 MB - Last synced at: about 13 hours ago - Pushed at: 13 days ago - Stars: 0 - Forks: 0

Sinekhaya/ETL-Project
A Python-based ETL pipeline notebook demonstrating how to extract, transform, and load data using pandas and SQLite.
Language: Jupyter Notebook - Size: 0 Bytes - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 0 - Forks: 0

FNevs/uni-tees
Repository of practical tutorials on Data Pipelines, APIs (FastAPI, Flask), NLP (NLTK), and AI with LangChain.
Language: Jupyter Notebook - Size: 2.99 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 0 - Forks: 0

wri/gfw-data-api
GFW Data API
Language: Python - Size: 5.61 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 14 - Forks: 5

Matrix030/SteamLens
High-performance sentiment analysis platform for Steam reviews. Built with Python, Dask & transformers to process millions of reviews in minutes. Features AI topic assignment, sentiment separation by themes, GPU acceleration, and Streamlit web interface for game developers and data scientists.
Language: Jupyter Notebook - Size: 223 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 0 - Forks: 0

datagucc/ETL_JSON-SQL-Python-Airflow-Docker
Full ETL pipeline with Airflow, Docker & PostgreSQL — end-to-end orchestration to extract, transform, and load data in a local production-grade environment.
Language: Python - Size: 3 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 0 - Forks: 0

chubbard/gratum
A simplified ETL engine for Groovy.
Language: Groovy - Size: 1.06 MB - Last synced at: 15 days ago - Pushed at: 16 days ago - Stars: 1 - Forks: 0

lnyemba/data-transport
Read/write data anywhere; a lightweight ETL tool
Language: Python - Size: 257 KB - Last synced at: 9 days ago - Pushed at: 10 days ago - Stars: 3 - Forks: 0

Nero103/airbnb-destination
This is an end-to-end project to uncover the ideal destination based on listings and hosts. Strategy included: data workflow, SQL analysis, data modeling, data visualization, and findings.
Size: 811 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

letsiki/end-to-end-data-pipeline-acs
End-to-end data pipeline for the ACS dataset using Python, PySpark, PostgreSQL, and Kubernetes (Bronze / Silver / Gold architecture).
Language: Python - Size: 1.17 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 0 - Forks: 0

conductor-sdk/conductor-python
Conductor OSS SDK for Python programming language
Language: Python - Size: 1.74 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 75 - Forks: 35

logleads/LogverzPortal
LOGVERZ PORTAL. Logverz Portal is the "web interface" component of the Logverz application bundle (LogverzReleases Repository). Logverz is a serverless adaptive data pipeline, the fastest route from AWS S3 to instant reports.
Language: Vue - Size: 7.69 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 2 - Forks: 0

logleads/LogverzReleases
⚡Get insights 10x faster. 🔐Ensure all data stays within your secure AWS environment—no external transfers. 🔗Power AI applications and agents 🤖 or optimise traditional workflows 📊 seamlessly. 💡Have an all-in-one solution. 📈 Work seamlessly with Power BI, Tableau, and more. 💰 Cut costs by 90%.
Language: PowerShell - Size: 6.93 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 12 - Forks: 1

logleads/LogverzCore
LOGVERZ CORE. Logverz Core is the "backend" component of the Logverz application bundle (LogverzReleases Repository). Logverz is a serverless adaptive data pipeline, the fastest route from AWS S3 to instant reports.
Language: JavaScript - Size: 6.33 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 4 - Forks: 1

BenuelOmanga/Stock-Market-ETL
Real-time stock market analytics pipeline using Python, Prefect 3.4.4, and Power BI. It includes ETL automation, ML price prediction, anomaly alerts via email, a dynamic dashboard, and a Streamlit app.
Language: Jupyter Notebook - Size: 1.42 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 0 - Forks: 0

HazemBZ/pdf-fuzz
PoC for bulk-searching your PDF files using text lookup
Size: 8.79 KB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 0 - Forks: 0

data-solution-automation-engine/DIRECT
DIRECT, the Data Integration Run-time Execution Control Tool, is a data logistics control framework that can be used to monitor, log, audit and control data integration / ETL processes.
Language: TSQL - Size: 14.8 MB - Last synced at: 6 days ago - Pushed at: 19 days ago - Stars: 27 - Forks: 9

GOPAD-Datasus/ETL-SINASC
Pipeline capable of handling messy SINASC files and converting them to a more user-friendly format
Language: Python - Size: 83 KB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 0 - Forks: 0

CoreBlader/autobiz-api-extractor
Autobiz API Extractor: this project extracts data from the Autobiz API (https://corporate.autobiz.com/es/nuestros-productos/autobizapi/), stores it in JSON or CSV files, and analyzes the results. It features a modular structure for easy data extraction, processing, and visualization. 🐙📊
Language: Python - Size: 14.6 KB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 0 - Forks: 0

AbdulRafay365/End-to-End-Data-Engineering-Pipeline-in-Azure-and-Power-BI
An end-to-end data pipeline project using Azure to extract, transform, and visualize customer sales data using an HTTP Linked Service in Azure Data Factory. Delivers an interactive Power BI dashboard with product and sales insights.
Language: Jupyter Notebook - Size: 3.21 MB - Last synced at: 19 days ago - Pushed at: 20 days ago - Stars: 0 - Forks: 0

VigneshKanna18/foodhunter-revenue-drop-analysis
A BI solution developed for FoodHunter to investigate a significant drop in revenue over a four-month period. This analysis helps uncover actionable insights through data exploration, visualization, and hypothesis-driven analysis to support informed decision-making.
Language: Jupyter Notebook - Size: 9.74 MB - Last synced at: 1 day ago - Pushed at: 20 days ago - Stars: 0 - Forks: 0

k178412/sql-data-warehouse-project
Building a data warehouse with SQL Server including ETL processes, data modeling and data analytics.
Language: TSQL - Size: 1.58 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 0 - Forks: 0

kelvinleandro/ida-data-engineering
Language: Python - Size: 375 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

Ayushman0511/Data-Warehouse-Project1
A comprehensive guide to building a data warehouse with SQL Server, including ETL processes, data modeling, and analytics.
Language: TSQL - Size: 3.29 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 0 - Forks: 0

jvalue/jayvee
Jayvee is a domain-specific language and runtime for automated processing of data pipelines
Language: TypeScript - Size: 11.4 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 176 - Forks: 15

lanafrenzel/gmail-bigquery-etl
End-to-end project for extracting Gmail metadata using user tokens and loading it into BigQuery. Includes a CLI token uploader and a scalable ETL service.
Language: Python - Size: 18.6 KB - Last synced at: 21 days ago - Pushed at: 22 days ago - Stars: 0 - Forks: 0

zainea-bogdan/Data_Engineer_Project_WoWCinema
WoWCinema is a project based on a fictional scenario where I stepped into the role of a Data Engineer, designing and building an end-to-end data infrastructure. An ETL pipeline ingests data from multiple sources, transforms it, and loads it into a centralized PostgreSQL data warehouse to power analytics, KPI tracking, and reporting.
Language: Python - Size: 2.14 MB - Last synced at: 16 days ago - Pushed at: 22 days ago - Stars: 0 - Forks: 0

ronaldkanyepi/Log-Realtime-Analysis
A scalable architecture for real-time log processing and visualization. Built with a Kafka-Spark ETL pipeline, DynamoDB for storing aggregate real-time metrics, and Python Dash for interactive dashboards. Designed for high-throughput log ingestion, real-time monitoring, and long-term storage.
Language: Python - Size: 1.14 MB - Last synced at: 19 days ago - Pushed at: 6 months ago - Stars: 4 - Forks: 0

msaakaash/hospital-data-warehouse
A data warehouse project designed to demonstrate SQL and data modeling skills.
Language: SQL - Size: 1.04 MB - Last synced at: 22 days ago - Pushed at: 23 days ago - Stars: 0 - Forks: 0

NirgalFromMars/disaster-tweets-classification
End-to-end data analysis project using RDS (AWS) data sources (containing CSV data), ETL/EDA + deep learning models in Jupyter Notebooks, and Tableau visualizations & dashboard
Language: Jupyter Notebook - Size: 989 KB - Last synced at: 22 days ago - Pushed at: 23 days ago - Stars: 0 - Forks: 0

nasrmohammad4804/search-engine-concept
This repo is for learning search engines such as ELK and web search engine concepts such as Google's, to grow knowledge of software engineering.
Language: Java - Size: 13.7 MB - Last synced at: 23 days ago - Pushed at: 23 days ago - Stars: 9 - Forks: 2

castengine/insert-tools
CLI tool for inserting SELECT query results into ClickHouse with automatic schema matching and type-safe casting. Ideal for ETL pipelines and SQL-driven data flows.
Language: Python - Size: 47.9 KB - Last synced at: 3 days ago - Pushed at: about 1 month ago - Stars: 8 - Forks: 0

Gerardo1909/tp_final_pulseras_inteligentes
Final practical project for the "Base de Datos" (Databases) course of the Licenciatura en Ciencia de Datos (Data Science degree) at UNSAM. 1C-2025
Language: Python - Size: 503 KB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 0 - Forks: 0

logleads/LogverzPortalAccess
LOGVERZ PORTAL ACCESS. Logverz portal access is the "login" component of the Logverz application bundle (LogverzReleases Repository). Logverz is a serverless adaptive data pipeline, the fastest route from AWS S3 to instant reports.
Language: Vue - Size: 8.81 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 2 - Forks: 0

harehimself/linkedin-etl
ETL system utilizing the DuxSoup API for programmatic LinkedIn extraction. The project is a data extraction pipeline that automatically retrieves extensive LinkedIn profile data from first-degree connections for network analysis and relationship intelligence applications.
Language: JavaScript - Size: 391 KB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 1 - Forks: 0

ivanildobarauna-dev/data-pipeline-sync-ingest
The "ETL Process for Currency Quotes Data" project is a complete solution dedicated to extracting, transforming, and loading (ETL) currency quote data. This project uses several advanced techniques and architectures to ensure the efficiency and robustness of the ETL process.
Language: Python - Size: 6.85 MB - Last synced at: 7 days ago - Pushed at: 4 months ago - Stars: 2 - Forks: 1

YotpoLtd/metorikku
A simplified, lightweight ETL Framework based on Apache Spark
Language: Scala - Size: 4.2 MB - Last synced at: 18 days ago - Pushed at: over 1 year ago - Stars: 586 - Forks: 160

priyanshubiswas-tech/AWS-ETL-Pipeline-on-Cloud-using-Glue-Athena-Lambda-and-Redshift
Serverless ETL pipeline on AWS using Glue, Lambda, Athena, and Redshift — automates data ingestion, transformation, and analytics with scalable, event-driven architecture.
Language: Python - Size: 20.5 KB - Last synced at: 18 days ago - Pushed at: 26 days ago - Stars: 1 - Forks: 0
