An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: etl-pipeline

Andy-Aranda/spotify-pipeline

Extract, analyze, and visualize your most-streamed songs on Spotify using Python, PostgreSQL, and Power BI. ETL project + interactive dashboard.

Language: Python - Size: 35.6 MB - Last synced at: about 8 hours ago - Pushed at: about 8 hours ago - Stars: 0 - Forks: 0

macielmk7/IMDB-Movie-Analysis

Analyze IMDB movie data with Python and pandas. Discover trends in ratings and genre popularity over time. 📊📈 Explore insights with ease.

Language: Jupyter Notebook - Size: 1.87 MB - Last synced at: about 11 hours ago - Pushed at: about 12 hours ago - Stars: 0 - Forks: 0

TriplyDB/Documentation

Documentation for the TriplyDB and TriplyETL products

Language: HTML - Size: 14.5 MB - Last synced at: about 19 hours ago - Pushed at: about 20 hours ago - Stars: 9 - Forks: 5

Zipstack/unstract

No-code LLM Platform to launch APIs and ETL Pipelines to structure unstructured documents

Language: Python - Size: 32.9 MB - Last synced at: about 23 hours ago - Pushed at: about 24 hours ago - Stars: 5,388 - Forks: 506

JoseYahirHernandezCasanova/DIO_Santander_DataScience

Repository documenting my journey through the Santander Bootcamp 2023 - Data Science with Python. Contains projects, exercises, and materials covering Python, data visualization, machine learning models, and ETL pipelines developed during this comprehensive DIO educational program.

Size: 1000 Bytes - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

Hippaho/Sparkify

A music streaming company, Sparkify, has decided that it is time to introduce more automation and monitoring to their data warehouse ETL pipelines and come to the conclusion that the best tool to achieve this is Apache Airflow.

Language: Python - Size: 17.6 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

stellar/stellar-etl-airflow

Airflow DAGs for the Stellar ETL project

Language: Python - Size: 3.45 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 38 - Forks: 19

prosowiec/ETLsec

Containerized ETL pipeline orchestrated with Apache Airflow for extracting, transforming, and loading financial data from SEC EDGAR, earnings transcripts, and stock prices into a PostgreSQL warehouse.

Language: Python - Size: 7.15 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

MobileTeleSystems/onetl

One ETL tool to rule them all

Language: Python - Size: 8.79 MB - Last synced at: about 20 hours ago - Pushed at: about 21 hours ago - Stars: 80 - Forks: 6

yaoguangluo/ChromosomeDNA

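"DNA Meta-Base Catalysis and Peptide Computing": in evolutionary computation, the PDE metabolic optimization approach, in which software function files are index-encoded with DNA semantic meta-bases, is an effective form of evolution.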

Language: Java - Size: 678 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 7 - Forks: 2

s-yazhini/Hexa-DE-Main-Project

Data engineering main project 1

Language: Jupyter Notebook - Size: 15.5 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

chiragjaiswar/AWS-S3-to-Redshift-ETL-Pipeline

Automate your data flow with the AWS S3 to Redshift ETL Pipeline. This serverless solution uses AWS Lambda and Glue to transform CSV data seamlessly. 🐙🚀

Size: 3.91 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

AhmadNader319/EGP-Converter

Currency Exchange Rate ETL & Conversion Toolkit: a Python toolkit for fetching, storing, and converting currency exchange rates using exchangeratesapi.io and an IBM DB2 backend. Supports real-time and historical data, automated ETL, and currency conversion for USD, EGP, and DZD. Perfect for financial apps, data pipelines, and currency analytics.

Language: Python - Size: 26.4 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1 - Forks: 0

apache/streampark

Make stream processing easier! Easy-to-use streaming application development framework and operation platform.

Language: Java - Size: 59.3 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 4,091 - Forks: 1,034

Chemtor/Reddit-ETL-Pipeline

A simple ETL pipeline to extract post and comment data from Reddit

Language: Python - Size: 39.7 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

christianebacani/Roadmap

This repository serves as a temporary portfolio showcasing SQL projects and Python scripts related to data engineering, highlighting key accomplishments and implementations.

Language: Python - Size: 971 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

Allan-Cao/pygrid

Python client for the GRID Esports API and ETL helper functions for game data processing

Language: Python - Size: 81.1 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

JSv4/OpenContracts

Enterprise-grade and API-first LLM workspace for unstructured documents, including data extraction, redaction, rights management, prompt playground, and more!

Language: Python - Size: 130 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 886 - Forks: 87

kalyani-ks/AWS-S3-to-Redshift-ETL-Pipeline

Automated serverless ETL pipeline built on AWS. In this project, CSV data ingested into S3 is transformed using AWS Lambda to trigger an AWS Glue workflow containing crawlers and Spark ETL jobs. The processed data is then loaded into Redshift, with real-time notifications via Amazon SNS through EventBridge.

Size: 1.28 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

brageon/biwa

PR campaigns with memes.

Language: Python - Size: 2.88 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 0

Rahulchouhan1/sql-data-warehouse-project

Building a modern data warehouse with SQL Server, including ETL Processes, data modeling, and analytics.

Language: TSQL - Size: 4.35 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

muhammadmutahir/CreditRiskModel_CRA_using_XGBoost_Neural_Network_Random_Forest_Regression_Sourav_Basu

This repository contains a credit risk analytics project that uses logistic regression, decision trees, and various data analysis techniques. Explore the code and resources in Jupyter Notebook format to understand the model's performance and insights. 🐱💻📊

Language: Jupyter Notebook - Size: 522 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

keboola/mcp-server

Model Context Protocol (MCP) Server for the Keboola Platform

Language: Python - Size: 1.97 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 65 - Forks: 14

BenGJ10/Network-Security-System

End-to-end MLOps Network Security System project with ETL pipelines

Language: Python - Size: 1.08 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

netwerk-digitaal-erfgoed/ld-workbench

A CLI tool for transforming large RDF datasets using pure SPARQL

Language: TypeScript - Size: 1.38 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 7 - Forks: 1

logan-taggart/Spotify-Artists-Spark-ETL

ETL with Spark for finding the most consistently popular artists

Language: Python - Size: 15 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

logan-taggart/Movies-Spark-ETL

ETL with Spark for finding highly rated movies

Language: Python - Size: 974 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

moemen-2003/banned-books-pipeline

This repository contains an ETL pipeline for scraping banned books data from the PEN America website. It features data cleaning, transformation, and visualization using Python, Pandas, and Streamlit. 🐙📦

Language: Python - Size: 609 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

EngIbrahim1/Data-Warehousing-and-Advanced-Data-Analytics

Data Analytics Project: Analyzed Promotions and Provided Tangible Insights to Sales Director

Language: TSQL - Size: 2.07 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

techascent/tech.ml.dataset

A Clojure high performance data processing system

Language: Clojure - Size: 9.59 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 706 - Forks: 34

Sabal999/end-to-end-data-pipeline-acs

This repository showcases a robust end-to-end data pipeline for the American Community Survey dataset, utilizing tools like Python, SparkSQL, and Docker. 🚀 Explore the architecture that transforms raw data into valuable insights through a Bronze / Silver / Gold framework. 🐙

Language: Python - Size: 1.17 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

Akshay1010567/tp_final_pulseras_inteligentes

Final project for the "Databases" course of the Data Science degree program (UNSAM). First term of 2025.

Language: Python - Size: 43.9 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

chrisliatas/dsnd-ml-pipeline

ML pipeline to categorize emergency messages based on the needs communicated by the sender.

Language: Jupyter Notebook - Size: 2.98 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 2 - Forks: 0

caesarmario/weather-data-engineering-pipeline

This repository showcases a complete Python-based ETL (Extract, Transform, Load) data pipeline designed to process, validate, and analyze weather data for multiple cities. The project demonstrates a structured approach to handling weather data, focusing on data accuracy, transformation, and insights generation.

Language: Python - Size: 1.74 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 1 - Forks: 0

chriskenndy/AI-Job-Risk-Pipeline-Project

An end-to-end data pipeline and dashboard assessing the potential risk of AI replacement in various job roles and industries.

Language: Jupyter Notebook - Size: 3.27 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

nxoti/cnpj-data-pipeline

🇧🇷 CNPJ Data Pipeline: a modular, configurable script for processing CNPJ files from Brazil's Receita Federal. 🐙 This project supports multiple databases and enables intelligent processing of more than 50 million companies.

Language: Python - Size: 384 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

fran-cornachione/Spotify-ETL-Power-Bi-Dashboard

An ETL project that extracts information from the Spotify API using Python, processes and cleans the data, and visualizes key insights from a Spotify playlist through an interactive dashboard in Power BI.

Language: Jupyter Notebook - Size: 206 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

stellar/stellar-etl

Stellar ETL will enable real-time analytics on the Stellar network

Language: Go - Size: 24.3 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 35 - Forks: 15

uw-it-aca/canvas-analytics

ETL workflow for extracting analytics from Canvas

Language: Python - Size: 1.02 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

nshaibu/Pointy-lang

This repo defines the specs for the Pointy-lang

Size: 32.2 KB - Last synced at: 7 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0

HangOn6/CreditRiskModel_CRA_using_XGBoost_Neural_Network_Random_Forest_Regression_Sourav_Basu

Improving credit risk model using Machine learning techniques. We use a host of ml models and neural network to solve the issue.

Language: Jupyter Notebook - Size: 533 KB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0

HangOn6/data-storage-project

This project is a comprehensive proof-of-concept (PoC) for designing and implementing a data warehouse using a real-world Product Sales and Returns dataset. It demonstrates dimensional modeling, SQL-based ETL, data normalization, Tableau visualization, and a performance comparison between relational databases (SQL) and graph databases (Neo4j).

Language: TSQL - Size: 24.5 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0

Edwardvaneechoud/Flowfile

Flowfile is a visual ETL tool and Python library combining drag-and-drop workflows with Polars dataframes. Build data pipelines visually, define flows programmatically with a Polars-like API, and export to standalone Python code. Perfect for fast, intuitive data processing from development to production.

Language: Python - Size: 23.8 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 53 - Forks: 2

lanafrenzel/aws-etl-pipeline

ETL pipeline on AWS with Lambda, Glue, and S3 for data ingestion and processing - in progress

Language: Python - Size: 82 KB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0

Zipstack/unstract-sdk

A framework for writing Unstract Tools/Apps

Language: Python - Size: 3.64 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 20 - Forks: 1

aws-samples/aws-data-pipelines-for-azure-storage

Copy data from Azure Blob Storage to Amazon S3 using code. View Azure costs using Amazon QuickSight

Language: HCL - Size: 9.52 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 14 - Forks: 6

apache/hamilton

Apache Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.

Language: Jupyter Notebook - Size: 98.6 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 2,160 - Forks: 149

mseijse01/finance-integration

Flask app for financial data analysis with intelligent ETL pipelines, multi-source fallbacks, and real-time visualizations. Tracks coffee/beverage stocks with sentiment analysis, earnings data, and interactive Plotly charts. PostgreSQL backend.

Language: Python - Size: 174 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

harehimself/duxsoup-etl

ETL system utilizing the DuxSoup API for programmatic LinkedIn extraction. The project is a data extraction pipeline that automatically retrieves extensive LinkedIn profile data from first-degree connections for network analysis and relationship intelligence applications.

Language: JavaScript - Size: 355 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 1 - Forks: 0

gloryodeyemi/SQL-Data-Warehouse

A comprehensive SQL Data Warehouse built from scratch using Azure Data Studio and SQL Server Express. It simulates an enterprise data pipeline using the Medallion Architecture and reflects industry best practices in Data Engineering, ETL design, and SQL-based data modeling.

Language: TSQL - Size: 12.4 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

jitsucom/bulker

Service for bulk-loading data to databases with automatic schema management (Redshift, Snowflake, BigQuery, ClickHouse, Postgres, MySQL)

Language: Go - Size: 5.78 MB - Last synced at: 9 days ago - Pushed at: 10 days ago - Stars: 178 - Forks: 30

abhishekk-16/aws-etl-pipeline

An automated serverless ETL pipeline built on AWS. In this project, CSV data ingested into S3 is transformed using AWS Lambda to trigger an AWS Glue workflow containing crawlers and Spark ETL jobs. The processed data is then loaded into Amazon Redshift, with real-time notifications delivered via Amazon SNS through EventBridge.

Size: 3.91 KB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 0 - Forks: 0

fmelihh/recommendation-engine

Generative AI & Recommendation Engine --- Firat University / Faculty of Technology / Software Engineering / Final Project

Language: Python - Size: 3.69 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 6 - Forks: 0

ebonnal/streamable

concurrent & fluent interface for (async) iterables

Language: Python - Size: 4.08 MB - Last synced at: 3 days ago - Pushed at: 11 days ago - Stars: 271 - Forks: 4

PaBHavik2002/Kaggle-Projects

This is a Kaggle project repository; all the projects I have done so far will be in this repo. I will keep uploading new projects two or three times a month.

Size: 1.1 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 1 - Forks: 0

ArneDePeuter/deppy

A Python dependency executor that builds and executes DAGs efficiently, optimizing workflows with concurrency and flexibility. Perfect for managing complex dependent tasks effortlessly.

Language: Python - Size: 372 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0

RLado/Canonada

Canonada is a data science framework that helps you build production-ready streaming pipelines for data processing in Python

Language: Python - Size: 9.93 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 1 - Forks: 1

WhiskeyTangoFoxtro/Data-Warehousing-and-BI

End-to-end Data Warehousing and Business Intelligence project using SQL, SSMS, Power BI, and Medallion Architecture. Features ETL workflows, dimensional modeling, and interactive dashboards for business insights.

Language: TSQL - Size: 2.33 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 0 - Forks: 0

danhphan/trusted-data-pipeline

Building 3D Trusted Data Pipelines With Dagster, dbt, and DuckDB

Language: Python - Size: 5.25 MB - Last synced at: 3 days ago - Pushed at: almost 2 years ago - Stars: 21 - Forks: 1

aaronlmathis/GoETL

A high-performance Go library for building complete ETL (Extract, Transform, Load) directed-acyclic graph pipelines. GoETL provides streaming data readers, configurable transformations, and efficient writers with a fluent API for complex data workflows.

Language: Go - Size: 26.1 MB - Last synced at: 3 days ago - Pushed at: 13 days ago - Stars: 0 - Forks: 0

LostMa-ERC/heurist-api

API wrapper and CLI for exporting data from a Heurist database server.

Language: Python - Size: 10.4 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 1 - Forks: 0

simon-bronnikov/ETL-Airflow-Hive-Spark-Postres-Docker-

This project implements an extract, transform, load (ETL) process.

Language: Shell - Size: 1.13 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 0 - Forks: 0

tnrjr/BI-Amazon-Latam

BI solution for Latin American retail using Brazilian e-commerce data, with DW, ETL, analysis, and dashboards.

Language: Jupyter Notebook - Size: 62.7 MB - Last synced at: about 13 hours ago - Pushed at: 13 days ago - Stars: 0 - Forks: 0

Sinekhaya/ETL-Project

A Python-based ETL pipeline notebook demonstrating how to extract, transform, and load data using pandas and SQLite.

Language: Jupyter Notebook - Size: 0 Bytes - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 0 - Forks: 0

FNevs/uni-tees

Repository of practical tutorials on Data Pipelines, APIs (FastAPI, Flask), NLP (NLTK), and AI with LangChain.

Language: Jupyter Notebook - Size: 2.99 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 0 - Forks: 0

wri/gfw-data-api

GFW Data API

Language: Python - Size: 5.61 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 14 - Forks: 5

Matrix030/SteamLens

High-performance sentiment analysis platform for Steam reviews. Built with Python, Dask & transformers to process millions of reviews in minutes. Features AI topic assignment, sentiment separation by themes, GPU acceleration, and Streamlit web interface for game developers and data scientists.

Language: Jupyter Notebook - Size: 223 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 0 - Forks: 0

datagucc/ETL_JSON-SQL-Python-Airflow-Docker

Full ETL pipeline with Airflow, Docker & PostgreSQL — end-to-end orchestration to extract, transform, and load data in a local production-grade environment.

Language: Python - Size: 3 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 0 - Forks: 0

chubbard/gratum

A simplified ETL engine for Groovy.

Language: Groovy - Size: 1.06 MB - Last synced at: 15 days ago - Pushed at: 16 days ago - Stars: 1 - Forks: 0

lnyemba/data-transport

Read/write data anywhere; a lightweight ETL tool

Language: Python - Size: 257 KB - Last synced at: 9 days ago - Pushed at: 10 days ago - Stars: 3 - Forks: 0

Nero103/airbnb-destination

This is an end-to-end project to uncover the ideal destination based on listings and hosts. The strategy included: data workflow, SQL analysis, data modeling, data visualization, and findings.

Size: 811 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

letsiki/end-to-end-data-pipeline-acs

End-to-end data pipeline for the ACS dataset using Python, PySpark, PostgreSQL, and Kubernetes (Bronze / Silver / Gold architecture).

Language: Python - Size: 1.17 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 0 - Forks: 0

conductor-sdk/conductor-python

Conductor OSS SDK for Python programming language

Language: Python - Size: 1.74 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 75 - Forks: 35

logleads/LogverzPortal

LOGVERZ PORTAL. Logverz Portal is the "web interface" component of the Logverz application bundle (LogverzReleases Repository). Logverz is a serverless adaptive data pipeline, the fastest route from AWS S3 to instant reports.

Language: Vue - Size: 7.69 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 2 - Forks: 0

logleads/LogverzReleases

⚡Get insights 10x faster. 🔐Ensure all data stays within your secure AWS environment—no external transfers. 🔗Power AI applications and agents 🤖 or optimise traditional workflows 📊 seamlessly. 💡Have an all-in-one solution. 📈 Work seamlessly with Power BI, Tableau, and more. 💰 Cut costs by 90%.

Language: PowerShell - Size: 6.93 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 12 - Forks: 1

logleads/LogverzCore

LOGVERZ CORE. Logverz Core is the "backend" component of the Logverz application bundle (LogverzReleases Repository). Logverz is a serverless adaptive data pipeline, the fastest route from AWS S3 to instant reports.

Language: JavaScript - Size: 6.33 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 4 - Forks: 1

BenuelOmanga/Stock-Market-ETL

Real-time stock market analytics pipeline using Python, Prefect 3.4.4, and Power BI. It includes ETL automation, ML price prediction, anomaly alerts via email, a dynamic Dashboard and Streamlit app

Language: Jupyter Notebook - Size: 1.42 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 0 - Forks: 0

HazemBZ/pdf-fuzz

PoC: bulk search your PDF files using text lookup

Size: 8.79 KB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 0 - Forks: 0

data-solution-automation-engine/DIRECT

DIRECT, the Data Integration Run-time Execution Control Tool, is a data logistics control framework that can be used to monitor, log, audit and control data integration / ETL processes.

Language: TSQL - Size: 14.8 MB - Last synced at: 6 days ago - Pushed at: 19 days ago - Stars: 27 - Forks: 9

GOPAD-Datasus/ETL-SINASC

Pipeline capable of handling messy SINASC files and converting them to a more user-friendly format

Language: Python - Size: 83 KB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 0 - Forks: 0

CoreBlader/autobiz-api-extractor

Autobiz API Extractor: this project extracts data from the [Autobiz API](https://corporate.autobiz.com/es/nuestros-productos/autobizapi/), storing it in JSON or CSV files, and analyzes the results. It features a modular structure for easy data extraction, processing, and visualization. 🐙📊

Language: Python - Size: 14.6 KB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 0 - Forks: 0

AbdulRafay365/End-to-End-Data-Engineering-Pipeline-in-Azure-and-Power-BI

An end-to-end data pipeline project using Azure to extract, transform, and visualize customer sales data using an HTTP Linked Service in Azure Data Factory. Delivers an interactive Power BI dashboard with product and sales insights.

Language: Jupyter Notebook - Size: 3.21 MB - Last synced at: 19 days ago - Pushed at: 20 days ago - Stars: 0 - Forks: 0

VigneshKanna18/foodhunter-revenue-drop-analysis

A BI solution developed for FoodHunter to investigate a significant drop in revenue over a four month period. This analysis helps uncover actionable insights through data exploration, visualization and hypothesis-driven analysis to support informed decision-making.

Language: Jupyter Notebook - Size: 9.74 MB - Last synced at: 1 day ago - Pushed at: 20 days ago - Stars: 0 - Forks: 0

k178412/sql-data-warehouse-project

Building a data warehouse with SQL Server including ETL processes, data modeling and data analytics.

Language: TSQL - Size: 1.58 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 0 - Forks: 0

kelvinleandro/ida-data-engineering

Language: Python - Size: 375 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

Ayushman0511/Data-Warehouse-Project1

A comprehensive guide to building a data warehouse with SQL Server, including ETL processes, data modeling, and analytics.

Language: TSQL - Size: 3.29 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 0 - Forks: 0

jvalue/jayvee

Jayvee is a domain-specific language and runtime for automated processing of data pipelines

Language: TypeScript - Size: 11.4 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 176 - Forks: 15

lanafrenzel/gmail-bigquery-etl

End-to-end project for extracting Gmail metadata using user tokens and loading it into BigQuery. Includes a CLI token uploader and a scalable ETL service.

Language: Python - Size: 18.6 KB - Last synced at: 21 days ago - Pushed at: 22 days ago - Stars: 0 - Forks: 0

zainea-bogdan/Data_Engineer_Project_WoWCinema

WoWCinema is a project based on a fictional scenario where I stepped into the role of a Data Engineer, designing and building an end-to-end data infrastructure. An ETL pipeline ingests data from multiple sources, transforms it, and loads it into a centralized PostgreSQL data warehouse to power analytics, KPI tracking, and reporting.

Language: Python - Size: 2.14 MB - Last synced at: 16 days ago - Pushed at: 22 days ago - Stars: 0 - Forks: 0

ronaldkanyepi/Log-Realtime-Analysis

A scalable architecture for real-time log processing and visualization. Built with a Kafka-Spark ETL pipeline, DynamoDB for storing aggregate real-time metrics, and Python Dash for interactive dashboards. Designed for high-throughput log ingestion, real-time monitoring, and long-term storage.

Language: Python - Size: 1.14 MB - Last synced at: 19 days ago - Pushed at: 6 months ago - Stars: 4 - Forks: 0

msaakaash/hospital-data-warehouse

A data warehouse project designed to demonstrate SQL and data modeling skills.

Language: SQL - Size: 1.04 MB - Last synced at: 22 days ago - Pushed at: 23 days ago - Stars: 0 - Forks: 0

NirgalFromMars/disaster-tweets-classification

End-to-end data analysis project using AWS RDS data sources (containing CSV data), ETL/EDA and deep learning models in Jupyter Notebooks, and Tableau visualizations and dashboard

Language: Jupyter Notebook - Size: 989 KB - Last synced at: 22 days ago - Pushed at: 23 days ago - Stars: 0 - Forks: 0

nasrmohammad4804/search-engine-concept

This repo is for learning search engines such as ELK and web search engine concepts such as Google's, to build software engineering knowledge.

Language: Java - Size: 13.7 MB - Last synced at: 23 days ago - Pushed at: 23 days ago - Stars: 9 - Forks: 2

castengine/insert-tools

CLI tool for inserting SELECT query results into ClickHouse with automatic schema matching and type-safe casting. Ideal for ETL pipelines and SQL-driven data flows.

Language: Python - Size: 47.9 KB - Last synced at: 3 days ago - Pushed at: about 1 month ago - Stars: 8 - Forks: 0

Gerardo1909/tp_final_pulseras_inteligentes

Final project for the "Databases" course of the Data Science degree program (UNSAM). First term of 2025.

Language: Python - Size: 503 KB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 0 - Forks: 0

logleads/LogverzPortalAccess

LOGVERZ PORTAL ACCESS. Logverz portal access is the "login" component of the Logverz application bundle (LogverzReleases Repository). Logverz is a serverless adaptive data pipeline, the fastest route from AWS S3 to instant reports.

Language: Vue - Size: 8.81 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 2 - Forks: 0

harehimself/linkedin-etl

ETL system utilizing the DuxSoup API for programmatic LinkedIn extraction. The project is a data extraction pipeline that automatically retrieves extensive LinkedIn profile data from first-degree connections for network analysis and relationship intelligence applications.

Language: JavaScript - Size: 391 KB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 1 - Forks: 0

ivanildobarauna-dev/data-pipeline-sync-ingest

The "ETL Process for Currency Quotes Data" project is a complete solution dedicated to extracting, transforming, and loading (ETL) currency quote data. This project uses several advanced techniques and architectures to ensure the efficiency and robustness of the ETL process.

Language: Python - Size: 6.85 MB - Last synced at: 7 days ago - Pushed at: 4 months ago - Stars: 2 - Forks: 1

YotpoLtd/metorikku

A simplified, lightweight ETL Framework based on Apache Spark

Language: Scala - Size: 4.2 MB - Last synced at: 18 days ago - Pushed at: over 1 year ago - Stars: 586 - Forks: 160

priyanshubiswas-tech/AWS-ETL-Pipeline-on-Cloud-using-Glue-Athena-Lambda-and-Redshift

Serverless ETL pipeline on AWS using Glue, Lambda, Athena, and Redshift — automates data ingestion, transformation, and analytics with scalable, event-driven architecture.

Language: Python - Size: 20.5 KB - Last synced at: 18 days ago - Pushed at: 26 days ago - Stars: 1 - Forks: 0