Ecosyste.ms: Repos

An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: extract-transform-load

marda-alliance/metadata_extractors_registry

A place to develop and discuss the MaRDA Extractors WG registry.

Language: Python - Size: 197 KB - Last synced: 29 days ago - Pushed: 29 days ago - Stars: 6 - Forks: 6

success4lyf/ETL-Pipline

Building a fully scalable ETL (Extract, Transform, Load) pipeline to handle large volumes of transaction data for a café business.

Language: Jupyter Notebook - Size: 70.3 KB - Last synced: 7 days ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

python-bonobo/bonobo

Extract Transform Load for Python 3.5+

Language: Python - Size: 1.46 MB - Last synced: 13 days ago - Pushed: about 1 year ago - Stars: 1,575 - Forks: 142

chayansraj/Data-Pipeline-with-dbt-using-Airflow-on-GCP

This project demonstrates how to build and automate an ETL pipeline using DAGs in Airflow and load the transformed data into BigQuery. The project uses several tools, including Astro, dbt, GCP, Airflow, and Metabase.

Language: Python - Size: 15.1 MB - Last synced: 17 days ago - Pushed: 17 days ago - Stars: 17 - Forks: 2

mathewsrc/machine-learning-monitoring-with-evidently

ML Monitoring with EvidentlyAI

Language: Jupyter Notebook - Size: 23.1 MB - Last synced: 21 days ago - Pushed: 5 months ago - Stars: 0 - Forks: 0

networktocode/diffsync

A utility library for comparing and synchronizing different datasets.

Language: Python - Size: 1.14 MB - Last synced: 24 days ago - Pushed: 24 days ago - Stars: 135 - Forks: 26

immanuvelprathap/ETL-Sales_Analysis_Report---MySQL-PowerBI

This repo explains how ETL can be done in MySQL and Power BI to generate insights!

Size: 6.75 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 1 - Forks: 0

Abhi0323/Full-Cycle-ETL-Analytics-with-Google-Analytics-and-Snowflake

Explore the transformative power of data analytics in my portfolio, where Google Analytics and Snowflake converge to provide comprehensive insights. This project leverages advanced ETL techniques and real-time data integration to enhance user engagement and optimize content delivery effectively.

Language: Jupyter Notebook - Size: 1.48 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 4 - Forks: 1

dfornika/ncov-db

Store SARS-CoV-2 genomic analysis results from ncov2019-artic-nf and ncov-tools in a SQLite DB

Language: Python - Size: 60.5 KB - Last synced: about 1 month ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0

nicholaishaw/Crowdfunding_ETL

Michigan State University Data Analytics Project 2

Language: Jupyter Notebook - Size: 422 KB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 1 - Forks: 0

Aishwarya-TheAnalyst/AtliQ-Grands-Hospitality-Insights-using-Power-BI

AtliQ Grands hotel Data Analysis using Power BI

Size: 4.25 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 0 - Forks: 0

docwire/docwire

DocWire SDK: Award-winning modern data processing in C++20. SourceForge Community Choice & Microsoft support. AI-driven processing. Supports nearly 100 data formats, including email boxes and OCR. Boost efficiency in text extraction, web data extraction, data mining, document analysis. Offline processing is possible for security and confidentiality

Language: C++ - Size: 34.5 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 44 - Forks: 11

mathewsrc/ETL-Chicago-Cafe-Permits

This ETL (Extract, Transform, Load) project employs several Python libraries, including Airflow, Soda, Polars, YData Profiling, DuckDB, Requests, Loguru, and Google Cloud to streamline the extraction, transformation, and loading of CSV datasets from the U.S. government's data repository at https://catalog.data.gov.

Language: HTML - Size: 42.3 MB - Last synced: 21 days ago - Pushed: 6 months ago - Stars: 3 - Forks: 0

RYANFRANKLIN237/Data-cleansing

A group of Python scripts that clean large data sets by removing duplicate data, putting data in correct formats, and removing redundant cells

Language: Python - Size: 7.81 KB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 1 - Forks: 0

udaisharma99/Human-Activity-Prediction

This project focuses on using sensor data to predict human activity and is based on the ExtraSensory dataset, created by Ph.D. students and staff at the Department of Electrical and Computer Engineering, University of California, San Diego.

Language: Jupyter Notebook - Size: 755 KB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 0 - Forks: 0

ramkumarpj/project-three

SEC Finance Data Engineering - ETL process for SEC Finance data of S&P 500 companies. Jupyter Notebooks run the ETL workflows. The final dataset is hosted in MongoDB Atlas (cloud). The API is written in Python with the PyMongo and Flask libraries. The dashboards with charts are hosted in MongoDB Atlas.

Language: Jupyter Notebook - Size: 3.01 MB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 0 - Forks: 3

marda-alliance/metadata_extractors

A Working Group on connecting and advancing interoperability of efforts on automated extraction of metadata from materials and chemical file formats

Size: 619 KB - Last synced: 2 months ago - Pushed: 5 months ago - Stars: 14 - Forks: 3

ayush9892/Supply-Chain-ETL

Data Engineering Project on Supply Chain ETL. Creating a dynamic ADF pipeline to ingest both Full Load and Incremental Load data from SQL Server and then transform these datasets based on medallion architecture using Databricks.

Language: Jupyter Notebook - Size: 1.57 MB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 0 - Forks: 0

damaniayesh/Inventory_Management_Dashboard

This project provides Inventory Management using Power BI, useful for Warehouse/In-plant Inventory Managers who need to control Inventory levels effectively while maintaining Service Levels.

Size: 4.85 MB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 0 - Forks: 0

ramkumarpj/Crowdfunding_ETL

This project takes crowdfunding data provided in Excel files through an Extract, Transform, and Load (ETL) process and makes it available in a relational database for further use.

Language: Jupyter Notebook - Size: 768 KB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 0 - Forks: 1

codecadre/melhordazona-web

Web app using babashka/apache + ETL pipeline

Language: Clojure - Size: 14.6 MB - Last synced: about 1 month ago - Pushed: 4 months ago - Stars: 4 - Forks: 0

benispresence/hexbase

open-source ETL pipeline for HEX cryptocurrency data

Language: Python - Size: 525 KB - Last synced: 26 days ago - Pushed: over 1 year ago - Stars: 9 - Forks: 1

gopiashokan/Airbnb-Analysis-with-Tableau

Built an interactive Tableau dashboard to analyze the Airbnb data extracted from MongoDB Atlas. Developed a Streamlit application for trend analysis, pattern recognition, and data insights using EDA. Explored variations in price, location, property type, and seasons through dynamic plots and charts.

Language: Jupyter Notebook - Size: 1.41 MB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 1 - Forks: 1

marda-alliance/metadata_extractors_api

A place for the Metadata Extractors WG to work on ideas regarding API development, wrapping existing codes and associated tools.

Language: Python - Size: 2.78 MB - Last synced: about 14 hours ago - Pushed: 1 day ago - Stars: 2 - Forks: 2

lfhohmann/wordle-ETL

ETL for Wordle game

Language: Jupyter Notebook - Size: 157 KB - Last synced: 5 months ago - Pushed: about 2 years ago - Stars: 0 - Forks: 0

rtimbro185/syr_mads_ist722_data_warehouse

Syracuse University, Masters of Applied Data Science - IST 722 Data Warehouse

Language: TSQL - Size: 50.6 MB - Last synced: 5 months ago - Pushed: about 4 years ago - Stars: 3 - Forks: 4

vaxdata22/Water-Quality-DW-on-SQL-Server

This is an MSSQL Data Warehouse and ETL implementation on a specially formatted Water Quality dataset from DEFRA, UK

Language: Jupyter Notebook - Size: 386 KB - Last synced: 5 months ago - Pushed: 5 months ago - Stars: 1 - Forks: 0

praveendecode/ETL-Projects

Implemented ETL projects with interactive Streamlit UI for user-friendly data extraction, transformation, and loading tasks

Size: 1000 Bytes - Last synced: 7 months ago - Pushed: 7 months ago - Stars: 0 - Forks: 0

KDerec/bookscrap

Student project #1 - Web scraping: use Python basics to create a program that automates the process of extracting, transforming, and loading data from the online library "Books to Scrape".

Language: Python - Size: 11.2 MB - Last synced: 7 months ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

Pawsanie/Steam_statistics_ETL

This pipeline can be used to collect statistical information about all games distributed through the Steam platform.

Language: Python - Size: 2.35 MB - Last synced: 6 months ago - Pushed: 6 months ago - Stars: 1 - Forks: 0

phelps-sg/zipline-tardis-bundle

A bundle for zipline-reloaded to allow data for crypto assets to be ingested from Tardis

Language: Python - Size: 93.3 MB - Last synced: 7 months ago - Pushed: 7 months ago - Stars: 1 - Forks: 0

klipperdev/klipper

A Web and API Development Platform built on Symfony

Language: PHP - Size: 3.1 MB - Last synced: about 8 hours ago - Pushed: about 1 year ago - Stars: 1 - Forks: 0

PredictGroup/1C-ERP-OLAP

OLAP ITL utilities for 1C:ERP Enterprise Management.

Language: C# - Size: 1.13 MB - Last synced: 7 months ago - Pushed: about 5 years ago - Stars: 6 - Forks: 5

MadAboutImport/DIFS

Data Importer For SharePoint & Office 365

Size: 88.2 MB - Last synced: 7 months ago - Pushed: almost 2 years ago - Stars: 23 - Forks: 15

praveen-kumar-maurya/Superstore-Sales-Dashboard

The superstore sales dashboard developed in Power BI aims to increase sales and profitability by providing data-driven insights. It offers a comprehensive view of sales and profit trends to identify growth opportunities and inform marketing strategies. The goal is to achieve sustainable growth and profitability by utilizing the insights provided.

Size: 3.78 MB - Last synced: 8 months ago - Pushed: 8 months ago - Stars: 0 - Forks: 0

fab2s/YaEtl

Yet Another ETL in PHP

Language: PHP - Size: 320 KB - Last synced: 28 days ago - Pushed: over 1 year ago - Stars: 64 - Forks: 16

nachiketdixit/Google_BI_Professional

This certification focuses on in-demand skills like data modeling, data visualization, and dashboarding and reporting.

Size: 500 KB - Last synced: 4 months ago - Pushed: 8 months ago - Stars: 0 - Forks: 0

ats-tandjoeng7/Mission-to-Mars

Application of Python web scraping methodologies for performing data analytics and visualization as part of the Extract, Transform, and Load (ETL) process.

Language: Jupyter Notebook - Size: 719 KB - Last synced: 8 months ago - Pushed: over 1 year ago - Stars: 2 - Forks: 0

ats-tandjoeng7/Crowdfunding-ETL

Application of Python libraries, like Pandas, and their useful functions for performing an efficient Extract, Transform, and Load (ETL) process.

Language: Jupyter Notebook - Size: 1.1 MB - Last synced: 8 months ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

gopiashokan/Phonepe-Pulse-Data-Visualization-and-Exploration

Visualize insights from PhonePe Pulse data using Python, Streamlit, and Plotly. Explore interactive charts and uncover trends in digital transactions.

Language: Python - Size: 105 KB - Last synced: 5 months ago - Pushed: 8 months ago - Stars: 0 - Forks: 0

seonguook88/Seong_Portfolio

Data Analytics Portfolio

Size: 1000 Bytes - Last synced: 9 months ago - Pushed: 9 months ago - Stars: 1 - Forks: 0

NEXTSLIM/The-Music-has-Changed-WEBSIDE

We are going to examine two data sets related to the music industry. We want to extract, transform, and load them in order to identify insights and trends about the music industry.

Language: CSS - Size: 822 KB - Last synced: 10 months ago - Pushed: almost 3 years ago - Stars: 1 - Forks: 0

NEXTSLIM/The-Music-has-Changed-Extract-transform-load-

We examine two data sets related to the music industry. We extract, transform, and load the data sets in order to create a database and identify insights and trends about the music industry.

Language: Jupyter Notebook - Size: 47 MB - Last synced: 10 months ago - Pushed: about 3 years ago - Stars: 1 - Forks: 0

StationA/xgeo 📦

Scriptable geospatial data processing engine

Language: Go - Size: 411 KB - Last synced: 10 months ago - Pushed: about 5 years ago - Stars: 4 - Forks: 0

IamJafar/Youtube_Data_Harvesting_and_Warehousing

Domain: Social Media | Extracting data using the YouTube API, storing it in MongoDB, and then transforming it into a relational database like MySQL, in order to get various information about YouTube channels.

Language: Python - Size: 20.5 KB - Last synced: 11 months ago - Pushed: 11 months ago - Stars: 0 - Forks: 0

IsaacMwendwa/Twitter-ETL-of-Elections-PoliceBrutality-HateSpeech-Data

This Twitter ETL project is aimed at providing data to support UN SDG number 16. The project provides data that gives stakeholders actionable insights regarding the 2022 Presidential Elections, Police Brutality, and Propagation of Hate Speech on Twitter

Language: Python - Size: 593 KB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 1 - Forks: 0

deepankarvarma/Extract-Transform-Load-Process-Techniques

This repository contains code for comparing the performance of three different ELT (Extract, Load, Transform) methods on CSV files of different sizes. The three methods are implemented in Python using different approaches and libraries, and their execution times are compared and plotted for analysis.

Language: Python - Size: 31.8 MB - Last synced: 27 days ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0

tejal04/NYC-TLC-Data-Engineering

NYC TLC Data Analysis using Python, GCP Storage, Compute Engine, Mage Data Pipeline Tool, BigQuery, and Looker Studio. Aims to extract insights from the dataset for informed decisions and deeper operational understanding.

Language: Python - Size: 1.07 MB - Last synced: about 1 year ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0

python-bonobo/bonobo-sqlalchemy

PREVIEW - SQL databases in Bonobo, using sqlalchemy

Language: Python - Size: 97.7 KB - Last synced: 24 days ago - Pushed: over 1 year ago - Stars: 25 - Forks: 14

GreenInfo-Network/nyc-crash-mapper-etl-script

Extract, Transform, and Load script for fetching new data from the NYC Open Data Portal's vehicle collision data and loading into the NYC Crash Mapper table on CARTO.

Language: Python - Size: 4.29 MB - Last synced: about 1 month ago - Pushed: 7 months ago - Stars: 3 - Forks: 0

ats-tandjoeng7/surfs_up

Application of Python database toolkits, such as SQLAlchemy and Flask, for performing data analytics and visualization as part of the Extract, Transform, and Load (ETL) process.

Language: Jupyter Notebook - Size: 617 KB - Last synced: 8 months ago - Pushed: over 1 year ago - Stars: 1 - Forks: 0

dimgold/ETL_with_Python

ETL with Python - Taught at DWH course 2017 (TAU)

Language: Jupyter Notebook - Size: 115 KB - Last synced: about 1 year ago - Pushed: over 6 years ago - Stars: 93 - Forks: 51

StationA/landgrab 📦

Geospatial data hoarding system

Language: Python - Size: 72.3 KB - Last synced: 3 months ago - Pushed: about 5 years ago - Stars: 1 - Forks: 0

tek-cub/nlp_job-postings

Natural language processing of job postings in order to gain insight into the data science job market.

Language: Jupyter Notebook - Size: 3.06 MB - Last synced: over 1 year ago - Pushed: over 1 year ago - Stars: 2 - Forks: 0

kpratikin/Business-Intelligence-and-Data-Warehousing

Business Intelligence and Data Warehousing Project

Language: TSQL - Size: 5.41 MB - Last synced: over 1 year ago - Pushed: over 4 years ago - Stars: 6 - Forks: 7

sam-marhaendra/etl-project-anteraja-reviews

This repository was created for the final group project of a Data Engineering course.

Language: Jupyter Notebook - Size: 1.87 MB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

python-bonobo/bonobo-docker

PREVIEW - Run Bonobo data processing graphs in docker containers.

Language: Python - Size: 73.2 KB - Last synced: 24 days ago - Pushed: over 1 year ago - Stars: 13 - Forks: 6

JaviSandoval94/ETL-Project (fork of ArceSaenzLuisAlejandro/ETL-Project)

This project aims to create an ETL pipeline from energy consumption data.

Language: Jupyter Notebook - Size: 8.95 MB - Last synced: about 1 year ago - Pushed: over 3 years ago - Stars: 1 - Forks: 0

andreasscherbaum/faa

FAA Airline On-Time Performance Data

Language: Shell - Size: 125 KB - Last synced: about 1 year ago - Pushed: about 11 years ago - Stars: 1 - Forks: 1

taiwofawumi/DE_ETL_HTML_CSV_JSON

This notebook scrapes information about the largest banks by market capitalization from a wiki page, and stores the information both as a CSV and as a JSON file.

Language: Jupyter Notebook - Size: 8.79 KB - Last synced: about 1 year ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0

OrionExplorer/dcpam

Data Construct-Populate-Access-Manage - Open source data warehouse solution.

Language: C - Size: 57.3 MB - Last synced: about 1 year ago - Pushed: about 3 years ago - Stars: 0 - Forks: 0

python-bonobo/bonobo-selenium

PRE-ALPHA - Write web crawlers using Bonobo

Language: Python - Size: 19.5 KB - Last synced: about 1 month ago - Pushed: over 1 year ago - Stars: 4 - Forks: 2

ZGrinacoff/ETL-Project

E (Extract), T (Transform), L (Load) Project that showcases both SQL and NoSQL Databases.

Language: Jupyter Notebook - Size: 4.6 MB - Last synced: about 1 year ago - Pushed: almost 5 years ago - Stars: 0 - Forks: 1

PetraLee2019/Crime-Anyltics

Approximately 10 people are shot on an average day in Chicago. This project focuses on Poverty and Crime in Chicago Neighborhoods. Full-Stack Project.

Language: Jupyter Notebook - Size: 3.33 MB - Last synced: about 1 year ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0

mediaintegration/twiddlepy

Python module for extracting, transforming and loading data

Language: Python - Size: 82 KB - Last synced: 16 days ago - Pushed: over 1 year ago - Stars: 2 - Forks: 1

ZGrinacoff/Citi-Bike-Analytics

An analysis of Citi Bike with Tableau from January 2018 - September 2019

Size: 12.7 KB - Last synced: about 1 year ago - Pushed: over 4 years ago - Stars: 0 - Forks: 0

samsk/etlrun

Extract-Transform-Load tool based on Message passing, self reprocessing XML pipeline

Language: Perl - Size: 869 KB - Last synced: about 1 year ago - Pushed: about 5 years ago - Stars: 0 - Forks: 0