An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: dataingestion

tashi-2004/Apache-Airflow-Kafka-Spark-DeltaLake-Real-Time-Stream-Pipeline

This project implements a real-time data pipeline using Apache Airflow, Kafka, Apache Spark, and Delta Lake. It supports both batch (Coldpath) and real-time (Hotpath) data ingestion, processing, and storage. Airflow is used for orchestrating the data workflows.

Language: Python - Size: 12.5 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0

NaS-Research/knowledge-model

Our knowledge system systematically ingests, processes, and indexes open-access life science publications. It supports internal research by providing precise question answering and efficient retrieval from a continuously updated repository of scientific literature.

Language: Python - Size: 95.4 MB - Last synced at: 17 days ago - Pushed at: 18 days ago - Stars: 0 - Forks: 0

Anu0408/Truck_Delays_Classification-End-to-End-Machine-Learning-Application

Modern machine learning project development: this end-to-end project provides real-time delay updates to logistics companies. It uses MLflow for model tracking and management, the Hopsworks Feature Store for storing and managing the dataset, and Streamlit for building an interactive web application that predicts truck delays.

Language: Jupyter Notebook - Size: 19 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

chayansraj/Microsoft-Azure-Medallion-Data-pipeline

In this project, we create an end-to-end data platform covering data ingestion, data transformation, data loading, and reporting.

Language: Jupyter Notebook - Size: 11.5 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 7 - Forks: 6

Jimmymugendi/LuxDev-week-1-boot-camp

Language: Jupyter Notebook - Size: 273 KB - Last synced at: about 2 months ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0

catherman/FIRE-2023

An urban fire risk prediction model using machine learning. Findings are visualized with a map and a table in an interactive Streamlit app.

Language: Jupyter Notebook - Size: 10.2 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

camara94/introduction-to-data-engineering

Describe the different entities that form a modern data ecosystem. Describe and differentiate between the roles and responsibilities of Data Engineers, Data Scientists, Data Analysts, Business Analysts, and Business Intelligence Analysts. Explain what Data Engineering is. List the tasks performed in a typical data engineering lifecycle. Describe what a day in the life of a Data Engineer looks like.

Size: 1.77 MB - Last synced at: 25 days ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

Adesoji1/GooglesheettoMysql

Export sales data from Google Sheets to a relational DBMS.

Language: Python - Size: 488 KB - Last synced at: 30 days ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

MaheshKumarMK/Compressive-strength-prediction

The main purpose of this repository is to build a pipeline for training regression models that predict the compressive strength of concrete, reducing the risk and cost of discarding concrete structures when the concrete cube test fails.

Language: Python - Size: 187 KB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

george-mountain/Data-Extraction-Integration-and-Analysis---Clustering-Operations

This repository documents a project that takes a step-by-step approach to scraping data, integrating data from various sources, and analyzing the combined data. It also shows how APIs can be harnessed for data engineering operations. In this project, the Foursquare API was used for location data.

Language: Jupyter Notebook - Size: 373 KB - Last synced at: about 2 months ago - Pushed at: about 2 years ago - Stars: 2 - Forks: 0

rizkyirw/Pipeline-Project

Resources for an ETL and data ingestion program using Apache Airflow.

Language: Python - Size: 207 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

OmerGamie/mlproject

This repo hosts an end-to-end machine learning project designed to cover the full lifecycle of a data science initiative. The project includes data ingestion, preprocessing, exploratory data analysis (EDA), feature engineering, model training and evaluation, hyperparameter tuning, and cloud deployment.

Language: Jupyter Notebook - Size: 1.93 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

immanuvelprathap/Data-Science-Road-Map

Language: Python - Size: 55.7 KB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 3 - Forks: 1

luccayz/dataengineer_project_002

The project consists of developing a solution for migrating data from a source with many files to a database hosted in a cloud environment.

Language: TSQL - Size: 232 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

psnegi/data_science_tools1

Course website for Data Science Tools 1.

Language: Jupyter Notebook - Size: 10.9 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 10 - Forks: 6

khadkarajesh/wine-prediction

White and red wine classification using logistic regression.

Language: HTML - Size: 2.74 MB - Last synced at: 2 months ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

bolzon/poc-localstack

Proof of concept using LocalStack as a mock AWS cloud to build basic data ingestion infrastructure with Terraform.

Language: HCL - Size: 25.4 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 1