GitHub topics: dataingestion
tashi-2004/Apache-Airflow-Kafka-Spark-DeltaLake-Real-Time-Stream-Pipeline
This project implements a real-time data pipeline using Apache Airflow, Kafka, Apache Spark, and Delta Lake. It supports both batch (Coldpath) and real-time (Hotpath) data ingestion, processing, and storage. Airflow is used for orchestrating the data workflows.
Language: Python - Size: 12.5 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0

NaS-Research/knowledge-model
Our knowledge system systematically ingests, processes, and indexes open-access life science publications. It supports internal research by providing precise question answering and efficient retrieval from a continuously updated repository of scientific literature.
Language: Python - Size: 95.4 MB - Last synced at: 17 days ago - Pushed at: 18 days ago - Stars: 0 - Forks: 0

Anu0408/Truck_Delays_Classification-End-to-End-Machine-Learning-Application
Modern machine learning project development: this end-to-end project provides real-time delay updates to logistics companies. It uses MLflow for model tracking and management, the Hopsworks Feature Store for storing and managing the dataset, and Streamlit for building an interactive web application that predicts truck delays.
Language: Jupyter Notebook - Size: 19 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

chayansraj/Microsoft-Azure-Medallion-Data-pipeline
In this project, we create an end-to-end data platform covering data ingestion, data transformation, data loading, and reporting.
Language: Jupyter Notebook - Size: 11.5 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 7 - Forks: 6

Jimmymugendi/LuxDev-week-1-boot-camp
Language: Jupyter Notebook - Size: 273 KB - Last synced at: about 2 months ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0

catherman/FIRE-2023
An urban fire risk prediction model using machine learning. Findings are visualized with a map and a table in an interactive Streamlit app.
Language: Jupyter Notebook - Size: 10.2 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

camara94/introduction-to-data-engineering
Describe the different entities that form a modern data ecosystem. Describe and differentiate between the role and responsibilities of Data Engineers, Data Scientists, Data Analysts, Business Analysts, and Business Intelligence Analysts. Explain what Data Engineering is. List the tasks that need to be performed in a typical data engineering lifecycle. Describe what a day in the life of a Data Engineer looks like.
Size: 1.77 MB - Last synced at: 25 days ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

Adesoji1/GooglesheettoMysql
Export sales data from a Google Sheet to a relational DBMS
Language: Python - Size: 488 KB - Last synced at: 30 days ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

MaheshKumarMK/Compressive-strength-prediction
The main purpose of this repository is to build a pipeline for training regression models that predict the compressive strength of concrete, reducing the risk and cost of discarding concrete structures when the concrete cube test fails.
Language: Python - Size: 187 KB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

george-mountain/Data-Extraction-Integration-and-Analysis---Clustering-Operations
This repository documents a project that takes a step-by-step approach to scraping data, integrating data from various sources, and analyzing the combined data. It also shows how APIs can be harnessed for data engineering operations; in this project, the Foursquare API was used for location data.
Language: Jupyter Notebook - Size: 373 KB - Last synced at: about 2 months ago - Pushed at: about 2 years ago - Stars: 2 - Forks: 0

rizkyirw/Pipeline-Project
Resources for an ETL and data ingestion program using Apache Airflow
Language: Python - Size: 207 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

OmerGamie/mlproject
This repo hosts an end-to-end machine learning project covering the full lifecycle of a data science initiative: data ingestion, preprocessing, exploratory data analysis (EDA), feature engineering, model training and evaluation, hyperparameter tuning, and cloud deployment.
Language: Jupyter Notebook - Size: 1.93 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

immanuvelprathap/Data-Science-Road-Map
Language: Python - Size: 55.7 KB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 3 - Forks: 1

luccayz/dataengineer_project_002
The project consists of developing a solution for migrating data from a source with many files to a database hosted in a cloud environment.
Language: TSQL - Size: 232 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

psnegi/data_science_tools1
Course website for Data Science Tools 1
Language: Jupyter Notebook - Size: 10.9 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 10 - Forks: 6

khadkarajesh/wine-prediction
White and red wine classification using logistic regression
Language: HTML - Size: 2.74 MB - Last synced at: 2 months ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

bolzon/poc-localstack
Proof of concept using LocalStack as a mock AWS cloud to build a basic data ingestion infrastructure with Terraform
Language: HCL - Size: 25.4 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 1
