GitHub topics: data-processing-pipelines
NVIDIA/NeMo-Curator
Scalable data preprocessing and curation toolkit for LLMs
Language: Jupyter Notebook - Size: 7.66 MB - Last synced at: about 13 hours ago - Pushed at: about 14 hours ago - Stars: 879 - Forks: 124

graphbookai/graphbook
Visual AI development framework for training and inference of ML models, scaling pipelines, and automating workflows with Python.
Language: Python - Size: 1.68 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 35 - Forks: 3

westandskif/convtools
convtools is a specialized Python library for dynamic, declarative data transformations with automatic code generation
Language: Python - Size: 1.87 MB - Last synced at: 3 days ago - Pushed at: about 1 month ago - Stars: 40 - Forks: 9

kaburia/filter-stations
Making it easier to navigate and clean TAHMO weather station data for ML development
Language: Python - Size: 29.2 MB - Last synced at: 29 days ago - Pushed at: 8 months ago - Stars: 17 - Forks: 4

mehanix/dhrw
🎢 IaaS visual editor to create & deploy data processing pipelines - python, rmq, react, meteorjs
Language: JavaScript - Size: 1.88 MB - Last synced at: 18 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

tamasgal/thepipe
A simplistic, general-purpose pipeline framework.
Language: Python - Size: 104 KB - Last synced at: about 19 hours ago - Pushed at: almost 3 years ago - Stars: 14 - Forks: 2

softwaresalt/blog
Data Engineering & Software Blog
Size: 4.96 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

shuq007/datascience-notebooks
Notebooks from finance, general practice and Jovian courses on data analysis, ML and DL
Language: Jupyter Notebook - Size: 27.3 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

Plato-solutions/artifician
Artifician is an event-driven framework designed to simplify and accelerate the process of preparing datasets for Artificial Intelligence models.
Language: Python - Size: 4.04 MB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 10 - Forks: 0

AIoT-Group-UoP/crossai
An open-source Python library for processing data and developing end-to-end AI pipelines for time series analysis
Language: Jupyter Notebook - Size: 11 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 1

smmiri/etl-visuals
Code for data flow between models, data post-processing, and visualization
Language: Jupyter Notebook - Size: 3.81 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

lhotanok/data-engineering
Homework assignments for MFF UK course NDBI046 - Introduction to Data Engineering
Language: TypeScript - Size: 5.25 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

adarshnitt/30-Day-of-ML
Dataset
Language: Jupyter Notebook - Size: 1.02 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

99sbr/Predictive-Customer-Analytics
Understanding the customer life cycle; acquiring customer data; applying big data concepts to your customer relationships; finding high-propensity prospects; upselling by identifying related products and interests; generating customer loyalty by discovering response patterns; predicting customer lifetime value (CLV); identifying dissatisfied customers; uncovering attrition patterns; applying predictive analytics in multiple use cases; designing data processing pipelines; implementing continuous improvement
Language: Jupyter Notebook - Size: 128 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 3 - Forks: 1

subhasisgorai/MyExperiments
Experimental libraries: Azure Storage, multithreaded data processing pipelines, and more
Language: Java - Size: 29.3 KB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0
