An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: data-processing-pipelines

NVIDIA/NeMo-Curator

Scalable data preprocessing and curation toolkit for LLMs

Language: Jupyter Notebook - Size: 7.66 MB - Last synced at: about 13 hours ago - Pushed at: about 14 hours ago - Stars: 879 - Forks: 124

graphbookai/graphbook

Visual AI development framework for training and inference of ML models, scaling pipelines, and automating workflows with Python. ⭐ Leave a star to support us!

Language: Python - Size: 1.68 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 35 - Forks: 3

westandskif/convtools

convtools is a specialized Python library for dynamic, declarative data transformations with automatic code generation

Language: Python - Size: 1.87 MB - Last synced at: 3 days ago - Pushed at: about 1 month ago - Stars: 40 - Forks: 9

kaburia/filter-stations

Making it easier to navigate and clean TAHMO weather station data for ML development

Language: Python - Size: 29.2 MB - Last synced at: 29 days ago - Pushed at: 8 months ago - Stars: 17 - Forks: 4

mehanix/dhrw

🎢 IaaS visual editor to create & deploy data processing pipelines - python, rmq, react, meteorjs

Language: JavaScript - Size: 1.88 MB - Last synced at: 18 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

tamasgal/thepipe

A simplistic, general purpose pipeline framework.

Language: Python - Size: 104 KB - Last synced at: about 19 hours ago - Pushed at: almost 3 years ago - Stars: 14 - Forks: 2

softwaresalt/blog

Data Engineering & Software Blog

Size: 4.96 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

shuq007/datascience-notebooks

Notebooks from finance, general practice and Jovian courses on data analysis, ML and DL

Language: Jupyter Notebook - Size: 27.3 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

Plato-solutions/artifician

Artifician is an event-driven framework designed to simplify and accelerate the process of preparing datasets for Artificial Intelligence models.

Language: Python - Size: 4.04 MB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 10 - Forks: 0

AIoT-Group-UoP/crossai

An open-source Python library for processing and developing End-to-End AI pipelines for Time Series Analysis

Language: Jupyter Notebook - Size: 11 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 1

smmiri/etl-visuals

Code for data flow between models, data post-processing, and visualization

Language: Jupyter Notebook - Size: 3.81 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

lhotanok/data-engineering

Homework assignments for MFF UK course NDBI046 - Introduction to Data Engineering

Language: TypeScript - Size: 5.25 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

adarshnitt/30-Day-of-ML

Dataset

Language: Jupyter Notebook - Size: 1.02 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

99sbr/Predictive-Customer-Analytics

Understanding the customer life cycle; Acquiring customer data; Applying big data concepts to your customer relationships; Finding high propensity prospects; Upselling by identifying related products and interests; Generating customer loyalty by discovering response patterns; Predicting customer lifetime value (CLV); Identifying dissatisfied customers; Uncovering attrition patterns; Applying predictive analytics in multiple use cases; Designing data processing pipelines; Implementing continuous improvement

Language: Jupyter Notebook - Size: 128 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 3 - Forks: 1

subhasisgorai/MyExperiments

Experimental libraries - Azure Storage, multithreaded Data Processing pipelines, and many more ...

Language: Java - Size: 29.3 KB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

Related Keywords
data-processing-pipelines (15), data-processing (7), python (5), machine-learning (4), data-science (4), data-analysis (2), artificial-intelligence (2), data-engineering (2), data-pipeline (2), data (2), provenance (2), azure-storage (2), azure-synapse-serverless-sql (1), digital-signal-processing (1), deep-learning (1), dataset-preparation (1), trading-strategies (1), finance (1), conversions (1), azure-fabric (1), azure-functions (1), csharp (1), cython (1), sql (1), serverless-functions (1), azure-synapse-pipelines (1), serverless-architectures (1), event-driven-microservices (1), event-driven-architecture (1), dotnet-core (1), data-processing-software-machines (1), reactive (1), multithreading (1), monitoring (1), lightweight (1), fault-tolerant (1), customer-lifetime-value (1), customer-life-cycle (1), practice-machine-learning (1), pipelining (1), correlation-coefficient (1), 30-day-of-ml (1), skos-rdf (1), rdf (1), dcat-ap (1), data-cube (1), apache-airflow (1), airflow-dags (1), visualization (1), etl-framework (1), etl-automation (1), etl (1), data-structures (1), data-flow (1), open-source (1), machine-learning-pipelines (1), code-generation (1), workflow (1), research (1), pytorch (1), ml (1), framework (1), ai (1), semantic-deduplication (1), llmapps (1), llm-data-quality (1), llm (1), large-scale-data-processing (1), large-language-models (1), fine-tuning (1), fast-data-processing (1), deduplication (1), datarecipes (1), datacuration (1), data-quality (1), data-preparation (1), data-prep (1), data-curation (1), azure-devops (1), azure-data-lake (1), azure-data-factory (1), pipelines (1), hacktoberfest (1), react-flow (1), rabbitmq (1), meteorjs-application (1), help-wanted (1), good-first-issue (1), docker-compose (1), data-visualization (1), data-processing-system (1), data-processing-and-analysis (1), data-pipelines (1), computational-graphs (1), computational-graph (1), pypi-package (1), api-development (1), transformations (1), parsing (1), csv-converter (1)