GitHub topics: data-preprocessing-pipelines

Repositories

data-prep-kit/data-prep-kit

Open source project for data preparation for GenAI applications

Language: HTML - Size: 223 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 731 - Forks: 204

firefly-cpp/succulent

Collect POST requests

Language: Python - Size: 462 KB - Last synced at: 16 days ago - Pushed at: about 1 month ago - Stars: 3 - Forks: 3

preprocessy/preprocessy

Python package for Customizable Data Preprocessing Pipelines

Language: Jupyter Notebook - Size: 993 KB - Last synced at: 22 days ago - Pushed at: about 2 months ago - Stars: 43 - Forks: 14

DigitalLifeYZQiu/Data-Process-Library

A data-processing library to support better understanding of industrial data.

Language: Jupyter Notebook - Size: 46.5 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

shamspias/gpt3-data-preprocessing

This repository contains code for preprocessing text data from PDF and DOCX files for use with GPT-3. It includes steps such as tokenization, removal of stop words and punctuation, and formatting for GPT-3 input.

Language: Python - Size: 11.7 KB - Last synced at: 4 months ago - Pushed at: over 2 years ago - Stars: 6 - Forks: 1

kolhesamiksha/Nemo_Curator

This repository contains sample text data-preparation code using NeMo Curator for pre-training or synthetic data generation

Language: Jupyter Notebook - Size: 138 KB - Last synced at: about 2 months ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

PrasunDatta/adorsho-praniSheba_Preprocessing-Pipeline-of-Muzzle-Data-of-Cow

This work highlights my contribution as an "ML Engineer" at "adorsho praniSheba" (an ML-based agro-farming company in Bangladesh), where I was assigned the task of designing the preprocessing pipeline.

Language: Jupyter Notebook - Size: 11.5 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

vuanhngo14/Decision-Tree-from-Scratch

Understand and implement a decision tree

Language: Jupyter Notebook - Size: 148 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

SaraLittleSquirrel/Obesity-estimator

Project for Machine Learning Data Mining course

Language: Jupyter Notebook - Size: 5.63 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

WillCaton2350/Parallel-ETL-Pipeline

Automated web crawler and data preprocessing tool written in Python with Scrapy. The parallel ETL process involves multiple steps: extracting specific data from a web page using Scrapy and organizing it into structured items. The extracted data is also saved to a separate JSON file for further analysis and integration into a MySQL database.

Language: Python - Size: 31.3 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

MustofAhmed41/Data-Preprocessing-using-Distributed-Database

Machine learning models cannot be applied directly to raw data. This desktop application consists of a central server and two client servers. The central server sends raw data to the clients, where the data is preprocessed and prepared for the machine learning model.

Size: 74.2 KB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

etetteh/production_ml

Language: Python - Size: 35.2 KB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

Related Keywords

data-preprocessing-pipelines (12), machine-learning (6), data-science (3), data-preprocessing (3), random-forest (2), voting-classifier (2), decision-tree (2), adaboost (2), rbf-kernel (1), polynomial-kernel (1), pandas (1), numpy (1), linear-kernel (1), k-neighbors (1), extra-trees (1), data-mining (1), decision-tree-from-scratch (1), data-visualization (1), python-script (1), jupyter-notebook (1), image-preprocessing (1), synthetic-dataset-generation (1), scikit-learn (1), preprocessing-data (1), optuna (1), gradient-boosting (1), docker (1), diabetes-prediction (1), plsql (1), distributed-database (1), database (1), webcrawler (1), scrapy (1), python3 (1), mysql (1), etl-pipeline (1), data-analysis (1), support-vector-machines (1), sklearn (1), data-collection (1), spark (1), ray (1), python (1), malware (1), llmapps (1), llm (1), large-scale-data-processing (1), large-language-models (1), finetuning (1), deduplication (1), datarecipes (1), datacuration (1), data-preparation (1), data-prep (1), data (1), code-quality (1), nvidia (1), nemo (1), generative-ai (1), finetuning-llms (1), curator (1), gpt-3 (1), artificial-intelligence (1), data-understanding (1), under-construction (1), python-library (1), preprocessing (1), pipelines (1), hacktoberfest2022 (1), hacktoberfest (1), data-engineering (1), raspberry-pi (1), esp32 (1)

ecosyste.ms
