GitHub topics: data-preprocessing-pipelines
data-prep-kit/data-prep-kit
Open source project for data preparation for LLM application builders
Language: HTML - Size: 220 MB - Last synced at: 1 day ago - Pushed at: 5 days ago - Stars: 635 - Forks: 194

DigitalLifeYZQiu/Data-Process-Library
A data processing library to support better understanding of industrial data.
Language: Jupyter Notebook - Size: 5.52 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 0 - Forks: 0

firefly-cpp/succulent
Collect POST requests
Language: Python - Size: 357 KB - Last synced at: 17 days ago - Pushed at: about 2 months ago - Stars: 3 - Forks: 4

preprocessy/preprocessy
Python package for Customizable Data Preprocessing Pipelines
Language: Jupyter Notebook - Size: 992 KB - Last synced at: 28 days ago - Pushed at: about 2 months ago - Stars: 42 - Forks: 14

shamspias/gpt3-data-preprocessing
This repository contains code for preprocessing text data from PDF and DOCX files for use with GPT-3. It includes steps such as tokenization, removal of stop words and punctuation, and formatting for GPT-3 input.
Language: Python - Size: 11.7 KB - Last synced at: 25 days ago - Pushed at: over 2 years ago - Stars: 6 - Forks: 1

kolhesamiksha/Nemo_Curator
This repository contains sample text data-preparation code using NeMo Curator for pre-training or synthetic data generation
Language: Jupyter Notebook - Size: 138 KB - Last synced at: 2 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

PrasunDatta/adorsho-praniSheba_Preprocessing-Pipeline-of-Muzzle-Data-of-Cow
This work highlights my contribution as an "ML Engineer" at "adorsho praniSheba" (an ML-based agro-farming company in Bangladesh), where I was assigned the task of designing the preprocessing pipeline.
Language: Jupyter Notebook - Size: 11.5 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

vuanhngo14/Decision-Tree-from-Scratch
Understand and implement a decision tree
Language: Jupyter Notebook - Size: 148 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

SaraLittleSquirrel/Obesity-estimator
Project for Machine Learning Data Mining course
Language: Jupyter Notebook - Size: 5.63 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

WillCaton2350/Parallel-ETL-Pipeline
Automated web crawler and data preprocessing tool written in Python with Scrapy. The parallel ETL process involves multiple steps: extracting specific data from a web page using Scrapy and organizing it into structured items. The extracted data is also saved to a separate JSON file for further analysis and integration into a MySQL database.
Language: Python - Size: 31.3 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

MustofAhmed41/Data-Preprocessing-using-Distributed-Database
Machine learning models cannot be applied directly to raw data. This desktop application consists of a central server and two client servers. The central server sends raw data to the clients, where it is preprocessed and prepared to be fed to the machine learning model.
Size: 74.2 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

etetteh/production_ml
Language: Python - Size: 35.2 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0
