Topic: "data-preprocessing-pipelines"
data-prep-kit/data-prep-kit
Open source project for data preparation for GenAI applications
Language: HTML - Size: 223 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 723 - Forks: 201

preprocessy/preprocessy
Python package for Customizable Data Preprocessing Pipelines
Language: Jupyter Notebook - Size: 993 KB - Last synced at: 9 days ago - Pushed at: about 1 month ago - Stars: 43 - Forks: 14

shamspias/gpt3-data-preprocessing
This repository contains code for preprocessing text data from PDF and DOCX files for use with GPT-3. It includes steps such as tokenization, removal of stop words and punctuation, and formatting for GPT-3 input.
Language: Python - Size: 11.7 KB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 6 - Forks: 1

firefly-cpp/succulent
Collect POST requests
Language: Python - Size: 462 KB - Last synced at: 3 days ago - Pushed at: 17 days ago - Stars: 3 - Forks: 3

kolhesamiksha/Nemo_Curator
This repository contains sample text data-preparation code using Nemo Curator for pre-training or synthetic data generation
Language: Jupyter Notebook - Size: 138 KB - Last synced at: about 2 months ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

vuanhngo14/Decision-Tree-from-Scratch
Understand and implement a decision tree
Language: Jupyter Notebook - Size: 148 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

DigitalLifeYZQiu/Data-Process-Library
A data processing library to help better understand industrial data.
Language: Jupyter Notebook - Size: 46.5 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

SaraLittleSquirrel/Obesity-estimator
Project for Machine Learning Data Mining course
Language: Jupyter Notebook - Size: 5.63 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

WillCaton2350/Parallel-ETL-Pipeline
Automated web crawler and data preprocessing tool written in Python with Scrapy. The parallel ETL process involves multiple steps: extracting specific data from a web page using Scrapy and organizing it into structured items. The extracted data is also saved to a separate JSON file for further analysis and integration into a MySQL database.
Language: Python - Size: 31.3 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

PrasunDatta/adorsho-praniSheba_Preprocessing-Pipeline-of-Muzzle-Data-of-Cow
This work highlights my contribution as an "ML Engineer" at "adorsho praniSheba" (an ML-based agro-farming company in Bangladesh), where I was assigned the task of designing the preprocessing pipeline.
Language: Jupyter Notebook - Size: 11.5 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

MustofAhmed41/Data-Preprocessing-using-Distributed-Database
Machine learning models cannot be directly applied to raw data. This desktop application consists of a central server and two client servers. The central server sends raw data to the clients, where the data is preprocessed and prepared to be fed to the machine learning model.
Size: 74.2 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

etetteh/production_ml
Language: Python - Size: 35.2 KB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0
