An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: data-preprocessing-pipelines

data-prep-kit/data-prep-kit

Open source project for data preparation for LLM application builders

Language: HTML - Size: 220 MB - Last synced at: 1 day ago - Pushed at: 5 days ago - Stars: 635 - Forks: 194

DigitalLifeYZQiu/Data-Process-Library

A data processing library to help better understand industrial data.

Language: Jupyter Notebook - Size: 5.52 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 0 - Forks: 0

firefly-cpp/succulent

Collect POST requests

Language: Python - Size: 357 KB - Last synced at: 17 days ago - Pushed at: about 2 months ago - Stars: 3 - Forks: 4

preprocessy/preprocessy

Python package for customizable data preprocessing pipelines

Language: Jupyter Notebook - Size: 992 KB - Last synced at: 28 days ago - Pushed at: about 2 months ago - Stars: 42 - Forks: 14

shamspias/gpt3-data-preprocessing

This repository contains code for preprocessing text data from PDF and DOCX files for use with GPT-3. It includes steps such as tokenization, removal of stop words and punctuation, and formatting for GPT-3 input.

Language: Python - Size: 11.7 KB - Last synced at: 25 days ago - Pushed at: over 2 years ago - Stars: 6 - Forks: 1

kolhesamiksha/Nemo_Curator

This repository contains sample text data-preparation code using NeMo Curator for pre-training or synthetic data generation.

Language: Jupyter Notebook - Size: 138 KB - Last synced at: 2 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

PrasunDatta/adorsho-praniSheba_Preprocessing-Pipeline-of-Muzzle-Data-of-Cow

This work highlights my contribution as an "ML Engineer" at "adorsho praniSheba" (an ML-based agro-farming company in Bangladesh), where I was assigned the task of designing the preprocessing pipeline.

Language: Jupyter Notebook - Size: 11.5 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

vuanhngo14/Decision-Tree-from-Scratch

Understand and implement a decision tree

Language: Jupyter Notebook - Size: 148 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

SaraLittleSquirrel/Obesity-estimator

Project for a Machine Learning Data Mining course

Language: Jupyter Notebook - Size: 5.63 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

WillCaton2350/Parallel-ETL-Pipeline

Automated web crawler and data preprocessing tool written in Python with Scrapy. The parallel ETL process involves multiple steps: extracting specific data from a web page using Scrapy and organizing it into structured items. The extracted data is also saved to a separate JSON file for further analysis and for integration into a MySQL database.

Language: Python - Size: 31.3 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

MustofAhmed41/Data-Preprocessing-using-Distributed-Database

Machine learning models cannot be applied directly to raw data. This desktop application consists of a central server and two client servers. The central server sends raw data to the clients, where the data is preprocessed and prepared to be fed to the machine learning model.

Size: 74.2 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

etetteh/production_ml

Language: Python - Size: 35.2 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0