Topic: "data-preprocessing-pipelines"

data-prep-kit/data-prep-kit

Open source project for data preparation for GenAI applications

Language: HTML - Size: 223 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 723 - Forks: 201

preprocessy/preprocessy

Python package for Customizable Data Preprocessing Pipelines

Language: Jupyter Notebook - Size: 993 KB - Last synced at: 9 days ago - Pushed at: about 1 month ago - Stars: 43 - Forks: 14

shamspias/gpt3-data-preprocessing

This repository contains code for preprocessing text data from PDF and DOCX files for use with GPT-3. It includes steps such as tokenization, removal of stop words and punctuation, and formatting for GPT-3 input.

Language: Python - Size: 11.7 KB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 6 - Forks: 1

firefly-cpp/succulent

Collect POST requests

Language: Python - Size: 462 KB - Last synced at: 3 days ago - Pushed at: 17 days ago - Stars: 3 - Forks: 3

kolhesamiksha/Nemo_Curator

This repository contains sample text data-preparation code using NeMo Curator for pre-training or synthetic data generation.

Language: Jupyter Notebook - Size: 138 KB - Last synced at: about 2 months ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

vuanhngo14/Decision-Tree-from-Scratch

Understand and implement a decision tree

Language: Jupyter Notebook - Size: 148 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

DigitalLifeYZQiu/Data-Process-Library

A data processing library for better understanding of industrial data.

Language: Jupyter Notebook - Size: 46.5 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

SaraLittleSquirrel/Obesity-estimator

Project for Machine Learning Data Mining course

Language: Jupyter Notebook - Size: 5.63 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

WillCaton2350/Parallel-ETL-Pipeline

Automated web crawler and data preprocessing tool written in Python with Scrapy. The parallel ETL process involves multiple steps: extracting specific data from a web page using Scrapy and organizing it into structured items. The extracted data is also saved to a separate JSON file for further analysis and integration into a MySQL database.

Language: Python - Size: 31.3 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

PrasunDatta/adorsho-praniSheba_Preprocessing-Pipeline-of-Muzzle-Data-of-Cow

This work highlights my contribution as an ML Engineer at "adorsho praniSheba" (an ML-based agro-farming company in Bangladesh), where I was assigned the task of designing the preprocessing pipeline.

Language: Jupyter Notebook - Size: 11.5 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

MustofAhmed41/Data-Preprocessing-using-Distributed-Database

Machine learning models cannot be applied directly to raw data. This desktop application consists of a central server and two client servers. The central server sends raw data to the clients, where the data is preprocessed and prepared to be fed to the machine learning model.

Size: 74.2 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

etetteh/production_ml

Language: Python - Size: 35.2 KB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0
