GitHub topics: preprocessing

Repositories

akankshaj2712/Vehicle-Insurance-Fraud-Detection

Fraud Detection using Machine Learning

Language: Jupyter Notebook - Size: 4.44 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

anishdeshmukh9/AI-model-Training-Disease-prognosis

An academic project that showcases my knowledge of the ML workflow before and after modeling, including data collection, data preprocessing, model training, and fine-tuning.

Language: Python - Size: 8.13 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

ALebrun-108/BoxSERS

Python package that provides a full range of functionality to process and analyze vibrational spectra (Raman, SERS, FTIR, etc.).

Language: Jupyter Notebook - Size: 20 MB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 65 - Forks: 15

hscspring/pnlp

NLP pre-/post-processing tools.

Language: Python - Size: 106 KB - Last synced at: 3 days ago - Pushed at: 4 months ago - Stars: 30 - Forks: 6

lucasrla/wsi-preprocessing

Simple library for preprocessing histopathological whole-slide images (WSI) into tiles (a.k.a. patches) towards deep learning

Language: Python - Size: 18.6 KB - Last synced at: 3 days ago - Pushed at: almost 2 years ago - Stars: 55 - Forks: 14

Awais-Asghar/Skin-Cancer-Binary-Classifier

A machine learning project for binary classification of skin cancer as malignant or benign, utilizing models like XGBoost, LGBM Classifier, Adaboost, SVM, and Logistic Regression. Features comprehensive data preprocessing, model training, and evaluation for accurate diagnosis.

Language: Jupyter Notebook - Size: 5.65 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

ankur150/ML-Projects

I applied K-Means clustering to Facebook Live Sellers data to group posts by engagement metrics. Preprocessing included label encoding (for status_type) and standardization (StandardScaler on the numerical features). Using the elbow method, I chose k=3, trained the model, and visualized the clusters and centroids.

Language: Jupyter Notebook - Size: 2.33 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0
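
The workflow described in this entry (label encoding, standardization, elbow method, K-Means) can be sketched roughly as follows. This is not the repository's code: the data is synthetic, and the column names and cluster centers are assumptions made up for illustration. Only standard scikit-learn and NumPy calls are used.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import LabelEncoder, StandardScaler

rng = np.random.default_rng(0)

# Hypothetical stand-in data: a categorical post type plus two engagement counts.
status_type = rng.choice(["photo", "video", "status", "link"], size=300)
centers = np.array([[20.0, 5.0], [200.0, 40.0], [600.0, 150.0]])
engagement = centers[rng.integers(0, 3, size=300)] + rng.normal(0, 10, size=(300, 2))

# Label-encode the categorical column, then standardize every feature.
status_code = LabelEncoder().fit_transform(status_type)
features = np.column_stack([status_code, engagement])
scaled = StandardScaler().fit_transform(features)

# Elbow method: record the inertia for a range of k and look for the bend.
inertias = []
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(scaled)
    inertias.append(model.inertia_)

# Fit the final model with the k picked from the elbow plot (3 in the entry above).
final = KMeans(n_clusters=3, n_init=10, random_state=0).fit(scaled)
labels = final.labels_
centroids = final.cluster_centers_
```

Plotting `inertias` against k gives the elbow curve; where the bend falls depends on the data, so k=3 here simply mirrors the value reported in the description.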

jknafou/TransCorpus

TransCorpus is a scalable toolkit for large-scale, parallel translation and preprocessing of text corpora, built for language model pretraining and research.

Language: Python - Size: 5.91 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

carlosrs14/parallel-data-preprocessig-system

A parallel data preprocessing system using threads and synchronization mechanisms (barrier, busy-waiting, condition variables) to clean and prepare data for AI training.

Language: C - Size: 3.64 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

aryehky/arduino

🚀 C++ Machine Learning Project: Digit Recognition with Support Vector Machine (SVM) 🖥️ This project is a robust implementation of digit recognition using Support Vector Machine (SVM) in C++. The SVM algorithm, a powerful supervised learning technique, is employed to classify handwritten digits from the famous MNIST dataset.

Language: C++ - Size: 188 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

jendives2000/Data_ML_Practice_2025

All projects completed during the Machine Learnia Pro bootcamp

Language: Jupyter Notebook - Size: 271 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

autoreject/autoreject

Automated rejection and repair of bad trials/sensors in M/EEG

Language: Python - Size: 697 KB - Last synced at: 22 days ago - Pushed at: 8 months ago - Stars: 143 - Forks: 58

ThanhNg224/Scrape-Classify

Collected 60,000 Vietnamnet articles, preprocessed the data, and trained a scikit-learn model with over 91% accuracy for article classification.

Language: Jupyter Notebook - Size: 272 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 4 - Forks: 0

l-ramirez-lopez/prospectr

R package: Misc. Functions for Processing and Sample Selection of Spectroscopic Data

Language: R - Size: 17.3 MB - Last synced at: 12 days ago - Pushed at: 4 months ago - Stars: 44 - Forks: 21

NVIDIA-Merlin/NVTabular

NVTabular is a feature engineering and preprocessing library for tabular data designed to quickly and easily manipulate terabyte scale datasets used to train deep learning based recommender systems.

Language: Python - Size: 98.4 MB - Last synced at: about 2 months ago - Pushed at: 11 months ago - Stars: 1,084 - Forks: 146

AxeldeRomblay/MLBox

MLBox is a powerful Automated Machine Learning python library.

Language: Python - Size: 50 MB - Last synced at: about 2 months ago - Pushed at: almost 2 years ago - Stars: 1,516 - Forks: 274

StyNW7/Machine_Learning

Provides Basic Machine Learning Code

Language: Jupyter Notebook - Size: 63.5 MB - Last synced at: 21 days ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

tristan-rech/Applying-CV-and-ML-to-Measuring-Algal-Diversity

A project leveraging FlowCam technology, computer vision, and machine learning to automate the classification of algal cells, aiming to significantly improve the analysis of algal diversity in laboratory settings.

Language: Jupyter Notebook - Size: 106 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

robdahn/primetemp

Segmentation and registration template files to process structural magnetic resonance images from primates in CAT12/SPM12.

Size: 141 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 2 - Forks: 1

nidhaloff/igel

a delightful machine learning tool that allows you to train, test, and use models without writing code

Language: Python - Size: 18.8 MB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 3,119 - Forks: 179

MBhatti26/Web-and-Social-Media-Analytics

Reddit-based social media analysis of the r/AskWomen community (5M+ members). Extracted 4,000+ comments using the Reddit API, applied extensive preprocessing (SpaCy, contractions, language filtering), and performed sentiment analysis (VADER), topic modeling (LDA), and text mining to uncover key themes and emotional dynamics within the community.

Language: Jupyter Notebook - Size: 2.98 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

lennymalard/melpy-project

A NumPy-based deep learning library for building neural networks. It features an automatic differentiation engine and supports training models like LSTM, CNN, and FNN.

Language: Python - Size: 159 MB - Last synced at: 16 days ago - Pushed at: about 2 months ago - Stars: 2 - Forks: 0

gnevercodes/Retention_analysis

This is part of our data visualization project, where we aim to uncover key factors that affect the retention of students at a university.

Language: Jupyter Notebook - Size: 8.74 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

jubnr/fMRI_quickstart

Template for scalable fMRI workflows: BIDS, DeepPrep, and first-level general linear model (GLM) analysis.

Language: Jupyter Notebook - Size: 0 Bytes - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

coding-kelps/liaisons-preprocess

Collection of scripts for preprocessing the dataset used for the "liaisons" project.

Language: Jupyter Notebook - Size: 19.5 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

jamal919/pycaz

Collection of functions for data analysis, model input preparation, post-processing, analysis.

Language: Python - Size: 1.1 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 5 - Forks: 2

dlite-tools/NLPiper

NLPiper is a package that agglomerates different NLP tools and applies their transformations in the target document.

Language: Python - Size: 165 KB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 19 - Forks: 1

EvaSamoilenko/Monster.com-jobs

A project for processing the raw, previously unprocessed Monster.com jobs dataset of job postings, to support multifaceted study of the data later on.

Language: Jupyter Notebook - Size: 167 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

raj-sutariya/indic-num2words

Python library for converting numbers to words for all Indian Languages.

Language: Python - Size: 117 KB - Last synced at: 18 days ago - Pushed at: about 2 months ago - Stars: 35 - Forks: 13

ikegami-yukino/jaconv

Pure-Python Japanese character interconverter for Hiragana, Katakana, Hankaku, and Zenkaku

Language: Python - Size: 537 KB - Last synced at: about 2 months ago - Pushed at: 6 months ago - Stars: 329 - Forks: 31

felipelapadn/Classificacao-com-Naive-Bayes

The goals of this work are to consolidate a dataset, explore text preprocessing in the context of NLP, and test the probabilistic Naive Bayes classification algorithm to determine whether a comment in the dataset is negative or positive.

Language: Jupyter Notebook - Size: 245 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

SIMEXP/load_confounds 📦

Load fMRIprep confounds in python

Language: Python - Size: 3.15 MB - Last synced at: 26 days ago - Pushed at: over 3 years ago - Stars: 37 - Forks: 12

stefantaubert/english-text-normalization

Command-line interface (CLI) and library to normalize English texts.

Language: Python - Size: 235 KB - Last synced at: 25 days ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 1

sunlabuiuc/PyHealth

A Deep Learning Python Toolkit for Healthcare Applications.

Language: Python - Size: 121 MB - Last synced at: about 2 months ago - Pushed at: 2 months ago - Stars: 1,153 - Forks: 407

AdharshKan42/Annotile

Tile and restitch images and labels for computer vision models.

Language: Python - Size: 903 KB - Last synced at: 25 days ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0
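
As a rough illustration of what tiling and restitching means, the sketch below splits a 2-D grid into fixed-size tiles keyed by their offset and then reassembles them. It is not Annotile's implementation or API: the function names `tile_image` and `restitch` are hypothetical, the image is a plain list of rows, and label handling (for example shifting bounding boxes by the tile offset) is left out.

```python
def tile_image(image, tile_h, tile_w):
    """Split a 2-D grid (list of rows) into tiles keyed by their (top, left) offset."""
    tiles = {}
    for top in range(0, len(image), tile_h):
        for left in range(0, len(image[0]), tile_w):
            # Edge tiles are simply smaller when the size does not divide evenly.
            tiles[(top, left)] = [
                row[left:left + tile_w] for row in image[top:top + tile_h]
            ]
    return tiles


def restitch(tiles, height, width):
    """Rebuild the full grid by writing each tile back at its stored offset."""
    out = [[None] * width for _ in range(height)]
    for (top, left), tile in tiles.items():
        for r, row in enumerate(tile):
            out[top + r][left:left + len(row)] = row
    return out


# Hypothetical 5x7 "image" whose pixel values encode their own position.
image = [[r * 10 + c for c in range(7)] for r in range(5)]
tiles = tile_image(image, tile_h=2, tile_w=3)
rebuilt = restitch(tiles, height=5, width=7)
```

Keeping the offset as the dictionary key is what makes restitching order-independent; the same offsets are what a label-aware version would use to map annotations between tile and full-image coordinates.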

bernardlawes/vision-common

Centralized configuration and utility repo for shared label management, preprocessing, and model metadata shared across repos

Size: 0 Bytes - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

anlijun/awesome-CAE-software

A curated list of awesome CAE frameworks, libraries, and software from a full CAE workflow perspective, including the integration of AI technologies.

Size: 283 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 9 - Forks: 0

bmmunga/abc-customer_engagement_ml

Machine-learning predictive model to analyse customer data, predict engagement likelihood, and surface actionable insights.

Language: Jupyter Notebook - Size: 2.27 MB - Last synced at: 27 days ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

nnseva/solt

Solidity Templating Engine

Language: Solidity - Size: 140 KB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

TheAhsanFarabi/DataLite

A simplified data mining tool built with Streamlit.

Language: Python - Size: 30.3 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

DanielFaltynowski/song-analysis-system

A system that analyzes song lyrics using machine learning algorithms in order to classify them by topic.

Language: Jupyter Notebook - Size: 40.5 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

ropensci/MODIStsp

An "R" package for automatic download and preprocessing of MODIS Land Products Time Series

Language: R - Size: 180 MB - Last synced at: 2 days ago - Pushed at: 2 months ago - Stars: 156 - Forks: 52

andrei-vataselu/data-science-snippets

🧰 Essential EDA and Data Cleaning Helpers for Any DataFrame. This collection of functions is designed to accelerate exploratory data analysis (EDA), quickly surface data quality issues, and offer high-level insights into the structure and content of your dataset.

Language: Python - Size: 30.3 KB - Last synced at: 4 days ago - Pushed at: 2 months ago - Stars: 2 - Forks: 2

ty70/text_preprocessing_tools

A set of Python tools for preprocessing Japanese text for subtitles or speech synthesis (e.g., ruby removal, kanji stripping).

Language: Python - Size: 17.6 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

lokesh9899/Mortality-Risk-Readmission-Prediction-NLP-Clinical-Bert-LLM

AI-powered prediction of in-hospital mortality and 30-day readmission using MIMIC-III clinical data. Combines structured features and ClinicalBERT embeddings with the best-performing XGBoost/CatBoost models for accurate, explainable healthcare forecasting.

Language: Jupyter Notebook - Size: 1.94 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

xariska3/Apartments-Classification-Regression.github.io

U.S. Apartments Rental Prediction and Classification with ML

Language: Jupyter Notebook - Size: 6.44 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

hypatia-of-sva/cm

cm - C Macro processor

Language: C - Size: 41 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

pblaney/mgp1000

Nextflow bioinformatics pipeline for large-scale analysis of Multiple Myeloma genomes

Language: Nextflow - Size: 324 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 11 - Forks: 5

inbo/n2khab-preprocessing

Broadly useful data preparation for Flemish Natura 2000 habitat analyses

Language: R - Size: 787 KB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 1 - Forks: 0

niklaswais/gesp

Language: Python - Size: 190 KB - Last synced at: about 1 month ago - Pushed at: 2 months ago - Stars: 24 - Forks: 5

exponentialR/QUB-HRI

Preprocessing Repository of QUB-Perception of Human Engagement in Assembly Operations Dataset

Language: Python - Size: 91 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

TNO-Quantum/optimization.qubo.preprocessors

Preprocessor QUBO optimization

Language: Python - Size: 0 Bytes - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

pawlyk/dsml-tools

set of Data Science and Machine Learning tools

Language: Python - Size: 261 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

Jayplect/Funding-recommendation-engine

For this project, I built a binary classifier to predict the success of applicants seeking funding from Alphabet Soup. Leveraging the features in the dataset, the model uses machine learning and neural networks to make accurate predictions.

Language: Jupyter Notebook - Size: 110 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 2 - Forks: 0

obtic-sorbonne/Toolbox-site

Pandore offers a set of tools that facilitate the most common corpus processing tasks for digital humanities research. Automatic pipelines for a set of tasks are also available

Language: HTML - Size: 78.2 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 2 - Forks: 1

FareedKhan-dev/Most-powerful-NLP-library

Gemini, as capable as GPT-4, provides a free API with limited access. I tested it with the help of prompt engineering and found that it can solve almost any NLP task you want to tackle.

Language: Jupyter Notebook - Size: 107 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 35 - Forks: 9

karakatic/EvoPreprocess

A Python Toolkit for Data Preprocessing with Evolutionary and Nature-Inspired Algorithms.

Language: Python - Size: 129 KB - Last synced at: 25 days ago - Pushed at: over 2 years ago - Stars: 10 - Forks: 3

Yalai92/CAVA_IMP_EXP_ANALYSIS

Analysis, visualization, preprocessing and clustering of global sparkling wine trade (2017–2024) using Python in Colab and ML to reveal trends and country profiles.

Language: Jupyter Notebook - Size: 2.71 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

HanBnrd/NIRSimple

fNIRS signal processing simplified

Language: Python - Size: 3.02 MB - Last synced at: 24 days ago - Pushed at: about 1 year ago - Stars: 15 - Forks: 2

stanstrup/QC4Metabolomics

QC systems for metabolomics studies

Language: R - Size: 351 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 10 - Forks: 0

AlexChristensen/SemNetCleaner

An Automated Cleaning Tool for Semantic and Linguistic Data

Language: HTML - Size: 978 KB - Last synced at: 6 days ago - Pushed at: 2 months ago - Stars: 10 - Forks: 0

lucasrla/wsi-tile-cleanup

Image filters for digital pathology: detect pen marks, background, and artifacts. Use them for preprocessing towards deep learning

Language: Python - Size: 3.6 MB - Last synced at: 3 days ago - Pushed at: almost 5 years ago - Stars: 28 - Forks: 4

Hyland/DocumentFilters

Document Filters is an SDK for applications like content indexing, e-discovery, data migration, and feeding data into AI/ML models by extracting data from unstructured sources. It gives the ability to perform deep inspection, data extraction, output manipulation, and conversion for virtually any type of document, in any programming language.

Language: C++ - Size: 62.4 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 25 - Forks: 2

Shr-reny/MLFoundry

MLFoundry is an end-to-end machine learning project template that demonstrates a production-grade ML pipeline using modular code, configuration management, version control, and deployment readiness.

Language: Jupyter Notebook - Size: 1.7 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

Maoelan/amazon-automated-preprocessing

This repository is used to automate the preprocessing of data scraped from Amazon reviews on the Google Play Store.

Language: Jupyter Notebook - Size: 2.88 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

abdul-rafay19/YoungDevInterns_Machine-Learning_Tasks

This internship offers hands-on exposure to real-world Machine Learning applications — from data visualization and preprocessing to model development, evaluation, and deployment. It focuses on real ML workflows, problem-solving, neural networks, and hyperparameter tuning — all within a collaborative, remote, and growth-oriented environment.

Language: Jupyter Notebook - Size: 192 KB - Last synced at: 21 days ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

githubharald/DeslantImg

The deslanting algorithm sets text upright in images. Python, C++ and OpenCL implementations provided.

Language: C++ - Size: 591 KB - Last synced at: 2 months ago - Pushed at: over 3 years ago - Stars: 150 - Forks: 38

smoia/EuskalIBUR_preproc

Preprocessing files for EuskalIBUR

Language: Shell - Size: 113 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 2

ag-ds-bubble/swtloc

Python package for Stroke Width Transform - Localizing the Text (Letters & Words) in a Natural Image

Language: Python - Size: 126 MB - Last synced at: 18 days ago - Pushed at: almost 2 years ago - Stars: 38 - Forks: 4

habeeb3579/Spectoprep

Language: Jupyter Notebook - Size: 3.36 MB - Last synced at: 8 days ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

Yu-Group/veridical-flow

Making it easier to build stable, trustworthy data-science pipelines based on the PCS framework.

Language: Jupyter Notebook - Size: 13.4 MB - Last synced at: 2 days ago - Pushed at: over 1 year ago - Stars: 71 - Forks: 7

shubham5027/Store-Item-Demand-Forcasting

The "Sales Demand Forecasting Regression Model" project aims to develop a predictive model that forecasts future sales demand based on historical data and relevant influencing factors. The project follows a structured approach, encompassing data collection, preprocessing, model selection, training, evaluation, and deployment.

Language: Jupyter Notebook - Size: 5.44 MB - Last synced at: 4 days ago - Pushed at: about 1 year ago - Stars: 4 - Forks: 0

sposso/fNIRS-preprocessing-guide

Preprocessing the fNIRS data from the paper "The use of broad vs restricted regions of interest in functional near-infrared spectroscopy for measuring cortical activation to auditory-only and visual-only speech"

Language: Jupyter Notebook - Size: 29.3 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

0xferit/ITU-Turkish-NLP-Pipeline-Caller 📦

A Python3 wrapper tool to help using ITU Turkish NLP Pipeline API -- UNMAINTAINED --

Language: Python - Size: 131 KB - Last synced at: 11 days ago - Pushed at: about 7 years ago - Stars: 44 - Forks: 9

parvvaresh/ETL-news

Language: HTML - Size: 4.76 MB - Last synced at: 5 days ago - Pushed at: 11 months ago - Stars: 20 - Forks: 1

NiranjanRao07/ADHD-ML-Project

This project used machine learning to classify ADHD based on EEG data. We preprocessed the EEG signals, extracted various features, and used LDA for dimensionality reduction. A voting ensemble of classifiers achieved 72% accuracy in distinguishing between ADHD and control groups.

Language: Jupyter Notebook - Size: 1.35 MB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

calvinmccarter/kditransform

Kernel density integral transformation: feature preprocessing and univariate clustering (TMLR, 2023)

Language: Python - Size: 15.4 MB - Last synced at: 27 days ago - Pushed at: over 1 year ago - Stars: 7 - Forks: 0

erdogant/df2onehot

Convert an unstructured array into a structured dataframe.

Language: Python - Size: 6.67 MB - Last synced at: 22 days ago - Pushed at: 3 months ago - Stars: 3 - Forks: 1

Aura-healthcare/ecg_qc

A library to compute ECG signal quality indicators

Language: Jupyter Notebook - Size: 50.4 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 41 - Forks: 10

bids-apps/freesurfer

BIDS app wrapping recon-all from FreeSurfer

Language: Python - Size: 221 KB - Last synced at: 3 months ago - Pushed at: 8 months ago - Stars: 41 - Forks: 35

Metalkiler/Cane-Categorical-Attribute-traNsformation-Environment

A simple preprocessing method for Machine Learning

Language: Python - Size: 542 KB - Last synced at: 18 days ago - Pushed at: 3 months ago - Stars: 4 - Forks: 3

martinezmario02/ClasificacionDiabetes

Data preprocessing and binary classification (2025)

Language: Jupyter Notebook - Size: 1.8 MB - Last synced at: 21 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

Ehsan-Behzadi/Online-Retail-Data-Analysis-and-Preprocessing

This project analyzes and preprocesses the Online Retail dataset to uncover insights into customer purchasing behaviors, sales trends, and product performance. It includes data cleaning, exploration, and visualization, with the goal of enhancing understanding of online retail dynamics.

Language: Jupyter Notebook - Size: 1.86 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

gtkacz/undergrad_thesis

Code for my undergraduate thesis: Quantitative Analysis of the Impact of Image Pre-Processing on the Accuracy of Computer Vision Models Trained to Identify Dermatological Skin Diseases

Language: Jupyter Notebook - Size: 2.95 GB - Last synced at: 9 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

R1j1t/contextualSpellCheck

✔️Contextual word checker for better suggestions (not actively maintained)

Language: Python - Size: 2.45 MB - Last synced at: about 2 months ago - Pushed at: 6 months ago - Stars: 413 - Forks: 64

scythemenace/NormalizeText

A text preprocessing module demonstrated on a Project Gutenberg dataset, with methodology and results fully documented.

Language: Python - Size: 1.21 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

xwxfox/convokit

A flexible TypeScript framework for ingesting, processing, and exporting chat/conversation data for LLM training and analysis.

Language: TypeScript - Size: 116 KB - Last synced at: 24 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

damianhorna/multi-imbalance

Python package for tackling multi-class imbalance problems. http://www.cs.put.poznan.pl/mlango/publications/multiimbalance/

Language: Python - Size: 66 MB - Last synced at: 27 days ago - Pushed at: about 1 year ago - Stars: 78 - Forks: 12

fitushar/Brain-Tissue-Segmentation-Using-Deep-Learning-Pipeline-NeuroNet

This repository is for the MISA course final project, which was brain tissue segmentation. We adopt NeuroNet, a comprehensive brain image segmentation tool based on a novel multi-output CNN architecture, which has been trained and tuned using the IBSR18 dataset.

Language: Jupyter Notebook - Size: 5.16 MB - Last synced at: 3 months ago - Pushed at: about 5 years ago - Stars: 35 - Forks: 9

steviecurran/house-prices

House Price Predictions

Language: Jupyter Notebook - Size: 255 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

MahyadSaedpanah/ML---Pima-Indian-Diabetes

Pima Indian Diabetes Dataset Analysis and Clustering using Machine Learning techniques.

Language: Jupyter Notebook - Size: 3.07 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

abu14/transaction-anomaly-detection

An end to end fraud detection model to identify anomalies within transactions.

Language: Jupyter Notebook - Size: 420 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

SegataLab/preprocessing

Raw sequence metagenomic reads pre-processing: trimming, QC, and host contamination

Language: Python - Size: 145 KB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 11 - Forks: 2

birddevelper/ScannedDocumentPreprocessing

Python code snippet for preprocessing scanned documents

Language: Python - Size: 902 KB - Last synced at: 3 months ago - Pushed at: 9 months ago - Stars: 6 - Forks: 0

pni-lab/PUMI

PUMI: neuroimaging Pipelines Using Modular workflow Integration

Language: Python - Size: 34.6 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

paulross/cpip

CPIP - a C/C++ preprocessor implemented in Python.

Language: Python - Size: 36.7 MB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 46 - Forks: 4

mohameddsalmann/Anime-Recommendation-System

This repository contains a Jupyter notebook for building an anime recommendation system using various machine learning models. The notebook includes steps for data preprocessing, feature extraction, model training, and creating a user-friendly graphical user interface (GUI) with tkinter.

Language: Jupyter Notebook - Size: 21.5 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

saifalibaig/Multi-Label-Emotion-Recognition

This project focuses on detecting multiple emotions from English text using a fine-tuned BERT model. It leverages the GoEmotions dataset (https://huggingface.co/datasets/go_emotions), a large-scale human-annotated dataset of Reddit comments labeled with 27 emotions + neutral.

Language: Jupyter Notebook - Size: 122 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

xga0/lightlemma

A lightweight, fast English lemmatizer

Language: Python - Size: 11.7 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

Melikarzyt/Processing-GPS-Data

A data cleaning practice project focused on missing GPS data and outlier detection using IQR.

Size: 6.84 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0
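
For readers unfamiliar with IQR-based outlier detection, a minimal sketch using only the Python standard library is given below. It is not taken from the repository: the function names, the sample speed readings, the use of `None` for missing fixes, and the conventional 1.5 multiplier are all assumptions for illustration.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def clean_track(speeds):
    """Drop missing readings (None), then drop readings outside the IQR fences."""
    present = [s for s in speeds if s is not None]
    lower, upper = iqr_bounds(present)
    return [s for s in present if lower <= s <= upper]


# Hypothetical speed readings in km/h; None marks a missing GPS fix.
speeds = [42.0, 40.5, None, 41.2, 43.1, 250.0, 39.8, None, 44.0]
cleaned = clean_track(speeds)
```

Here the missing readings are dropped rather than imputed, which is the simplest choice; interpolating between neighboring fixes is a common alternative when the time series must stay evenly spaced.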

Related Keywords

preprocessing (1,456), machine-learning (435), python (404), data-science (172), nlp (158), pandas (124), deep-learning (108), classification (105), numpy (75), data (75), data-visualization (74), data-analysis (71), python3 (68), sklearn (64), eda (62), natural-language-processing (58), logistic-regression (57), visualization (56), tensorflow (56), dataset (56), feature-engineering (56), linear-regression (56), exploratory-data-analysis (55), random-forest (53), machine-learning-algorithms (51), matplotlib (51), scikit-learn (50), data-cleaning (49), clustering (46), data-mining (45), regression (44), jupyter-notebook (43), seaborn (41), sentiment-analysis (39), image-processing (38), keras (38), neural-network (36), pytorch (35), nltk (34), pipeline (34), r (32), feature-extraction (32), svm (31), analysis (30), neural-networks (29), computer-vision (28), ml (28), cnn (27), preprocessor (26), decision-trees (25), artificial-intelligence (25), supervised-learning (25), svm-classifier (24), datascience (24), xgboost (23), prediction (22), nlp-machine-learning (22), ai (22), time-series (21), pca (21), feature-selection (21), predictive-modeling (21), streamlit (21), normalization (20), kaggle (20), tf-idf (20), statistics (19), eeg (19), text-classification (19), naive-bayes-classifier (19), knn (18), text-processing (18), knn-classification (18), preprocessing-data (18), word2vec (17), opencv (17), pca-analysis (17), text (17), regression-models (16), lemmatization (16), confusion-matrix (16), kmeans-clustering (16), tokenization (16), datacleaning (16), java (16), tokenizer (15), text-mining (15), postprocessing (15), data-preprocessing (15), mri (15), css (15), neuroimaging (15), outlier-detection (15), html (14), random-forest-classifier (14), matplotlib-pyplot (14), cross-validation (14), dimensionality-reduction (14), weka (13), kmeans (13)