GitHub topics: preprocessing
akankshaj2712/Vehicle-Insurance-Fraud-Detection
Fraud Detection using Machine Learning
Language: Jupyter Notebook - Size: 4.44 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

anishdeshmukh9/AI-model-Training-Disease-prognosis
An academic project that showcases my knowledge of the ML workflow before and after modeling: data collection, data preprocessing, model training, and fine-tuning.
Language: Python - Size: 8.13 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

ALebrun-108/BoxSERS
Python package that provides a full range of functionality to process and analyze vibrational spectra (Raman, SERS, FTIR, etc.).
Language: Jupyter Notebook - Size: 20 MB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 65 - Forks: 15

hscspring/pnlp
NLP pre-/post-processing tools.
Language: Python - Size: 106 KB - Last synced at: 3 days ago - Pushed at: 4 months ago - Stars: 30 - Forks: 6

lucasrla/wsi-preprocessing
Simple library for preprocessing histopathological whole-slide images (WSI) into tiles (a.k.a. patches) towards deep learning
Language: Python - Size: 18.6 KB - Last synced at: 3 days ago - Pushed at: almost 2 years ago - Stars: 55 - Forks: 14

Awais-Asghar/Skin-Cancer-Binary-Classifier
A machine learning project for binary classification of skin cancer as malignant or benign, utilizing models like XGBoost, LGBM Classifier, Adaboost, SVM, and Logistic Regression. Features comprehensive data preprocessing, model training, and evaluation for accurate diagnosis.
Language: Jupyter Notebook - Size: 5.65 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

ankur150/ML-Projects
I applied K-Means clustering to Facebook Live Sellers data to group posts based on engagement metrics. Preprocessing techniques included Label Encoding (for status_type) and Standardization (using StandardScaler for numerical features). Using the Elbow Method, I determined k=3, trained the model, and visualized clusters and centroids.
Language: Jupyter Notebook - Size: 2.33 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0
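The two preprocessing steps named in this entry, label encoding and standardization, can be sketched in plain Python. This is an illustrative sketch only, not the repository's code: the helper names `label_encode` and `standardize`, the `num_reactions` column, and the sample rows are all hypothetical; only `status_type` comes from the description. The helpers mirror the conventions of scikit-learn's LabelEncoder (sorted classes) and StandardScaler (population standard deviation).

```python
from statistics import mean, pstdev


def label_encode(values):
    """Map each distinct label to an integer, using sorted label order."""
    mapping = {label: idx for idx, label in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def standardize(values):
    """Rescale values to zero mean and unit (population) variance."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical sample rows (not taken from the actual dataset).
status_type = ["video", "photo", "link", "photo"]
num_reactions = [10.0, 20.0, 30.0, 40.0]

encoded, mapping = label_encode(status_type)
scaled = standardize(num_reactions)
```

After this step every numerical feature is on a comparable scale, which matters for K-Means because it relies on Euclidean distances.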

jknafou/TransCorpus
TransCorpus is a scalable toolkit for large-scale, parallel translation and preprocessing of text corpora, built for language model pretraining and research.
Language: Python - Size: 5.91 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

carlosrs14/parallel-data-preprocessig-system
A parallel data preprocessing system using threads and synchronization mechanisms (barrier, busy-waiting, condition variables) to clean and prepare data for AI training.
Language: C - Size: 3.64 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

aryehky/arduino
🚀 C++ Machine Learning Project: Digit Recognition with Support Vector Machine (SVM) 🖥️ This project is a robust implementation of digit recognition using Support Vector Machine (SVM) in C++. The SVM algorithm, a powerful supervised learning technique, is employed to classify handwritten digits from the famous MNIST dataset.
Language: C++ - Size: 188 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

jendives2000/Data_ML_Practice_2025
All projects completed during the Machine Learnia Pro bootcamp
Language: Jupyter Notebook - Size: 271 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

autoreject/autoreject
Automated rejection and repair of bad trials/sensors in M/EEG
Language: Python - Size: 697 KB - Last synced at: 22 days ago - Pushed at: 8 months ago - Stars: 143 - Forks: 58

ThanhNg224/Scrape-Classify
Collected 60,000 Vietnamnet articles, preprocessed the data, and trained a scikit-learn model with over 91% accuracy for article classification.
Language: Jupyter Notebook - Size: 272 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 4 - Forks: 0

l-ramirez-lopez/prospectr
R package: Misc. Functions for Processing and Sample Selection of Spectroscopic Data
Language: R - Size: 17.3 MB - Last synced at: 12 days ago - Pushed at: 4 months ago - Stars: 44 - Forks: 21

NVIDIA-Merlin/NVTabular
NVTabular is a feature engineering and preprocessing library for tabular data designed to quickly and easily manipulate terabyte scale datasets used to train deep learning based recommender systems.
Language: Python - Size: 98.4 MB - Last synced at: about 2 months ago - Pushed at: 11 months ago - Stars: 1,084 - Forks: 146

AxeldeRomblay/MLBox
MLBox is a powerful Automated Machine Learning python library.
Language: Python - Size: 50 MB - Last synced at: about 2 months ago - Pushed at: almost 2 years ago - Stars: 1,516 - Forks: 274

StyNW7/Machine_Learning
Provides Basic Machine Learning Code
Language: Jupyter Notebook - Size: 63.5 MB - Last synced at: 21 days ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

tristan-rech/Applying-CV-and-ML-to-Measuring-Algal-Diversity
A project leveraging FlowCam technology, computer vision, and machine learning to automate the classification of algal cells, aiming to significantly improve the analysis of algal diversity in laboratory settings.
Language: Jupyter Notebook - Size: 106 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

robdahn/primetemp
Segmentation and registration template files to process structural magnetic resonance images from primates in CAT12/SPM12.
Size: 141 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 2 - Forks: 1

nidhaloff/igel
a delightful machine learning tool that allows you to train, test, and use models without writing code
Language: Python - Size: 18.8 MB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 3,119 - Forks: 179

MBhatti26/Web-and-Social-Media-Analytics
Reddit-based social media analysis of the r/AskWomen community (5M+ members). Extracted 4,000+ comments using the Reddit API, applied extensive preprocessing (SpaCy, contractions, language filtering), and performed sentiment analysis (VADER), topic modeling (LDA), and text mining to uncover key themes and emotional dynamics within the community.
Language: Jupyter Notebook - Size: 2.98 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

lennymalard/melpy-project
A NumPy-based deep learning library for building neural networks. It features an automatic differentiation engine and supports training models like LSTM, CNN, and FNN.
Language: Python - Size: 159 MB - Last synced at: 16 days ago - Pushed at: about 2 months ago - Stars: 2 - Forks: 0

gnevercodes/Retention_analysis
This is part of our data visualization project, where we aim to uncover key factors that affect the retention of students at a university.
Language: Jupyter Notebook - Size: 8.74 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

jubnr/fMRI_quickstart
Template for scalable fMRI workflows: BIDS, DeepPrep, and first-level general linear model (GLM) analysis.
Language: Jupyter Notebook - Size: 0 Bytes - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

coding-kelps/liaisons-preprocess
Collection of scripts for preprocessing the dataset used for the "liaisons" project.
Language: Jupyter Notebook - Size: 19.5 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

jamal919/pycaz
Collection of functions for data analysis, model input preparation, post-processing, analysis.
Language: Python - Size: 1.1 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 5 - Forks: 2

dlite-tools/NLPiper
NLPiper is a package that agglomerates different NLP tools and applies their transformations in the target document.
Language: Python - Size: 165 KB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 19 - Forks: 1

EvaSamoilenko/Monster.com-jobs
A project for processing the raw, previously unprocessed Monster.com jobs dataset of job postings, to support multifaceted study of the data later on.
Language: Jupyter Notebook - Size: 167 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

raj-sutariya/indic-num2words
Python library for converting numbers to words for all Indian Languages.
Language: Python - Size: 117 KB - Last synced at: 18 days ago - Pushed at: about 2 months ago - Stars: 35 - Forks: 13

ikegami-yukino/jaconv
Pure-Python Japanese character interconverter for Hiragana, Katakana, Hankaku, and Zenkaku
Language: Python - Size: 537 KB - Last synced at: about 2 months ago - Pushed at: 6 months ago - Stars: 329 - Forks: 31

felipelapadn/Classificacao-com-Naive-Bayes
The objectives of this work are to consolidate a dataset, explore text preprocessing in the context of NLP, and test the probabilistic classification algorithm Naive Bayes to determine whether a comment in the dataset is negative or positive.
Language: Jupyter Notebook - Size: 245 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

SIMEXP/load_confounds 📦
Load fMRIprep confounds in python
Language: Python - Size: 3.15 MB - Last synced at: 26 days ago - Pushed at: over 3 years ago - Stars: 37 - Forks: 12

stefantaubert/english-text-normalization
Command-line interface (CLI) and library to normalize English texts.
Language: Python - Size: 235 KB - Last synced at: 25 days ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 1

sunlabuiuc/PyHealth
A Deep Learning Python Toolkit for Healthcare Applications.
Language: Python - Size: 121 MB - Last synced at: about 2 months ago - Pushed at: 2 months ago - Stars: 1,153 - Forks: 407

AdharshKan42/Annotile
Tile and restitch images and labels for computer vision models.
Language: Python - Size: 903 KB - Last synced at: 25 days ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

bernardlawes/vision-common
Centralized configuration and utility repo for shared label management, preprocessing, and model metadata shared across repos
Size: 0 Bytes - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

anlijun/awesome-CAE-software
A curated list of awesome CAE frameworks, libraries, and software from a full CAE workflow perspective, including the integration of AI technologies.
Size: 283 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 9 - Forks: 0

bmmunga/abc-customer_engagement_ml
Machine-learning predictive model to analyse customer data, predict engagement likelihood, and surface actionable insights.
Language: Jupyter Notebook - Size: 2.27 MB - Last synced at: 27 days ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

nnseva/solt
Solidity Templating Engine
Language: Solidity - Size: 140 KB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

TheAhsanFarabi/DataLite
A simplified data mining tool built with Streamlit.
Language: Python - Size: 30.3 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

DanielFaltynowski/song-analysis-system
A system that analyzes song lyrics using machine learning algorithms in order to classify them by theme.
Language: Jupyter Notebook - Size: 40.5 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

ropensci/MODIStsp
An "R" package for automatic download and preprocessing of MODIS Land Products Time Series
Language: R - Size: 180 MB - Last synced at: 2 days ago - Pushed at: 2 months ago - Stars: 156 - Forks: 52

andrei-vataselu/data-science-snippets
🧰 Essential EDA and Data Cleaning Helpers for Any DataFrame. This collection of functions is designed to accelerate exploratory data analysis (EDA), quickly surface data quality issues, and offer high-level insights into the structure and content of your dataset.
Language: Python - Size: 30.3 KB - Last synced at: 4 days ago - Pushed at: 2 months ago - Stars: 2 - Forks: 2

ty70/text_preprocessing_tools
A set of Python tools for preprocessing Japanese text for subtitles or speech synthesis (e.g., ruby removal, kanji stripping).
Language: Python - Size: 17.6 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

lokesh9899/Mortality-Risk-Readmission-Prediction-NLP-Clinical-Bert-LLM
AI-powered prediction of in-hospital mortality and 30-day readmission using MIMIC-III clinical data. Combines structured features and ClinicalBERT embeddings with the best-performing XGBoost/CatBoost models for accurate, explainable healthcare forecasting.
Language: Jupyter Notebook - Size: 1.94 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

xariska3/Apartments-Classification-Regression.github.io
U.S. Apartments Rental Prediction and Classification with ML
Language: Jupyter Notebook - Size: 6.44 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

hypatia-of-sva/cm
cm - C Macro processor
Language: C - Size: 41 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

pblaney/mgp1000
Nextflow bioinformatics pipeline for large-scale analysis of Multiple Myeloma genomes
Language: Nextflow - Size: 324 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 11 - Forks: 5

inbo/n2khab-preprocessing
Broadly useful data preparation for Flemish Natura 2000 habitat analyses
Language: R - Size: 787 KB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 1 - Forks: 0

niklaswais/gesp
Language: Python - Size: 190 KB - Last synced at: about 1 month ago - Pushed at: 2 months ago - Stars: 24 - Forks: 5

exponentialR/QUB-HRI
Preprocessing Repository of QUB-Perception of Human Engagement in Assembly Operations Dataset
Language: Python - Size: 91 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

TNO-Quantum/optimization.qubo.preprocessors
Preprocessor QUBO optimization
Language: Python - Size: 0 Bytes - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

pawlyk/dsml-tools
set of Data Science and Machine Learning tools
Language: Python - Size: 261 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

Jayplect/Funding-recommendation-engine
For this project, I built a binary classifier to predict the success of applicants seeking funding from Alphabet Soup. Leveraging the features in the dataset, the model uses machine learning and neural networks to make accurate predictions.
Language: Jupyter Notebook - Size: 110 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 2 - Forks: 0

obtic-sorbonne/Toolbox-site
Pandore offers a set of tools that facilitate the most common corpus processing tasks for digital humanities research. Automatic pipelines for a set of tasks are also available
Language: HTML - Size: 78.2 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 2 - Forks: 1

FareedKhan-dev/Most-powerful-NLP-library
Gemini, as capable as GPT-4, provides a free API with limited access. I tested it with the help of prompt engineering and found that it can solve almost any NLP task you want to tackle.
Language: Jupyter Notebook - Size: 107 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 35 - Forks: 9

karakatic/EvoPreprocess
A Python Toolkit for Data Preprocessing with Evolutionary and Nature-Inspired Algorithms.
Language: Python - Size: 129 KB - Last synced at: 25 days ago - Pushed at: over 2 years ago - Stars: 10 - Forks: 3

Yalai92/CAVA_IMP_EXP_ANALYSIS
Analysis, visualization, preprocessing and clustering of global sparkling wine trade (2017–2024) using Python in Colab and ML to reveal trends and country profiles.
Language: Jupyter Notebook - Size: 2.71 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

HanBnrd/NIRSimple
fNIRS signal processing simplified
Language: Python - Size: 3.02 MB - Last synced at: 24 days ago - Pushed at: about 1 year ago - Stars: 15 - Forks: 2

stanstrup/QC4Metabolomics
QC systems for metabolomics studies
Language: R - Size: 351 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 10 - Forks: 0

AlexChristensen/SemNetCleaner
An Automated Cleaning Tool for Semantic and Linguistic Data
Language: HTML - Size: 978 KB - Last synced at: 6 days ago - Pushed at: 2 months ago - Stars: 10 - Forks: 0

lucasrla/wsi-tile-cleanup
Image filters for digital pathology: detect pen marks, background, and artifacts. Use them for preprocessing towards deep learning
Language: Python - Size: 3.6 MB - Last synced at: 3 days ago - Pushed at: almost 5 years ago - Stars: 28 - Forks: 4

Hyland/DocumentFilters
Document Filters is an SDK for applications like content indexing, e-discovery, data migration, and feeding data into AI/ML models by extracting data from unstructured sources. It gives the ability to perform deep inspection, data extraction, output manipulation, and conversion for virtually any type of document, in any programming language.
Language: C++ - Size: 62.4 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 25 - Forks: 2

Shr-reny/MLFoundry
MLFoundry is an end-to-end machine learning project template that demonstrates a production-grade ML pipeline using modular code, configuration management, version control, and deployment readiness.
Language: Jupyter Notebook - Size: 1.7 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

Maoelan/amazon-automated-preprocessing
This repository is used to automate the preprocessing of data scraped from Amazon reviews on the Google Play Store.
Language: Jupyter Notebook - Size: 2.88 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

abdul-rafay19/YoungDevInterns_Machine-Learning_Tasks
This internship offers hands-on exposure to real-world Machine Learning applications — from data visualization and preprocessing to model development, evaluation, and deployment. It focuses on real ML workflows, problem-solving, neural networks, and hyperparameter tuning — all within a collaborative, remote, and growth-oriented environment.
Language: Jupyter Notebook - Size: 192 KB - Last synced at: 21 days ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

githubharald/DeslantImg
The deslanting algorithm sets text upright in images. Python, C++ and OpenCL implementations provided.
Language: C++ - Size: 591 KB - Last synced at: 2 months ago - Pushed at: over 3 years ago - Stars: 150 - Forks: 38

smoia/EuskalIBUR_preproc
Preprocessing files for EuskalIBUR
Language: Shell - Size: 113 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 2

ag-ds-bubble/swtloc
Python package for Stroke Width Transform - Localizing the Text (Letters & Words) in a Natural Image
Language: Python - Size: 126 MB - Last synced at: 18 days ago - Pushed at: almost 2 years ago - Stars: 38 - Forks: 4

habeeb3579/Spectoprep
Language: Jupyter Notebook - Size: 3.36 MB - Last synced at: 8 days ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

Yu-Group/veridical-flow
Making it easier to build stable, trustworthy data-science pipelines based on the PCS framework.
Language: Jupyter Notebook - Size: 13.4 MB - Last synced at: 2 days ago - Pushed at: over 1 year ago - Stars: 71 - Forks: 7

shubham5027/Store-Item-Demand-Forcasting
The "Sales Demand Forecasting Regression Model" project aims to develop a predictive model that forecasts future sales demand based on historical data and relevant influencing factors. The project follows a structured approach, encompassing data collection, preprocessing, model selection, training, evaluation, and deployment.
Language: Jupyter Notebook - Size: 5.44 MB - Last synced at: 4 days ago - Pushed at: about 1 year ago - Stars: 4 - Forks: 0

sposso/fNIRS-preprocessing-guide
Preprocessing the fNIRS data from the paper "The use of broad vs restricted regions of interest in functional near-infrared spectroscopy for measuring cortical activation to auditory-only and visual-only speech"
Language: Jupyter Notebook - Size: 29.3 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

0xferit/ITU-Turkish-NLP-Pipeline-Caller 📦
A Python3 wrapper tool to help using ITU Turkish NLP Pipeline API -- UNMAINTAINED --
Language: Python - Size: 131 KB - Last synced at: 11 days ago - Pushed at: about 7 years ago - Stars: 44 - Forks: 9

parvvaresh/ETL-news
Language: HTML - Size: 4.76 MB - Last synced at: 5 days ago - Pushed at: 11 months ago - Stars: 20 - Forks: 1

NiranjanRao07/ADHD-ML-Project
This project used machine learning to classify ADHD based on EEG data. We preprocessed the EEG signals, extracted various features, and used LDA for dimensionality reduction. A voting ensemble of classifiers achieved 72% accuracy in distinguishing between ADHD and control groups.
Language: Jupyter Notebook - Size: 1.35 MB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

calvinmccarter/kditransform
Kernel density integral transformation: feature preprocessing and univariate clustering (TMLR, 2023)
Language: Python - Size: 15.4 MB - Last synced at: 27 days ago - Pushed at: over 1 year ago - Stars: 7 - Forks: 0

erdogant/df2onehot
Convert an unstructured array into a structured dataframe.
Language: Python - Size: 6.67 MB - Last synced at: 22 days ago - Pushed at: 3 months ago - Stars: 3 - Forks: 1

Aura-healthcare/ecg_qc
A library to compute ECG signal quality indicators
Language: Jupyter Notebook - Size: 50.4 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 41 - Forks: 10

bids-apps/freesurfer
BIDS app wrapping recon-all from FreeSurfer
Language: Python - Size: 221 KB - Last synced at: 3 months ago - Pushed at: 8 months ago - Stars: 41 - Forks: 35

Metalkiler/Cane-Categorical-Attribute-traNsformation-Environment
A simple preprocessing method for Machine Learning
Language: Python - Size: 542 KB - Last synced at: 18 days ago - Pushed at: 3 months ago - Stars: 4 - Forks: 3

martinezmario02/ClasificacionDiabetes
Data preprocessing and binary classification (2025)
Language: Jupyter Notebook - Size: 1.8 MB - Last synced at: 21 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

Ehsan-Behzadi/Online-Retail-Data-Analysis-and-Preprocessing
This project analyzes and preprocesses the Online Retail dataset to uncover insights into customer purchasing behaviors, sales trends, and product performance. It includes data cleaning, exploration, and visualization, with the goal of enhancing understanding of online retail dynamics.
Language: Jupyter Notebook - Size: 1.86 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

gtkacz/undergrad_thesis
Code for my undergraduate thesis: Quantitative Analysis of the Impact of Image Pre-Processing on the Accuracy of Computer Vision Models Trained to Identify Dermatological Skin Diseases
Language: Jupyter Notebook - Size: 2.95 GB - Last synced at: 9 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

R1j1t/contextualSpellCheck
✔️Contextual word checker for better suggestions (not actively maintained)
Language: Python - Size: 2.45 MB - Last synced at: about 2 months ago - Pushed at: 6 months ago - Stars: 413 - Forks: 64

scythemenace/NormalizeText
A text preprocessing module demonstrated on a Project Gutenberg dataset, with methodology and results fully documented.
Language: Python - Size: 1.21 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

xwxfox/convokit
A flexible TypeScript framework for ingesting, processing, and exporting chat/conversation data for LLM training and analysis.
Language: TypeScript - Size: 116 KB - Last synced at: 24 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

damianhorna/multi-imbalance
Python package for tackling multi-class imbalance problems. http://www.cs.put.poznan.pl/mlango/publications/multiimbalance/
Language: Python - Size: 66 MB - Last synced at: 27 days ago - Pushed at: about 1 year ago - Stars: 78 - Forks: 12

fitushar/Brain-Tissue-Segmentation-Using-Deep-Learning-Pipeline-NeuroNet
This repository is for the MISA course final project on brain tissue segmentation. We adopt NeuroNet, a comprehensive brain image segmentation tool based on a novel multi-output CNN architecture, which has been trained and tuned using the IBSR18 dataset.
Language: Jupyter Notebook - Size: 5.16 MB - Last synced at: 3 months ago - Pushed at: about 5 years ago - Stars: 35 - Forks: 9

steviecurran/house-prices
House Price Predictions
Language: Jupyter Notebook - Size: 255 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

MahyadSaedpanah/ML---Pima-Indian-Diabetes
Pima Indian Diabetes Dataset Analysis and Clustering using Machine Learning techniques.
Language: Jupyter Notebook - Size: 3.07 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

abu14/transaction-anomaly-detection
An end to end fraud detection model to identify anomalies within transactions.
Language: Jupyter Notebook - Size: 420 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

SegataLab/preprocessing
Raw sequence metagenomic reads pre-processing: trimming, QC, and host contamination
Language: Python - Size: 145 KB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 11 - Forks: 2

birddevelper/ScannedDocumentPreprocessing
Python code snippet for preprocessing scanned documents
Language: Python - Size: 902 KB - Last synced at: 3 months ago - Pushed at: 9 months ago - Stars: 6 - Forks: 0

pni-lab/PUMI
PUMI: neuroimaging Pipelines Using Modular workflow Integration
Language: Python - Size: 34.6 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

paulross/cpip
CPIP - a C/C++ preprocessor implemented in Python.
Language: Python - Size: 36.7 MB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 46 - Forks: 4

mohameddsalmann/Anime-Recommendation-System
This repository contains a Jupyter notebook for building an anime recommendation system using various machine learning models. The notebook includes steps for data preprocessing, feature extraction, model training, and creating a user-friendly graphical user interface (GUI) with tkinter.
Language: Jupyter Notebook - Size: 21.5 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

saifalibaig/Multi-Label-Emotion-Recognition
This project focuses on detecting multiple emotions from English text using a fine-tuned BERT model. It leverages the GoEmotions dataset (https://huggingface.co/datasets/go_emotions), a large-scale human-annotated dataset of Reddit comments labeled with 27 emotions + neutral.
Language: Jupyter Notebook - Size: 122 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

xga0/lightlemma
A lightweight, fast English lemmatizer
Language: Python - Size: 11.7 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

Melikarzyt/Processing-GPS-Data
A data cleaning practice project focused on missing GPS data and outlier detection using IQR.
Size: 6.84 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0
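The IQR rule mentioned in this entry can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's code: the function name `iqr_outliers`, the conventional multiplier `k = 1.5`, the choice to simply drop missing readings, and the sample speed values are all hypothetical.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return (lower, upper, outliers) using the interquartile-range rule.

    Missing readings (None) are dropped before the quartiles are computed.
    """
    present = [v for v in values if v is not None]
    # Inclusive quartiles treat the data as the full population of readings.
    q1, _, q3 = quantiles(present, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    outliers = [v for v in present if v < lower or v > upper]
    return lower, upper, outliers


# Hypothetical GPS speed readings with one gap and one implausible spike.
speeds = [4.8, 5.0, None, 5.1, 5.2, 5.3, 5.5, 42.0]
lower, upper, outliers = iqr_outliers(speeds)
```

Here only the spike falls outside the fences. Whether to drop, cap, or interpolate such points depends on the downstream use of the track.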
