GitHub topics: preprocessing
PylarBear/pybear
pybear is a Python computing library that augments data analytics functionality found in popular packages that use the scikit-learn API, such as scikit-learn and xgboost.
Language: Python - Size: 49.9 MB - Last synced at: about 20 hours ago - Pushed at: about 20 hours ago - Stars: 0 - Forks: 0

ShrutiSemwal/MTech.-Dissertation-UNet-with-Attention-Mechanism
UNet with Attention-DL Model for green building domain
Size: 90.8 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 1 - Forks: 0

xga0/contraction_fix
A fast and efficient library for fixing contractions in text
Language: Python - Size: 33.2 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 3 - Forks: 1

xga0/emoticon_fix
A lightweight and efficient library for transforming emoticons into their semantic meanings
Language: Python - Size: 71.3 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 4 - Forks: 1

Saeed-dev2/poultry_Form_Disease_Deep-Learning-Machine-Learning-Computer-vision
An AI-powered poultry disease detection system that uses deep learning to classify chicken feces images. Built with TensorFlow and Streamlit, it provides quick, accessible diagnosis for farmers. Trained on PCR-validated data for accurate classification of diseases like Salmonella and Coccidiosis.
Language: Jupyter Notebook - Size: 13.7 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 1 - Forks: 0

methlabUZH/automagic
Automagic
Language: MATLAB - Size: 414 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 104 - Forks: 32

GiftMungmeeprued/document-parsers-list
A comprehensive list of document parsers, covering PDF-to-text conversion and layout extraction. Each tested for support of tables, equations, handwriting, two-column layouts, and multi-column layouts.
Size: 4.25 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 94 - Forks: 1

virtualharsh/gujarti-image-recognition
Final year project of Diploma in Computer Major
Language: HTML - Size: 103 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 1 - Forks: 0

Mohid-Water-Modelling-System/MOHID_Jupyter-Notebooks
Jupyter Notebooks for the MOHID Water Modelling System
Language: Fortran - Size: 80 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

kesslerr/m4d
How EEG preprocessing shapes decoding performance
Language: Jupyter Notebook - Size: 1010 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 14 - Forks: 0

FRoZZy228-demonit/LOGISTIC_REGRESSION_AI_MODEL
End-to-end AI model for detecting cyber threats using custom Logistic Regression in NumPy. Includes Flask backend and user-friendly interface. 🚀💻
Language: Python - Size: 211 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

dongrixinyu/JioNLP
A Chinese NLP preprocessing and parsing toolkit: accurate, efficient, and easy to use. www.jionlp.com
Language: Python - Size: 159 MB - Last synced at: 1 day ago - Pushed at: 21 days ago - Stars: 3,662 - Forks: 433

AhmedNasef3/Startup-Businesses-Expansion-Dashboard
Valuable dashboard for startup business expansion, showing which states and cities achieved the most profit to help grow the business
Language: Jupyter Notebook - Size: 512 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1 - Forks: 0

EttoreRocchi/combatlearn
The ComBat algorithm for a learning framework (scikit-learn compatible)
Language: Python - Size: 1.64 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 4 - Forks: 0

gustaveroussy/prismtoolbox
Toolbox for histopathology image analysis
Language: Python - Size: 1.01 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 6 - Forks: 0

EPFL-ENAC/panel-lemanique-preprocessing
Language: R - Size: 77.1 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

data-science-lab-amsterdam/skippa
SciKIt-learn Pipeline in PAndas
Language: Python - Size: 423 KB - Last synced at: 2 days ago - Pushed at: almost 2 years ago - Stars: 42 - Forks: 1

winedarksea/AutoTS
Automated Time Series Forecasting
Language: Python - Size: 47 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 1,300 - Forks: 112

TheAlgorithms/R
Collection of various algorithms implemented in R.
Language: R - Size: 1.02 MB - Last synced at: about 23 hours ago - Pushed at: 3 months ago - Stars: 981 - Forks: 320

JvdHoogen/paderborn_bearing
Package for preprocessing Paderborn Bearing dataset
Language: Python - Size: 26.4 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 9 - Forks: 6

davidpfister/fortiche
Fortran interfaces, classes, headers and extensions.
Language: Fortran - Size: 324 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 2 - Forks: 0

OpenGene/fastp
An ultra-fast all-in-one FASTQ preprocessor (QC/adapters/trimming/filtering/splitting/merging...)
Language: C++ - Size: 858 KB - Last synced at: 3 days ago - Pushed at: 8 days ago - Stars: 2,118 - Forks: 345

sparklapse/207f
Read what people see, and clear the unicode fog
Language: Svelte - Size: 85 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

Unstructured-IO/unstructured
Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking and embedding.
Language: HTML - Size: 192 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 11,844 - Forks: 979

qd-cae/awesome-CAE
A curated list of awesome CAE frameworks, libraries and software.
Size: 57.6 KB - Last synced at: 6 days ago - Pushed at: 11 months ago - Stars: 424 - Forks: 108

yassineahmed/preq
preq is the community-driven problem detector for Common Reliability Enumerations (CREs).
Language: Go - Size: 79.1 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

AlwaysDhruv/Image-Classification-CPP
Hi there, I'm Dhruv. This project is developed in C++ and Python for image recognition. C++ is the main engine and Python handles preprocessing only. More information is in the README file.
Language: C++ - Size: 908 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 1 - Forks: 0

Neko-Box-Coder/MacroPowerToys
A collection of useful C/C++ macros for manipulating arguments and preprocessing
Language: C - Size: 69.3 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 2 - Forks: 0

luxiant/sentence_segmentation
A rule-based sentence_segmenter, inspired by ruby pragmatic segmenter by diasks2 (repo: https://github.com/diasks2/pragmatic_segmenter)
Language: Rust - Size: 35.6 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 1

SomyaAgar/SMS_Spam_Detection
Language: Jupyter Notebook - Size: 823 KB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0

danpacho/obsidian_blog
🔨 Plugin based post preprocessing & CI/CD tool for obsidian
Language: TypeScript - Size: 95.8 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0

matieber/video_edge_RT
Flutter application for Android for video capture and preprocessing
Size: 13.7 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

hashibk/KaggleMallCustomersDataset
Applied K-Means clustering on Mall Customers dataset with PCA for dimensionality reduction. Cluster labels were added to the dataset and used to train Random Forest and LightGBM classifiers to predict customer segments on new data.
Language: Jupyter Notebook - Size: 351 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

fkie-cad/Logprep
Log data preprocessing, generation, and shipping in Python
Language: Python - Size: 9.54 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 32 - Forks: 8

Ma7moudMo7ammed/NLP-Bootcamp-with-python
Master NLP with Python in this comprehensive bootcamp. Learn text preprocessing, tokenization, and advanced techniques like Word2Vec. 🚀🐍
Language: Jupyter Notebook - Size: 11.1 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 0 - Forks: 0

OpenTabular/PreTab
pretab is a flexible and extensible preprocessing library for tabular data, built on top of scikit-learn. It provides advanced transformations, spline and neural feature expansions, and seamless integration with embeddings – all designed for modern tabular ML workflows.
Language: Python - Size: 113 KB - Last synced at: 1 day ago - Pushed at: 11 days ago - Stars: 5 - Forks: 1

OpenTabular/DeepTabular
Mambular is a Python package that simplifies tabular deep learning by providing a suite of models for regression, classification, and distributional regression tasks. It includes models such as Mambular, TabM, FT-Transformer, TabulaRNN, TabTransformer, and tabular ResNets.
Language: Python - Size: 8.91 MB - Last synced at: 8 days ago - Pushed at: 11 days ago - Stars: 238 - Forks: 15

jboiie/BreastCancerPrediction
A breast cancer prediction and classification tool built using PyTorch and Streamlit. It analyzes biopsy data to predict if a tumor is benign or malignant. Users can enter feature values manually or upload a CSV report for instant results.
Language: Jupyter Notebook - Size: 66.4 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0

mlr-org/mlr3pipelines
Dataflow Programming for Machine Learning in R
Language: R - Size: 22.8 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 144 - Forks: 28

subu53/Ames-Housing-Price-Prediction-Ml-Regression
House price prediction using regression machine learning models
Language: Jupyter Notebook - Size: 1.16 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 0 - Forks: 0

dongyx/shsub
Fast Template Engine for Shell
Language: C - Size: 88.9 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 28 - Forks: 2

mhaugestad/chisel
A library to help with common NLP pre-processing tasks.
Language: Python - Size: 29 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 1 - Forks: 1

MinishLab/semhash
Fast Semantic Text Deduplication & Filtering
Language: Python - Size: 6.14 MB - Last synced at: 11 days ago - Pushed at: about 2 months ago - Stars: 748 - Forks: 43

shawntz/eyeris
eyeris: Flexible, Extensible, & Reproducible Pupillometry Preprocessing (CRAN R Package)
Language: R - Size: 100 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 5 - Forks: 3

LazarosPan/Natural-Language-Processing-and-Text-Mining
Quora Dataset - Determine if two questions ask the same thing or not
Language: Jupyter Notebook - Size: 8.79 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 0 - Forks: 0

Saeed-dev2/news-intelligence-predictor
Predict trends, events, or sentiments using machine learning and NLP on news headlines and articles. This project extracts insights from textual data to support real-time forecasting in finance, politics, and public opinion.
Language: Python - Size: 316 KB - Last synced at: 8 days ago - Pushed at: 14 days ago - Stars: 1 - Forks: 0

songyz2019/hsi-preprocessing-toolkit
A Hyperspectral Preprocessing Toolkit from HSI Camera to Machine Learning dataset
Language: Python - Size: 390 KB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 1 - Forks: 0

eriksszva/preprocessing-of-resume
Scripts and pipelines for cleaning, labeling, and preparing resume data for machine learning tasks.
Language: Jupyter Notebook - Size: 6.47 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 0 - Forks: 0

iam-salma/NLP-Bootcamp-with-python
A hands-on NLP Bootcamp using Python 🐍 covering text preprocessing, tokenization, stemming, lemmatization, POS tagging, NER, BoW, TF-IDF, Word2Vec, and sentiment analysis. Includes real-world projects, capstone notebooks, and ML-ready code for text classification and natural language tasks — ideal for data science, machine learning & AI learners
Language: Jupyter Notebook - Size: 9.79 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 1 - Forks: 0

gehad-Ahmed30/Natural-Language-Processing
Language: Jupyter Notebook - Size: 44.9 KB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 0 - Forks: 0

pytorch/torcharrow 📦
High performance model preprocessing library on PyTorch
Language: Python - Size: 11.3 MB - Last synced at: 2 days ago - Pushed at: over 1 year ago - Stars: 649 - Forks: 81

geometric-intelligence/polpo
A collection of weakly related tools from the Geometric Intelligence Lab.
Language: Python - Size: 75.6 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 0 - Forks: 1

MLD3/FIDDLE
FlexIble Data-Driven pipeLinE – a preprocessing pipeline that transforms structured EHR data into feature vectors to be used with ML algorithms. https://doi.org/10.1093/jamia/ocaa139
Language: Jupyter Notebook - Size: 6.41 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 94 - Forks: 19

kartav005/news-intelligence-predictor
Classify news genres with the News Intelligence Predictor. This FastAPI app uses NLP and ML to analyze headlines and content in real-time. 📰🚀
Language: Python - Size: 293 KB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 0 - Forks: 0

Abdelrahman-Atef-Elsayed/NLP_Preprocessing_pipeline
This repo includes a generalized preprocessing pipeline for text data in NLP tasks.
Language: Jupyter Notebook - Size: 57.6 KB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 1 - Forks: 0

madyankin/postcss-each 📦
PostCSS plugin to iterate through values
Language: JavaScript - Size: 581 KB - Last synced at: 15 days ago - Pushed at: about 4 years ago - Stars: 94 - Forks: 20

MidoHossam14/MachineLearningAlgorithms
Hands-on machine learning algorithms with scikit-learn, TensorFlow, and Keras.
Language: Jupyter Notebook - Size: 1.99 MB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 0 - Forks: 0

daniellwdb/roka
🤖 Rise of Kingdoms bot to manage kingdom titles and DKP through Discord.
Language: TypeScript - Size: 35.6 MB - Last synced at: 10 days ago - Pushed at: 22 days ago - Stars: 34 - Forks: 18

chrislemke/sk-transformers
A collection of pandas & scikit-learn compatible transformers for preprocessing and feature engineering 🛠
Language: Python - Size: 2.55 MB - Last synced at: 6 days ago - Pushed at: 20 days ago - Stars: 11 - Forks: 0

Abdelrhman941/3-ml-preprocessing-guide
Language: Jupyter Notebook - Size: 1.55 MB - Last synced at: 21 days ago - Pushed at: 23 days ago - Stars: 4 - Forks: 1

Kumpatlapavankumar/Medical-Insurance-Cost-Estimation-Using-Machine-Learning
Using Python, NumPy, pandas, seaborn, Matplotlib, and machine learning techniques
Language: Jupyter Notebook - Size: 1.03 MB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 0 - Forks: 0

liyachittilappilly/FakeorRealPrediction
Machine Learning model that detects fake news using NLP and Linear SVM. Built with Python, Scikit-learn, and TF-IDF on a real-world news dataset. Achieves 93%+ accuracy.
Language: Jupyter Notebook - Size: 255 KB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 0 - Forks: 0

KinWaiCheuk/nnAudio
Audio processing using PyTorch 1D convolution networks
Language: Python - Size: 94.7 MB - Last synced at: 19 days ago - Pushed at: about 2 months ago - Stars: 1,071 - Forks: 93

eds-book/ea34568e-d86e-4720-be2f-3f826f66a26c
Describing a pipeline to preprocess NOAA gridded rainfall reanalysis dataset
Language: Jupyter Notebook - Size: 1.39 MB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 0 - Forks: 2

Hyedryn/elikopy
ElikoPy is a Python library that aims to ease the processing of diffusion imaging for microstructural analysis.
Language: Python - Size: 4.18 MB - Last synced at: 12 days ago - Pushed at: 2 months ago - Stars: 17 - Forks: 5

sappelhoff/pyprep
PyPREP: A Python implementation of the Preprocessing Pipeline (PREP) for EEG data
Language: Python - Size: 25.9 MB - Last synced at: 6 days ago - Pushed at: 2 months ago - Stars: 154 - Forks: 35

Nidal-Shahin/Job-Market-Cheat-Codes
A machine learning pipeline for analyzing job listings — built entirely on synthetic data, fine-tuned with GPT — to predict salaries, classify job roles, and cluster careers like a data wizard
Language: Jupyter Notebook - Size: 8.49 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 0 - Forks: 0

CEA-LIST/RPCDataloader
A variant of the PyTorch Dataloader using remote workers.
Language: Python - Size: 3.26 MB - Last synced at: 23 days ago - Pushed at: over 2 years ago - Stars: 19 - Forks: 1

kdsuthar/AI-based-QA-Testing_FinBank_Customer_Realibility
This project covers dataset cleaning and preparing the final data for use in machine learning quality testing.
Language: Jupyter Notebook - Size: 450 KB - Last synced at: 26 days ago - Pushed at: 26 days ago - Stars: 0 - Forks: 0

SUGHA22/Data_analysis
Actively upskilling in data science with hands-on learning during a Green Internship focused on environmental sustainability. Used Pandas and NumPy for data preprocessing and cleaning, and created visual dashboards in Excel and Tableau. Gained experience in interpreting sustainability metrics and communicating insights through data storytelling and
Language: Jupyter Notebook - Size: 1.07 MB - Last synced at: 22 days ago - Pushed at: 26 days ago - Stars: 0 - Forks: 0

nipreps/nifreeze
A flexible framework for volume-wise artifact estimation and correction across multiple 4D neuroimaging modalities (diffusion MRI, functional MRI, and PET)
Language: Python - Size: 115 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 4 - Forks: 4

nuhmanpk/cv2filters
CV2Filters is a powerful Python package designed as a wrapper around OpenCV. It simplifies image processing tasks by providing a higher-level abstraction of the underlying OpenCV functionality.
Language: Python - Size: 68.7 MB - Last synced at: 16 days ago - Pushed at: almost 2 years ago - Stars: 4 - Forks: 0

piotrlaczkowski/keras-data-processor
Data Preprocessing model based on Keras preprocessing layers that can be used as a standalone model or incorporated to Keras model as first layers.
Language: Python - Size: 9.31 MB - Last synced at: 14 days ago - Pushed at: about 1 month ago - Stars: 7 - Forks: 5

threadexio/cbundl
webpack but for C code.
Language: Rust - Size: 195 KB - Last synced at: 11 days ago - Pushed at: 29 days ago - Stars: 2 - Forks: 1

Adity-star/Complete-DataScience-Guide
Comprehensive repository for data science projects, tools, workflows, and resources across ML, DL, and NLP. It also contains interview questions, data science books, and some of the code I have written over my journey.
Language: Jupyter Notebook - Size: 350 MB - Last synced at: 23 days ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

DataCanvasIO/HyperGBM
A full pipeline AutoML tool for tabular data
Language: Python - Size: 11 MB - Last synced at: 19 days ago - Pushed at: 3 months ago - Stars: 351 - Forks: 47

DarkStarStrix/DataVolt
Reusable data engineering toolkit. My personal data infrastructure.
Language: Jupyter Notebook - Size: 56.6 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 17 - Forks: 2

moosmann/matlab
Data reconstruction and analysis tools for tomography data acquired at the P05 Imaging Beamline (IBL) and the P07 High-Energy Material Science (HEMS) beamline at PETRA III at DESY, both operated by Helmholtz-Zentrum Hereon.
Language: MATLAB - Size: 21.6 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 9 - Forks: 7

acroucher/PyTOUGH
A Python library for automating TOUGH2 simulations of subsurface fluid and heat flow
Language: Python - Size: 41.7 MB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 101 - Forks: 38

FaNa-AI/Data-exploration-and-preprocessing
A Python script to clean and preprocess house price data from Excel, removing invalid and missing values for better analysis.
Size: 258 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

Infinitode/DupliPy
DupliPy is a quick and easy-to-use package that can handle text formatting and data augmentation tasks for NLP in Python. It now offers support for image augmentation tasks as well.
Language: Python - Size: 65.4 KB - Last synced at: 16 days ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

preprocessy/preprocessy
Python package for Customizable Data Preprocessing Pipelines
Language: Jupyter Notebook - Size: 993 KB - Last synced at: 8 days ago - Pushed at: about 1 month ago - Stars: 43 - Forks: 14

NirLab-TAU/sleepeegpy
Language: Jupyter Notebook - Size: 166 MB - Last synced at: 5 days ago - Pushed at: 2 months ago - Stars: 31 - Forks: 10

Shakiba-Alipour/Data-Mining-Project
Data mining on the University of Twente website
Language: Python - Size: 48.8 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

keurfonluu/toughio
Pre- and post-processing Python library for TOUGH
Language: Python - Size: 18.4 MB - Last synced at: 5 days ago - Pushed at: 11 days ago - Stars: 61 - Forks: 8

hms-immunology/RNA_QC_APP
RNA-seq Quality Control and Preprocessing Tool - Interactive Shiny application for comprehensive RNA sequencing data validation, quality assessment, filtering, and normalization. Developed at Harvard Medical School Department of Immunology for global research use.
Language: R - Size: 11 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

lucasrla/wsi-preprocessing-sos-workflow
A pipeline to preprocess whole-slide images (WSI) towards deep learning
Size: 14.6 KB - Last synced at: 7 days ago - Pushed at: almost 5 years ago - Stars: 5 - Forks: 2

jbusecke/xMIP
Analysis-ready CMIP6 data in Python the easy way with Pangeo tools.
Language: Jupyter Notebook - Size: 20.4 MB - Last synced at: 1 day ago - Pushed at: 6 days ago - Stars: 201 - Forks: 44

NYXMatik/ProjectAPPROF
Made for academic purposes: repository for a deep learning project using CNNs and RNNs for image classification and value prediction
Language: Jupyter Notebook - Size: 12.4 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 1

erenada/RNA_QC_APP
RNA-seq Quality Control and Preprocessing Tool - Interactive Shiny application for comprehensive RNA sequencing data validation, quality assessment, filtering, and normalization. Developed at Harvard Medical School Department of Immunology for global research use.
Language: R - Size: 11 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 3 - Forks: 0

DragomirBozoki/wiki-feature-selection-pyspark
NLP project for large-scale entropy and mutual information analysis on Wikipedia. Used PySpark to identify top n-gram features and train logistic classifiers.
Language: Python - Size: 316 KB - Last synced at: 25 days ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

Anu-Prabha-Joseph/Turk-NLP
TurkNLP is a comprehensive library designed for natural language processing in Turkish, offering modular and extensible features. It supports various tasks like tokenization, morphological analysis, and sentiment analysis, making it a valuable tool for both academic and industrial applications. 🐙✨
Language: Python - Size: 18.6 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

akankshaj2712/Vehicle-Insurance-Fraud-Detection
Fraud Detection using Machine Learning
Language: Jupyter Notebook - Size: 4.44 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

anishdeshmukh9/AI-model-Training-Disease-prognosis
An academic project that showcases my pre- and post-modeling ML knowledge, such as data collection, data preprocessing, AI model training (ML), and model fine-tuning
Language: Python - Size: 8.13 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

ALebrun-108/BoxSERS
Python package that provides a full range of functionality to process and analyze vibrational spectra (Raman, SERS, FTIR, etc.).
Language: Jupyter Notebook - Size: 20 MB - Last synced at: 27 days ago - Pushed at: 10 months ago - Stars: 65 - Forks: 15

hscspring/pnlp
NLP pre-/post-processing tools.
Language: Python - Size: 106 KB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 30 - Forks: 6

lucasrla/wsi-preprocessing
Simple library for preprocessing histopathological whole-slide images (WSI) into tiles (a.k.a. patches) towards deep learning
Language: Python - Size: 18.6 KB - Last synced at: 7 days ago - Pushed at: almost 2 years ago - Stars: 55 - Forks: 14

Awais-Asghar/Skin-Cancer-Binary-Classifier
A machine learning project for binary classification of skin cancer as malignant or benign, utilizing models like XGBoost, LGBM Classifier, Adaboost, SVM, and Logistic Regression. Features comprehensive data preprocessing, model training, and evaluation for accurate diagnosis.
Language: Jupyter Notebook - Size: 5.65 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

ankur150/ML-Projects
I applied K-Means clustering on Facebook Live Sellers data to group posts based on engagement metrics. Preprocessing techniques included Label Encoding (for status_type) and Standardization (using StandardScaler for numerical features). Using the Elbow Method, I determined k=3, trained the model, and visualized clusters and centroids.
Language: Jupyter Notebook - Size: 2.33 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

jknafou/TransCorpus
TransCorpus is a scalable toolkit for large-scale, parallel translation and preprocessing of text corpora, built for language model pretraining and research.
Language: Python - Size: 5.91 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0
