An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: preprocessing

PylarBear/pybear

pybear is a Python computing library that augments data analytics functionality found in popular packages that use the scikit-learn API, such as scikit-learn and xgboost.

Language: Python - Size: 49.9 MB - Last synced at: about 20 hours ago - Pushed at: about 20 hours ago - Stars: 0 - Forks: 0

ShrutiSemwal/MTech.-Dissertation-UNet-with-Attention-Mechanism

UNet with Attention-DL Model for green building domain

Size: 90.8 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 1 - Forks: 0

xga0/contraction_fix

A fast and efficient library for fixing contractions in text

Language: Python - Size: 33.2 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 3 - Forks: 1

xga0/emoticon_fix

A lightweight and efficient library for transforming emoticons into their semantic meanings

Language: Python - Size: 71.3 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 4 - Forks: 1

Saeed-dev2/poultry_Form_Disease_Deep-Learning-Machine-Learning-Computer-vision

An AI-powered poultry disease detection system that uses deep learning to classify chicken feces images. Built with TensorFlow and Streamlit, it provides quick, accessible diagnosis for farmers. Trained on PCR-validated data for accurate classification of diseases like Salmonella and Coccidiosis.

Language: Jupyter Notebook - Size: 13.7 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 1 - Forks: 0

methlabUZH/automagic

Automagic

Language: MATLAB - Size: 414 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 104 - Forks: 32

GiftMungmeeprued/document-parsers-list

A comprehensive list of document parsers, covering PDF-to-text conversion and layout extraction. Each is tested for support of tables, equations, handwriting, two-column layouts, and multi-column layouts.

Size: 4.25 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 94 - Forks: 1

virtualharsh/gujarti-image-recognition

Final year project of Diploma in Computer Major

Language: HTML - Size: 103 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 1 - Forks: 0

Mohid-Water-Modelling-System/MOHID_Jupyter-Notebooks

Jupyter Notebooks for the MOHID Water Modelling System

Language: Fortran - Size: 80 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

kesslerr/m4d

How EEG preprocessing shapes decoding performance

Language: Jupyter Notebook - Size: 1010 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 14 - Forks: 0

FRoZZy228-demonit/LOGISTIC_REGRESSION_AI_MODEL

End-to-end AI model for detecting cyber threats using custom Logistic Regression in NumPy. Includes Flask backend and user-friendly interface. 🚀💻

Language: Python - Size: 211 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

dongrixinyu/JioNLP

A Chinese NLP preprocessing and parsing toolkit: accurate, efficient, and easy to use. www.jionlp.com

Language: Python - Size: 159 MB - Last synced at: 1 day ago - Pushed at: 21 days ago - Stars: 3,662 - Forks: 433

AhmedNasef3/Startup-Businesses-Expansion-Dashboard

Dashboard for startup business expansion that determines which states and cities achieved the most profit, to help grow the business

Language: Jupyter Notebook - Size: 512 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1 - Forks: 0

EttoreRocchi/combatlearn

The ComBat algorithm for a learning framework (scikit-learn compatible)

Language: Python - Size: 1.64 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 4 - Forks: 0

gustaveroussy/prismtoolbox

Toolbox for histopathology image analysis

Language: Python - Size: 1.01 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 6 - Forks: 0

EPFL-ENAC/panel-lemanique-preprocessing

Language: R - Size: 77.1 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

data-science-lab-amsterdam/skippa

SciKIt-learn Pipeline in PAndas

Language: Python - Size: 423 KB - Last synced at: 2 days ago - Pushed at: almost 2 years ago - Stars: 42 - Forks: 1

winedarksea/AutoTS

Automated Time Series Forecasting

Language: Python - Size: 47 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 1,300 - Forks: 112

TheAlgorithms/R

Collection of various algorithms implemented in R.

Language: R - Size: 1.02 MB - Last synced at: about 23 hours ago - Pushed at: 3 months ago - Stars: 981 - Forks: 320

JvdHoogen/paderborn_bearing

Package for preprocessing Paderborn Bearing dataset

Language: Python - Size: 26.4 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 9 - Forks: 6

davidpfister/fortiche

Fortran interfaces, classes, headers and extensions.

Language: Fortran - Size: 324 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 2 - Forks: 0

OpenGene/fastp

An ultra-fast all-in-one FASTQ preprocessor (QC/adapters/trimming/filtering/splitting/merging...)

Language: C++ - Size: 858 KB - Last synced at: 3 days ago - Pushed at: 8 days ago - Stars: 2,118 - Forks: 345

sparklapse/207f

Read what people see, and clear the unicode fog

Language: Svelte - Size: 85 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

Unstructured-IO/unstructured

Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking and embedding.

Language: HTML - Size: 192 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 11,844 - Forks: 979

qd-cae/awesome-CAE

A curated list of awesome CAE frameworks, libraries and software.

Size: 57.6 KB - Last synced at: 6 days ago - Pushed at: 11 months ago - Stars: 424 - Forks: 108

yassineahmed/preq

preq is the community-driven problem detector for Common Reliability Enumerations (CREs).

Language: Go - Size: 79.1 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

AlwaysDhruv/Image-Classification-CPP

An image recognition project by Dhruv, developed in C++ and Python. C++ is the main engine and Python is used only for preprocessing. More information is in the README file.

Language: C++ - Size: 908 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 1 - Forks: 0

Neko-Box-Coder/MacroPowerToys

A collection of useful C/C++ macros for manipulating arguments and preprocessing

Language: C - Size: 69.3 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 2 - Forks: 0

luxiant/sentence_segmentation

A rule-based sentence_segmenter, inspired by ruby pragmatic segmenter by diasks2 (repo: https://github.com/diasks2/pragmatic_segmenter)

Language: Rust - Size: 35.6 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 1

SomyaAgar/SMS_Spam_Detection

Language: Jupyter Notebook - Size: 823 KB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0

danpacho/obsidian_blog

🔨 Plugin based post preprocessing & CI/CD tool for obsidian

Language: TypeScript - Size: 95.8 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0

matieber/video_edge_RT

Flutter application for Android for video capture and preprocessing

Size: 13.7 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

hashibk/KaggleMallCustomersDataset

Applied K-Means clustering on Mall Customers dataset with PCA for dimensionality reduction. Cluster labels were added to the dataset and used to train Random Forest and LightGBM classifiers to predict customer segments on new data.

Language: Jupyter Notebook - Size: 351 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

fkie-cad/Logprep

Log data preprocessing, generation and shipping in Python

Language: Python - Size: 9.54 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 32 - Forks: 8

Ma7moudMo7ammed/NLP-Bootcamp-with-python

Master NLP with Python in this comprehensive bootcamp. Learn text preprocessing, tokenization, and advanced techniques like Word2Vec. 🚀🐍

Language: Jupyter Notebook - Size: 11.1 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 0 - Forks: 0

OpenTabular/PreTab

pretab is a flexible and extensible preprocessing library for tabular data, built on top of scikit-learn. It provides advanced transformations, spline and neural feature expansions, and seamless integration with embeddings – all designed for modern tabular ML workflows.

Language: Python - Size: 113 KB - Last synced at: 1 day ago - Pushed at: 11 days ago - Stars: 5 - Forks: 1

OpenTabular/DeepTabular

Mambular is a Python package that simplifies tabular deep learning by providing a suite of models for regression, classification, and distributional regression tasks. It includes models such as Mambular, TabM, FT-Transformer, TabulaRNN, TabTransformer, and tabular ResNets.

Language: Python - Size: 8.91 MB - Last synced at: 8 days ago - Pushed at: 11 days ago - Stars: 238 - Forks: 15

jboiie/BreastCancerPrediction

A breast cancer prediction and classification tool built using PyTorch and Streamlit. It analyzes biopsy data to predict if a tumor is benign or malignant. Users can enter feature values manually or upload a CSV report for instant results.

Language: Jupyter Notebook - Size: 66.4 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0

mlr-org/mlr3pipelines

Dataflow Programming for Machine Learning in R

Language: R - Size: 22.8 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 144 - Forks: 28

subu53/Ames-Housing-Price-Prediction-Ml-Regression

House price prediction using regression machine learning models

Language: Jupyter Notebook - Size: 1.16 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 0 - Forks: 0

dongyx/shsub

Fast Template Engine for Shell

Language: C - Size: 88.9 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 28 - Forks: 2

mhaugestad/chisel

A library to help with common NLP pre-processing tasks.

Language: Python - Size: 29 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 1 - Forks: 1

MinishLab/semhash

Fast Semantic Text Deduplication & Filtering

Language: Python - Size: 6.14 MB - Last synced at: 11 days ago - Pushed at: about 2 months ago - Stars: 748 - Forks: 43

shawntz/eyeris

eyeris: Flexible, Extensible, & Reproducible Pupillometry Preprocessing (CRAN R Package)

Language: R - Size: 100 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 5 - Forks: 3

LazarosPan/Natural-Language-Processing-and-Text-Mining

Quora Dataset - Determine if two questions ask the same thing or not

Language: Jupyter Notebook - Size: 8.79 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 0 - Forks: 0

Saeed-dev2/news-intelligence-predictor

Predict trends, events, or sentiments using machine learning and NLP on news headlines and articles. This project extracts insights from textual data to support real-time forecasting in finance, politics, and public opinion.

Language: Python - Size: 316 KB - Last synced at: 8 days ago - Pushed at: 14 days ago - Stars: 1 - Forks: 0

songyz2019/hsi-preprocessing-toolkit

A Hyperspectral Preprocessing Toolkit from HSI Camera to Machine Learning dataset

Language: Python - Size: 390 KB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 1 - Forks: 0

eriksszva/preprocessing-of-resume

Scripts and pipelines for cleaning, labeling, and preparing resume data for machine learning tasks.

Language: Jupyter Notebook - Size: 6.47 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 0 - Forks: 0

iam-salma/NLP-Bootcamp-with-python

A hands-on NLP Bootcamp using Python 🐍 covering text preprocessing, tokenization, stemming, lemmatization, POS tagging, NER, BoW, TF-IDF, Word2Vec, and sentiment analysis. Includes real-world projects, capstone notebooks, and ML-ready code for text classification and natural language tasks — ideal for data science, machine learning & AI learners

Language: Jupyter Notebook - Size: 9.79 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 1 - Forks: 0

gehad-Ahmed30/Natural-Language-Processing

Language: Jupyter Notebook - Size: 44.9 KB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 0 - Forks: 0

pytorch/torcharrow 📦

High performance model preprocessing library on PyTorch

Language: Python - Size: 11.3 MB - Last synced at: 2 days ago - Pushed at: over 1 year ago - Stars: 649 - Forks: 81

geometric-intelligence/polpo

A Geometric Intelligence Lab's collection of weakly-related tools.

Language: Python - Size: 75.6 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 0 - Forks: 1

MLD3/FIDDLE

FlexIble Data-Driven pipeLinE – a preprocessing pipeline that transforms structured EHR data into feature vectors to be used with ML algorithms. https://doi.org/10.1093/jamia/ocaa139

Language: Jupyter Notebook - Size: 6.41 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 94 - Forks: 19

kartav005/news-intelligence-predictor

Classify news genres with the News Intelligence Predictor. This FastAPI app uses NLP and ML to analyze headlines and content in real-time. 📰🚀

Language: Python - Size: 293 KB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 0 - Forks: 0

Abdelrahman-Atef-Elsayed/NLP_Preprocessing_pipeline

This repo includes a generalized preprocessing pipeline for text data in NLP tasks.

Language: Jupyter Notebook - Size: 57.6 KB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 1 - Forks: 0

madyankin/postcss-each 📦

PostCSS plugin to iterate through values

Language: JavaScript - Size: 581 KB - Last synced at: 15 days ago - Pushed at: about 4 years ago - Stars: 94 - Forks: 20

MidoHossam14/MachineLearningAlgorithms

Hands-on machine learning algorithms with scikit-learn, TensorFlow and Keras.

Language: Jupyter Notebook - Size: 1.99 MB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 0 - Forks: 0

daniellwdb/roka

🤖 Rise of Kingdoms bot to manage kingdom titles and DKP through Discord.

Language: TypeScript - Size: 35.6 MB - Last synced at: 10 days ago - Pushed at: 22 days ago - Stars: 34 - Forks: 18

chrislemke/sk-transformers

A collection of pandas & scikit-learn compatible transformers for preprocessing and feature engineering 🛠

Language: Python - Size: 2.55 MB - Last synced at: 6 days ago - Pushed at: 20 days ago - Stars: 11 - Forks: 0

Abdelrhman941/3-ml-preprocessing-guide

Language: Jupyter Notebook - Size: 1.55 MB - Last synced at: 21 days ago - Pushed at: 23 days ago - Stars: 4 - Forks: 1

Kumpatlapavankumar/Medical-Insurance-Cost-Estimation-Using-Machine-Learning

Using Python, NumPy, pandas, seaborn, matplotlib and machine learning techniques

Language: Jupyter Notebook - Size: 1.03 MB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 0 - Forks: 0

liyachittilappilly/FakeorRealPrediction

Machine Learning model that detects fake news using NLP and Linear SVM. Built with Python, Scikit-learn, and TF-IDF on a real-world news dataset. Achieves 93%+ accuracy.

Language: Jupyter Notebook - Size: 255 KB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 0 - Forks: 0

KinWaiCheuk/nnAudio

Audio processing by using pytorch 1D convolution network

Language: Python - Size: 94.7 MB - Last synced at: 19 days ago - Pushed at: about 2 months ago - Stars: 1,071 - Forks: 93

eds-book/ea34568e-d86e-4720-be2f-3f826f66a26c

Describing a pipeline to preprocess NOAA gridded rainfall reanalysis dataset

Language: Jupyter Notebook - Size: 1.39 MB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 0 - Forks: 2

Hyedryn/elikopy

ElikoPy is a Python library that aims to ease the processing of diffusion imaging for microstructural analysis.

Language: Python - Size: 4.18 MB - Last synced at: 12 days ago - Pushed at: 2 months ago - Stars: 17 - Forks: 5

sappelhoff/pyprep

PyPREP: A Python implementation of the Preprocessing Pipeline (PREP) for EEG data

Language: Python - Size: 25.9 MB - Last synced at: 6 days ago - Pushed at: 2 months ago - Stars: 154 - Forks: 35

Nidal-Shahin/Job-Market-Cheat-Codes

A machine learning pipeline for analyzing job listings — built entirely on synthetic data, fine-tuned with GPT — to predict salaries, classify job roles, and cluster careers like a data wizard

Language: Jupyter Notebook - Size: 8.49 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 0 - Forks: 0

CEA-LIST/RPCDataloader

A variant of the PyTorch Dataloader using remote workers.

Language: Python - Size: 3.26 MB - Last synced at: 23 days ago - Pushed at: over 2 years ago - Stars: 19 - Forks: 1

kdsuthar/AI-based-QA-Testing_FinBank_Customer_Realibility

This project cleans a dataset and prepares the final data for use in machine learning quality testing.

Language: Jupyter Notebook - Size: 450 KB - Last synced at: 26 days ago - Pushed at: 26 days ago - Stars: 0 - Forks: 0

SUGHA22/Data_analysis

Actively upskilling in data science with hands-on learning during a Green Internship focused on environmental sustainability. Used Pandas and NumPy for data preprocessing and cleaning, and created visual dashboards in Excel and Tableau. Gained experience in interpreting sustainability metrics and communicating insights through data storytelling and

Language: Jupyter Notebook - Size: 1.07 MB - Last synced at: 22 days ago - Pushed at: 26 days ago - Stars: 0 - Forks: 0

nipreps/nifreeze

A flexible framework for volume-wise artifact estimation and correction across multiple 4D neuroimaging modalities (diffusion MRI, functional MRI, and PET)

Language: Python - Size: 115 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 4 - Forks: 4

nuhmanpk/cv2filters

CV2Filters is a Python package designed as a wrapper around OpenCV. It simplifies image processing tasks by providing a higher-level abstraction of the underlying OpenCV functionality.

Language: Python - Size: 68.7 MB - Last synced at: 16 days ago - Pushed at: almost 2 years ago - Stars: 4 - Forks: 0

piotrlaczkowski/keras-data-processor

Data preprocessing model based on Keras preprocessing layers that can be used as a standalone model or incorporated into a Keras model as its first layers.

Language: Python - Size: 9.31 MB - Last synced at: 14 days ago - Pushed at: about 1 month ago - Stars: 7 - Forks: 5

threadexio/cbundl

webpack but for C code.

Language: Rust - Size: 195 KB - Last synced at: 11 days ago - Pushed at: 29 days ago - Stars: 2 - Forks: 1

Adity-star/Complete-DataScience-Guide

Comprehensive repository for data science projects, tools, workflows, and resources across ML, DL, and NLP. It also contains interview questions, data science books, and some of the code I have written over my journey.

Language: Jupyter Notebook - Size: 350 MB - Last synced at: 23 days ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

DataCanvasIO/HyperGBM

A full pipeline AutoML tool for tabular data

Language: Python - Size: 11 MB - Last synced at: 19 days ago - Pushed at: 3 months ago - Stars: 351 - Forks: 47

DarkStarStrix/DataVolt

Reusable data engineering toolkit. My personal data infrastructure.

Language: Jupyter Notebook - Size: 56.6 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 17 - Forks: 2

moosmann/matlab

Data reconstruction and analysis tools for tomography data acquired at the P05 Imaging Beamline (IBL) and the P07 High-Energy Material Science (HEMS) beamline at PETRA III at DESY, both operated by Helmholtz-Zentrum Hereon.

Language: MATLAB - Size: 21.6 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 9 - Forks: 7

acroucher/PyTOUGH

A Python library for automating TOUGH2 simulations of subsurface fluid and heat flow

Language: Python - Size: 41.7 MB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 101 - Forks: 38

FaNa-AI/Data-exploration-and-preprocessing

A Python script to clean and preprocess house price data from Excel, removing invalid and missing values for better analysis.

Size: 258 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

Infinitode/DupliPy

DupliPy is a quick and easy-to-use package that can handle text formatting and data augmentation tasks for NLP in Python. It now offers support for image augmentation tasks as well.

Language: Python - Size: 65.4 KB - Last synced at: 16 days ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

preprocessy/preprocessy

Python package for Customizable Data Preprocessing Pipelines

Language: Jupyter Notebook - Size: 993 KB - Last synced at: 8 days ago - Pushed at: about 1 month ago - Stars: 43 - Forks: 14

NirLab-TAU/sleepeegpy

Language: Jupyter Notebook - Size: 166 MB - Last synced at: 5 days ago - Pushed at: 2 months ago - Stars: 31 - Forks: 10

Shakiba-Alipour/Data-Mining-Project

Data mining on the University of Twente website

Language: Python - Size: 48.8 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

keurfonluu/toughio

Pre- and post-processing Python library for TOUGH

Language: Python - Size: 18.4 MB - Last synced at: 5 days ago - Pushed at: 11 days ago - Stars: 61 - Forks: 8

hms-immunology/RNA_QC_APP

RNA-seq Quality Control and Preprocessing Tool - Interactive Shiny application for comprehensive RNA sequencing data validation, quality assessment, filtering, and normalization. Developed at Harvard Medical School Department of Immunology for global research use.

Language: R - Size: 11 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

lucasrla/wsi-preprocessing-sos-workflow

A pipeline to preprocess whole-slide images (WSI) towards deep learning

Size: 14.6 KB - Last synced at: 7 days ago - Pushed at: almost 5 years ago - Stars: 5 - Forks: 2

jbusecke/xMIP

Analysis-ready CMIP6 data in Python the easy way with Pangeo tools.

Language: Jupyter Notebook - Size: 20.4 MB - Last synced at: 1 day ago - Pushed at: 6 days ago - Stars: 201 - Forks: 44

NYXMatik/ProjectAPPROF

Repository for an academic deep learning project: CNN and RNN usage for image classification and value prediction

Language: Jupyter Notebook - Size: 12.4 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 1

erenada/RNA_QC_APP

RNA-seq Quality Control and Preprocessing Tool - Interactive Shiny application for comprehensive RNA sequencing data validation, quality assessment, filtering, and normalization. Developed at Harvard Medical School Department of Immunology for global research use.

Language: R - Size: 11 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 3 - Forks: 0

DragomirBozoki/wiki-feature-selection-pyspark

NLP project for large-scale entropy and mutual information analysis on Wikipedia. Used PySpark to identify top n-gram features and train logistic classifiers.

Language: Python - Size: 316 KB - Last synced at: 25 days ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

Anu-Prabha-Joseph/Turk-NLP

TurkNLP is a comprehensive library designed for natural language processing in Turkish, offering modular and extensible features. It supports various tasks like tokenization, morphological analysis, and sentiment analysis, making it a valuable tool for both academic and industrial applications. 🐙✨

Language: Python - Size: 18.6 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

akankshaj2712/Vehicle-Insurance-Fraud-Detection

Fraud Detection using Machine Learning

Language: Jupyter Notebook - Size: 4.44 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

anishdeshmukh9/AI-model-Training-Disease-prognosis

An academic project that showcases pre- and post-modeling ML knowledge, such as data collection, data preprocessing, AI model training (ML), and fine-tuning the model

Language: Python - Size: 8.13 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

ALebrun-108/BoxSERS

Python package that provides a full range of functionality to process and analyze vibrational spectra (Raman, SERS, FTIR, etc.).

Language: Jupyter Notebook - Size: 20 MB - Last synced at: 27 days ago - Pushed at: 10 months ago - Stars: 65 - Forks: 15

hscspring/pnlp

NLP pre-/post-processing tools.

Language: Python - Size: 106 KB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 30 - Forks: 6

lucasrla/wsi-preprocessing

Simple library for preprocessing histopathological whole-slide images (WSI) into tiles (a.k.a. patches) towards deep learning

Language: Python - Size: 18.6 KB - Last synced at: 7 days ago - Pushed at: almost 2 years ago - Stars: 55 - Forks: 14

Awais-Asghar/Skin-Cancer-Binary-Classifier

A machine learning project for binary classification of skin cancer as malignant or benign, utilizing models like XGBoost, LGBM Classifier, Adaboost, SVM, and Logistic Regression. Features comprehensive data preprocessing, model training, and evaluation for accurate diagnosis.

Language: Jupyter Notebook - Size: 5.65 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

ankur150/ML-Projects

I applied K-Means clustering on Facebook Live Sellers data to group posts based on engagement metrics. Preprocessing techniques included Label Encoding (for status_type) and Standardization (using StandardScaler for numerical features). Using the Elbow Method, we determined k=3, trained the model, and visualized clusters and centroids.

Language: Jupyter Notebook - Size: 2.33 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

jknafou/TransCorpus

TransCorpus is a scalable toolkit for large-scale, parallel translation and preprocessing of text corpora, built for language model pretraining and research.

Language: Python - Size: 5.91 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

Related Keywords
preprocessing (1,451), machine-learning (432), python (401), data-science (171), nlp (157), pandas (124), deep-learning (107), classification (105), numpy (75), data (75), data-visualization (74), data-analysis (71), python3 (68), sklearn (64), eda (62), natural-language-processing (58), logistic-regression (57), tensorflow (56), dataset (56), feature-engineering (56), linear-regression (56), visualization (56), exploratory-data-analysis (54), random-forest (52), machine-learning-algorithms (51), matplotlib (51), scikit-learn (50), data-cleaning (49), clustering (46), data-mining (45), regression (44), jupyter-notebook (43), seaborn (41), sentiment-analysis (39), keras (38), image-processing (37), neural-network (36), pytorch (35), nltk (34), pipeline (34), r (32), feature-extraction (31), svm (31), analysis (30), ml (28), neural-networks (28), computer-vision (28), cnn (27), preprocessor (26), supervised-learning (25), artificial-intelligence (25), decision-trees (25), svm-classifier (24), datascience (24), xgboost (23), prediction (22), ai (22), nlp-machine-learning (22), streamlit (21), time-series (21), feature-selection (21), predictive-modeling (21), pca (21), tf-idf (20), normalization (20), kaggle (20), text-classification (19), naive-bayes-classifier (19), statistics (19), eeg (19), text-processing (18), knn-classification (18), preprocessing-data (18), knn (18), text (17), pca-analysis (17), word2vec (17), regression-models (16), datacleaning (16), confusion-matrix (16), opencv (16), lemmatization (16), tokenization (16), java (16), kmeans-clustering (16), postprocessing (15), data-preprocessing (15), tokenizer (15), css (15), text-mining (15), neuroimaging (15), outlier-detection (15), mri (15), html (14), dimensionality-reduction (14), matplotlib-pyplot (14), random-forest-classifier (14), cross-validation (13), hyperparameter-tuning (13), modeling (13)