An open API service providing repository metadata for many open source software ecosystems.

Topic: "preprocessing"

Unstructured-IO/unstructured

Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.

Language: HTML - Size: 194 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 13,204 - Forks: 1,082

dongrixinyu/JioNLP

A Chinese NLP preprocessing and parsing toolkit: accurate, efficient, and easy to use. www.jionlp.com

Language: Python - Size: 159 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 3,746 - Forks: 441

nidhaloff/igel

a delightful machine learning tool that allows you to train, test, and use models without writing code

Language: Python - Size: 18.8 MB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 3,127 - Forks: 193

OpenGene/fastp

An ultra-fast all-in-one FASTQ preprocessor (QC/adapters/trimming/filtering/splitting/merging...)

Language: C++ - Size: 691 KB - Last synced at: 24 days ago - Pushed at: about 1 month ago - Stars: 2,204 - Forks: 354

AxeldeRomblay/MLBox

MLBox is a powerful Automated Machine Learning python library.

Language: Python - Size: 50 MB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 1,518 - Forks: 273

winedarksea/AutoTS

Automated Time Series Forecasting

Language: Python - Size: 48.6 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1,331 - Forks: 117

sunlabuiuc/PyHealth

A Deep Learning Python Toolkit for Healthcare Applications.

Language: Python - Size: 122 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 1,304 - Forks: 457

NVIDIA-Merlin/NVTabular

NVTabular is a feature engineering and preprocessing library for tabular data designed to quickly and easily manipulate terabyte scale datasets used to train deep learning based recommender systems.

Language: Python - Size: 98.4 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1,100 - Forks: 144

KinWaiCheuk/nnAudio

Audio processing using PyTorch 1D convolutional networks

Language: Python - Size: 94.7 MB - Last synced at: about 2 months ago - Pushed at: 6 months ago - Stars: 1,085 - Forks: 96

TheAlgorithms/R

Collection of various algorithms implemented in R.

Language: R - Size: 1.37 MB - Last synced at: 26 days ago - Pushed at: 26 days ago - Stars: 1,036 - Forks: 342

MinishLab/semhash

Fast Semantic Text Deduplication & Filtering

Language: Python - Size: 6.18 MB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 825 - Forks: 51

pytorch/torcharrow 📦

High performance model preprocessing library on PyTorch

Language: Python - Size: 11.3 MB - Last synced at: 20 days ago - Pushed at: over 1 year ago - Stars: 644 - Forks: 81

qd-cae/awesome-CAE

A curated list of awesome CAE frameworks, libraries and software.

Size: 57.6 KB - Last synced at: 24 days ago - Pushed at: over 1 year ago - Stars: 443 - Forks: 109

R1j1t/contextualSpellCheck

✔️Contextual word checker for better suggestions (not actively maintained)

Language: Python - Size: 2.45 MB - Last synced at: 2 months ago - Pushed at: 10 months ago - Stars: 417 - Forks: 64

msamogh/nonechucks

Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!

Language: Python - Size: 25.4 KB - Last synced at: 3 months ago - Pushed at: about 3 years ago - Stars: 377 - Forks: 27

MaxHalford/xam

Personal data science and machine learning toolbox

Language: Python - Size: 1.12 MB - Last synced at: 3 months ago - Pushed at: almost 6 years ago - Stars: 365 - Forks: 75

DataCanvasIO/HyperGBM

A full pipeline AutoML tool for tabular data

Language: Python - Size: 11 MB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 355 - Forks: 47

ikegami-yukino/jaconv

Pure-Python Japanese character interconverter for Hiragana, Katakana, Hankaku, and Zenkaku

Language: Python - Size: 369 KB - Last synced at: 30 days ago - Pushed at: 4 months ago - Stars: 335 - Forks: 32

advaitsave/Introduction-to-Time-Series-forecasting-Python

Introduction to time series preprocessing and forecasting in Python using AR, MA, ARMA, ARIMA, SARIMA and Prophet model with forecast evaluation.

Language: Jupyter Notebook - Size: 2.02 MB - Last synced at: 8 months ago - Pushed at: almost 7 years ago - Stars: 323 - Forks: 138

cylondata/cylon

Cylon is a fast, scalable, distributed-memory, parallel runtime with a Pandas-like DataFrame.

Language: C++ - Size: 10.7 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 293 - Forks: 44

nlpcl-lab/ace2005-preprocessing

ACE 2005 corpus preprocessing for Event Extraction task

Language: Python - Size: 45.9 KB - Last synced at: over 1 year ago - Pushed at: almost 6 years ago - Stars: 280 - Forks: 71

ikegami-yukino/neologdn

Japanese text normalizer for mecab-neologd

Language: Cython - Size: 593 KB - Last synced at: 8 months ago - Pushed at: 10 months ago - Stars: 278 - Forks: 20

OpenTabular/DeepTab

DeepTabular is a Python package that simplifies tabular deep learning by providing a suite of models for regression, classification, and distributional regression tasks. It includes models such as Mambular, TabM, FT-Transformer, TabulaRNN, TabTransformer, and tabular ResNets.

Language: Python - Size: 8.98 MB - Last synced at: about 1 hour ago - Pushed at: about 3 hours ago - Stars: 275 - Forks: 17

dunky11/voicesmith

[WIP] VoiceSmith makes training text to speech models easy.

Language: Python - Size: 57 MB - Last synced at: 7 months ago - Pushed at: about 3 years ago - Stars: 224 - Forks: 32

Deffro/text-preprocessing-techniques

16 Text Preprocessing Techniques in Python for Twitter Sentiment Analysis.

Language: Python - Size: 2.36 MB - Last synced at: almost 2 years ago - Pushed at: over 6 years ago - Stars: 217 - Forks: 82

jbusecke/xMIP

Analysis ready CMIP6 data in python the easy way with pangeo tools.

Language: Jupyter Notebook - Size: 20.4 MB - Last synced at: 21 days ago - Pushed at: about 1 month ago - Stars: 204 - Forks: 44

free-astro/siril

The Siril image processing software for amateur astronomy

Last synced at: 9 days ago - Stars: 184 - Forks: 108

google/tensorflow-recorder 📦

TFRecorder makes it easy to create TensorFlow records (TFRecords) from Pandas DataFrames and CSV files containing images or structured data.

Language: Python - Size: 6.54 MB - Last synced at: 1 day ago - Pushed at: over 3 years ago - Stars: 180 - Forks: 32

quqixun/BrainPrep 📦

Preprocessing pipeline on Brain MR Images through FSL and ANTs, including registration, skull-stripping, bias field correction, enhancement and segmentation.

Language: Python - Size: 43.7 MB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 172 - Forks: 51

sappelhoff/pyprep

PyPREP: A Python implementation of the Preprocessing Pipeline (PREP) for EEG data

Language: Python - Size: 26 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 163 - Forks: 35

ropensci/MODIStsp

An "R" package for automatic download and preprocessing of MODIS Land Products Time Series

Language: R - Size: 180 MB - Last synced at: 25 days ago - Pushed at: 6 months ago - Stars: 159 - Forks: 53

Razor12911/xtool 📦

Just some tool repackers like to use...

Language: Pascal - Size: 22.6 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 152 - Forks: 11

githubharald/DeslantImg

The deslanting algorithm sets text upright in images. Python, C++ and OpenCL implementations provided.

Language: C++ - Size: 591 KB - Last synced at: 6 months ago - Pushed at: almost 4 years ago - Stars: 150 - Forks: 38

mlr-org/mlr3pipelines

Dataflow Programming for Machine Learning in R

Language: R - Size: 25.8 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 147 - Forks: 28

autoreject/autoreject

Automated rejection and repair of bad trials/sensors in M/EEG

Language: Python - Size: 704 KB - Last synced at: 9 days ago - Pushed at: 2 months ago - Stars: 147 - Forks: 59

jaeho3690/LIDC-IDRI-Preprocessing

This is the preprocessing step of the LIDC-IDRI dataset

Language: Jupyter Notebook - Size: 1.84 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 140 - Forks: 39

chakki-works/chariot

Deliver the ready-to-train data to your NLP model.

Language: Jupyter Notebook - Size: 5.61 MB - Last synced at: 3 months ago - Pushed at: over 3 years ago - Stars: 122 - Forks: 9

KananVyas/BoxDetection

A Box detection algorithm for any image containing boxes.

Language: Jupyter Notebook - Size: 411 KB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 118 - Forks: 53

lozuwa/impy

Impy is a Python3 library with features that help you in your computer vision tasks.

Language: Python - Size: 91.4 MB - Last synced at: 8 months ago - Pushed at: over 6 years ago - Stars: 116 - Forks: 32

chrise96/3D_Ground_Segmentation

A ground segmentation algorithm for 3D point clouds based on the work described in “Fast segmentation of 3D point clouds: a paradigm on LIDAR data for Autonomous Vehicle Applications”, D. Zermas, I. Izzat and N. Papanikolopoulos, 2017. Distinguish between road and non-road points. Road surface extraction. Plane fit ground filter

Language: C++ - Size: 2.91 MB - Last synced at: 8 months ago - Pushed at: almost 4 years ago - Stars: 108 - Forks: 14

methlabUZH/automagic

Automagic

Language: MATLAB - Size: 414 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 104 - Forks: 32

acroucher/PyTOUGH

A Python library for automating TOUGH2 simulations of subsurface fluid and heat flow

Language: Python - Size: 40.4 MB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 102 - Forks: 38

GiftMungmeeprued/document-parsers-list

A comprehensive list of document parsers, covering PDF-to-text conversion and layout extraction. Each tested for support of tables, equations, handwriting, two-column layouts, and multi-column layouts.

Size: 4.25 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 94 - Forks: 1

MLD3/FIDDLE

FlexIble Data-Driven pipeLinE – a preprocessing pipeline that transforms structured EHR data into feature vectors to be used with ML algorithms. https://doi.org/10.1093/jamia/ocaa139

Language: Jupyter Notebook - Size: 6.41 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 94 - Forks: 19

madyankin/postcss-each 📦

PostCSS plugin to iterate through values

Language: JavaScript - Size: 581 KB - Last synced at: about 2 months ago - Pushed at: over 4 years ago - Stars: 94 - Forks: 19

VisLab/EEG-Clean-Tools

Contains tools for EEG standardized preprocessing

Language: MATLAB - Size: 4.32 MB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 92 - Forks: 30

kharchenkolab/dropEst

Pipeline for initial analysis of droplet-based single-cell RNA-seq data

Language: C++ - Size: 47.1 MB - Last synced at: 6 months ago - Pushed at: over 3 years ago - Stars: 91 - Forks: 42

damianhorna/multi-imbalance

Python package for tackling multi-class imbalance problems. http://www.cs.put.poznan.pl/mlango/publications/multiimbalance/

Language: Python - Size: 66 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 78 - Forks: 12

nipreps/dmriprep

dMRIPrep is a robust and easy-to-use pipeline for preprocessing of diverse dMRI data. The transparent workflow dispenses with manual intervention, thereby ensuring the reproducibility of the results.

Language: Python - Size: 115 MB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 71 - Forks: 25

elcorto/pwtools

pwtools is a Python package for pre- and postprocessing of atomistic calculations, mostly targeted to Quantum Espresso, CPMD, CP2K and LAMMPS. It is almost, but not quite, entirely unlike ASE, with some tools extending numpy/scipy. It has a set of powerful parsers and data types for storing calculation data.

Language: Python - Size: 21.3 MB - Last synced at: 27 days ago - Pushed at: 6 months ago - Stars: 71 - Forks: 17

Yu-Group/veridical-flow

Making it easier to build stable, trustworthy data-science pipelines based on the PCS framework.

Language: Jupyter Notebook - Size: 13.4 MB - Last synced at: 23 days ago - Pushed at: almost 2 years ago - Stars: 71 - Forks: 8

keurfonluu/toughio

Pre- and post-processing Python library for TOUGH

Language: Python - Size: 18.6 MB - Last synced at: 24 days ago - Pushed at: 25 days ago - Stars: 66 - Forks: 9

ALebrun-108/BoxSERS

Python package that provides a full range of functionality to process and analyze vibrational spectra (Raman, SERS, FTIR, etc.).

Language: Jupyter Notebook - Size: 20 MB - Last synced at: 2 months ago - Pushed at: about 1 year ago - Stars: 66 - Forks: 15

ildoonet/remote-dataloader

PyTorch DataLoader that runs on multiple remote machines for heavy data processing

Language: Python - Size: 10.7 KB - Last synced at: over 1 year ago - Pushed at: about 6 years ago - Stars: 66 - Forks: 2

hirofumi0810/asr_preprocessing

Python implementation of pre-processing for End-to-End speech recognition

Language: Python - Size: 1.67 MB - Last synced at: over 2 years ago - Pushed at: over 7 years ago - Stars: 66 - Forks: 22

gregversteeg/gaussianize

Transforms univariate data into normally distributed data

Language: Python - Size: 121 KB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 63 - Forks: 24

wajuqi/Sentinel-1-preprocessing-using-Snappy

Sentinel-1 image pre-processing using snappy.

Language: Python - Size: 17.6 KB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 63 - Forks: 22

AlessioZanga/PyEEGLab 📦

Analyze and manipulate EEG data using PyEEGLab.

Language: Python - Size: 1.04 GB - Last synced at: 3 months ago - Pushed at: almost 5 years ago - Stars: 61 - Forks: 23

TakeLab/podium

Podium: a framework agnostic Python NLP library for data loading and preprocessing

Language: Python - Size: 2.19 MB - Last synced at: about 2 months ago - Pushed at: almost 3 years ago - Stars: 60 - Forks: 2

YuxinZhaozyx/pytorch-VideoDataset

Tools for loading video datasets and applying transforms to video in PyTorch. You can directly load video files without preprocessing.

Language: Python - Size: 7.81 KB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 58 - Forks: 16

lucasrla/wsi-preprocessing

Simple library for preprocessing histopathological whole-slide images (WSI) into tiles (a.k.a. patches) towards deep learning

Language: Python - Size: 18.6 KB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 55 - Forks: 14

taknev83/pywedge

Makes Interactive Chart Widget, Cleans raw data, Runs baseline models, Interactive hyperparameter tuning & tracking

Language: Jupyter Notebook - Size: 9.62 MB - Last synced at: about 2 months ago - Pushed at: almost 4 years ago - Stars: 55 - Forks: 10

MASILab/PreQual

An automated pipeline for integrated preprocessing and quality assurance of diffusion weighted MRI images

Language: Python - Size: 396 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 52 - Forks: 10

olivierhagolle/Start_maja

To process a Sentinel-2 time series with MAJA cloud detection and atmospheric correction processor

Language: Python - Size: 483 MB - Last synced at: 3 months ago - Pushed at: over 5 years ago - Stars: 50 - Forks: 15

VincentStimper/mclahe

NumPy and Tensorflow implementation of the Multidimensional Contrast Limited Adaptive Histogram Equalization (MCLAHE) procedure

Language: Python - Size: 16.8 MB - Last synced at: 25 days ago - Pushed at: over 3 years ago - Stars: 49 - Forks: 6

paulross/cpip

CPIP - a C/C++ preprocessor implemented in Python.

Language: Python - Size: 37.2 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 46 - Forks: 4

nlgranger/SeqTools

A python library to manipulate and transform indexable data (lists, arrays, ...)

Language: Python - Size: 1.56 MB - Last synced at: 25 days ago - Pushed at: over 1 year ago - Stars: 46 - Forks: 4

SilentFlame/Named-Entity-Recognition

Corpus and a baseline neural network system for Named Entity Recognition in Hindi-English Code-Mixed social media text.

Language: Python - Size: 29.2 MB - Last synced at: 7 months ago - Pushed at: about 5 years ago - Stars: 45 - Forks: 16

0xferit/ITU-Turkish-NLP-Pipeline-Caller 📦

A Python3 wrapper tool to help using ITU Turkish NLP Pipeline API -- UNMAINTAINED --

Language: Python - Size: 131 KB - Last synced at: 10 days ago - Pushed at: over 7 years ago - Stars: 45 - Forks: 9

preprocessy/preprocessy

Python package for Customizable Data Preprocessing Pipelines

Language: Jupyter Notebook - Size: 993 KB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 44 - Forks: 14

l-ramirez-lopez/prospectr

R package: Misc. Functions for Processing and Sample Selection of Spectroscopic Data

Language: R - Size: 17.4 MB - Last synced at: 28 days ago - Pushed at: about 1 month ago - Stars: 44 - Forks: 21

bids-apps/freesurfer

BIDS app wrapping recon-all from FreeSurfer

Language: Python - Size: 223 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 43 - Forks: 35

karakurai/visual_inspection

An application for visual inspection written in Python, running on Windows, Linux, and macOS. This software enables high-performance visual inspection even with an inexpensive web camera. No GPU machine required. It is possible to automate the inspection in a factory.

Language: Python - Size: 9.21 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 42 - Forks: 13

data-science-lab-amsterdam/skippa

SciKIt-learn Pipeline in PAndas

Language: Python - Size: 423 KB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 42 - Forks: 1

OanaIgnat/I3D_Keras

I3D implementation in Keras + video preprocessing + visualization of results

Language: Python - Size: 83 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 41 - Forks: 10

Aura-healthcare/ecg_qc

A library to compute ECG signal quality indicators

Language: Jupyter Notebook - Size: 50.4 MB - Last synced at: 3 months ago - Pushed at: almost 3 years ago - Stars: 41 - Forks: 10

TextDatasetCleaner/TextDatasetCleaner

🔬 Cleaning datasets of junk (normalization, preprocessing)

Language: Python - Size: 72.3 KB - Last synced at: about 2 months ago - Pushed at: over 4 years ago - Stars: 40 - Forks: 10

ag-ds-bubble/swtloc

Python package for Stroke Width Transform - Localizing the Text (Letters & Words) in a Natural Image

Language: Python - Size: 126 MB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 39 - Forks: 6

ParkerICI/premessa

R package for pre-processing of mass and flow cytometry data

Language: R - Size: 247 KB - Last synced at: almost 2 years ago - Pushed at: about 3 years ago - Stars: 39 - Forks: 23

Puneet2000/In-Depth-ML

In depth machine learning resources

Language: Jupyter Notebook - Size: 130 MB - Last synced at: about 2 years ago - Pushed at: almost 7 years ago - Stars: 38 - Forks: 16

bids-apps/HCPPipelines

A BIDS App for minimal preprocessing using the HCP Pipelines

Language: Python - Size: 152 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 37 - Forks: 31

SIMEXP/load_confounds 📦

Load fMRIprep confounds in python

Language: Python - Size: 3.15 MB - Last synced at: about 2 months ago - Pushed at: almost 4 years ago - Stars: 37 - Forks: 12

Clearailhc/ACE2005-toolkit

Focusing on ACE 2005 data preprocessing, we provide doc-level, sentence-level and BIO-style golden data preprocessing. The only thing you need is the ACE05 raw data. Hope you enjoy!😎

Language: Python - Size: 46.6 MB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 37 - Forks: 6

raj-sutariya/indic-num2words

Python library for converting numbers to words for all Indian Languages.

Language: Python - Size: 117 KB - Last synced at: about 2 months ago - Pushed at: 6 months ago - Stars: 36 - Forks: 13

rachellea/ct-volume-preprocessing

End-to-end Python CT volume preprocessing pipeline to convert raw DICOMs into clean 3D numpy arrays for ML. From paper Draelos et al. "Machine-Learning-Based Multiple Abnormality Prediction with Large-Scale Chest Computed Tomography Volumes."

Language: Python - Size: 25.4 KB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 36 - Forks: 15

FareedKhan-dev/Most-powerful-NLP-library

Gemini, as capable as GPT-4, provides a free API with limited access. I tested it with the help of prompt engineering and found that it can solve almost any NLP task you want to tackle.

Language: Jupyter Notebook - Size: 107 KB - Last synced at: 2 months ago - Pushed at: almost 2 years ago - Stars: 35 - Forks: 9

fitushar/Brain-Tissue-Segmentation-Using-Deep-Learning-Pipeline-NeuroNet

This repository is for the MISA course final project, which was brain tissue segmentation. We adopt NeuroNet, a comprehensive brain image segmentation tool based on a novel multi-output CNN architecture, which has been trained and tuned using the IBSR18 dataset.

Language: Jupyter Notebook - Size: 5.16 MB - Last synced at: 7 months ago - Pushed at: over 5 years ago - Stars: 35 - Forks: 9

huseinzol05/Machine-Learning-Data-Science-Reuse 📦

Gathers machine learning and data science techniques for problem solving.

Language: Jupyter Notebook - Size: 38.1 MB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 35 - Forks: 32

fkie-cad/Logprep

Log data preprocessing, generation, and shipping in Python

Language: Python - Size: 9.73 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 34 - Forks: 10

daniellwdb/roka

🤖 Rise of Kingdoms bot to manage kingdom titles and DKP through Discord.

Language: TypeScript - Size: 35.6 MB - Last synced at: about 2 months ago - Pushed at: 5 months ago - Stars: 34 - Forks: 16

maruedt/chemometrics

Python library for chemometric data analysis

Language: Python - Size: 37.9 MB - Last synced at: about 1 month ago - Pushed at: about 3 years ago - Stars: 34 - Forks: 6

NirLab-TAU/sleepeegpy

Language: Jupyter Notebook - Size: 166 MB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 33 - Forks: 12

allenai/smashed

SMASHED is a toolkit designed to apply transformations to samples in datasets, such as fields extraction, tokenization, prompting, batching, and more. Supports datasets from Huggingface, torchdata iterables, or simple lists of dictionaries.

Language: Python - Size: 4.56 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 33 - Forks: 5

hellosunking/Ktrim

Ktrim: an extra-fast and accurate adapter- and quality-trimmer for sequencing data

Language: C++ - Size: 336 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 32 - Forks: 7

hscspring/pnlp

NLP pre-/post-processing tools.

Language: Python - Size: 106 KB - Last synced at: about 2 months ago - Pushed at: 8 months ago - Stars: 31 - Forks: 6

JuliaML/MLLabelUtils.jl

Utility package for working with classification targets and label-encodings

Language: Julia - Size: 170 KB - Last synced at: 14 days ago - Pushed at: almost 4 years ago - Stars: 31 - Forks: 13

intuition-dev/INTUITION

Intuition v1. CLI for Pug, CRUD and docs/blogs as staticGen, and much more.

Size: 197 MB - Last synced at: 6 months ago - Pushed at: almost 3 years ago - Stars: 29 - Forks: 3

SudhakarKuma/Machine_Learning

A repository of resources for understanding the concepts of machine learning/deep learning. 

Language: Jupyter Notebook - Size: 615 MB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 29 - Forks: 26

prat96/FLIR_to_Yolo

This script converts FLIR thermal dataset annotations to YOLO format

Language: Python - Size: 16.6 KB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 29 - Forks: 4

dongyx/shsub

Fast Template Engine for Shell

Language: C - Size: 88.9 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 28 - Forks: 2

Related Topics
machine-learning (462), python (426), data-science (184), nlp (166), pandas (130), classification (113), deep-learning (113), data-visualization (82), data (81), data-analysis (80), numpy (79), python3 (72), eda (72), feature-engineering (67), logistic-regression (66), sklearn (66), visualization (63), linear-regression (62), natural-language-processing (61), dataset (60), tensorflow (60), random-forest (57), scikit-learn (56), exploratory-data-analysis (56), matplotlib (55), data-cleaning (51), machine-learning-algorithms (51), clustering (49), regression (48), jupyter-notebook (47), data-mining (46), seaborn (42), sentiment-analysis (42), keras (40), image-processing (40), pytorch (38), neural-network (37), feature-extraction (35), nltk (34), pipeline (34), r (34), computer-vision (33), neural-networks (31), analysis (31), svm (31), ai (30), ml (30), artificial-intelligence (29), supervised-learning (28), cnn (28), preprocessor (28), xgboost (26), decision-trees (26), svm-classifier (25), nlp-machine-learning (24), datascience (24), pca (23), feature-selection (23), prediction (23), normalization (22), predictive-modeling (22), time-series (22), text-processing (21), streamlit (21), kaggle (21), statistics (20), knn-classification (20), text-classification (20), naive-bayes-classifier (19), preprocessing-data (19), eeg (19), opencv (18), lemmatization (18), datacleaning (18), knn (18), pca-analysis (18), tf-idf (18), text-mining (17), css (17), tokenization (17), confusion-matrix (17), tokenizer (17), kmeans-clustering (17), java (17), text (17), regression-models (16), postprocessing (16), pandas-dataframe (16), word2vec (16), unsupervised-learning (16), random-forest-classifier (15), html (15), cross-validation (15), data-preprocessing (15), kmeans (15), outlier-detection (15), mri (15), neuroimaging (15), twitter (14), hyperparameter-tuning (14)