GitHub / michaelscutari / protclust
protclust is a Python library for protein sequence analysis that integrates MMseqs2 for fast sequence clustering and provides tools for building robust machine learning datasets. Its cluster-aware data splitting keeps each cluster of similar sequences entirely on one side of a train/test split, so that model evaluation is not inflated by test sequences that closely resemble training sequences. It also provides protein embedding tools for generating features.
JSON API: http://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/michaelscutari%2Fprotclust
Stars: 1
Forks: 0
Open issues: 0
License: MIT
Language: Python
Size: 354 KB
Dependencies parsed at: Pending
Created at: 3 months ago
Updated at: about 2 months ago
Pushed at: about 2 months ago
Last synced at: 25 days ago
Topics: bioinformatics, clustering, computational-biology, data-preprocessing, dataset-creation, machine-learning, mmseqs2, protein-analysis, protein-sequences, sequence-embeddings, train-test-split
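The cluster-aware splitting described above can be illustrated with a small stdlib-only sketch. It is not protclust's API: the functions `parse_cluster_tsv` and `cluster_aware_split` are hypothetical, and the input is assumed to be a two-column `representative<TAB>member` table such as the `_cluster.tsv` file written by `mmseqs easy-cluster`. The key idea is that whole clusters, not individual sequences, are assigned to the train or test set.

```python
import random
from collections import defaultdict


def parse_cluster_tsv(lines):
    """Parse 'representative<TAB>member' rows into {member: representative}.

    Assumes the two-column layout of an MMseqs2 cluster TSV (hypothetical helper).
    """
    cluster_of = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rep, member = line.split("\t")
        cluster_of[member] = rep
    return cluster_of


def cluster_aware_split(cluster_of, test_fraction=0.2, seed=0):
    """Split sequence IDs so that no cluster appears in both train and test.

    cluster_of: dict mapping sequence ID -> cluster ID (hypothetical helper).
    """
    # Group sequence IDs by the cluster they belong to.
    members = defaultdict(list)
    for seq_id, cluster_id in cluster_of.items():
        members[cluster_id].append(seq_id)

    # Shuffle clusters (not sequences) so each cluster moves as a unit.
    cluster_ids = sorted(members)
    random.Random(seed).shuffle(cluster_ids)

    target = test_fraction * len(cluster_of)
    train, test = [], []
    for cid in cluster_ids:
        # Fill the test set cluster by cluster until it reaches the target size.
        if len(test) < target:
            test.extend(members[cid])
        else:
            train.extend(members[cid])
    return train, test


if __name__ == "__main__":
    rows = ["A\tA", "A\tB", "C\tC", "C\tD", "E\tE"]
    cluster_of = parse_cluster_tsv(rows)
    train, test = cluster_aware_split(cluster_of, test_fraction=0.4, seed=1)
    print("train:", train)
    print("test:", test)
```

Because clusters are added whole, the test set can overshoot `test_fraction` when a large cluster is drawn last; a real implementation may balance cluster sizes more carefully.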