An open API service providing repository metadata for many open source software ecosystems.

GitHub / HTLinh0604 / Programming_Project_Clustering

An unsupervised clustering analysis of over 57,000 GitHub projects based on the text of their README.md files. The study compares a traditional keyword-based representation (TF-IDF) against semantic sentence embeddings (Sentence-BERT) for automatically grouping repositories. The reported results indicate that Sentence-BERT embeddings combined with HDBSCAN yield more meaningful and coherent topic clusters than the TF-IDF baseline.
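To make the keyword-based baseline concrete, here is a minimal sketch of TF-IDF weighting and cosine similarity over a few made-up README snippets. It uses only the Python standard library. The function names, the sample texts, and the smoothed IDF formula are illustrative assumptions, not code or data taken from the repository.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one L2-normalised sparse TF-IDF vector (a dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    # Smoothed inverse document frequency (an assumed, commonly used form).
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vec = {t: c * idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two L2-normalised sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


# Hypothetical README snippets, invented for illustration only.
readmes = [
    "A fast HTTP web server written in Go",
    "Lightweight web server and HTTP router for Go",
    "Deep learning image classifier trained with PyTorch",
]

vectors = tfidf_vectors(readmes)
sim_web = cosine(vectors[0], vectors[1])    # two web-server READMEs
sim_mixed = cosine(vectors[0], vectors[2])  # web server vs. ML project
print(f"web vs web: {sim_web:.3f}, web vs ML: {sim_mixed:.3f}")
```

The two web-server snippets share several terms and score above zero, while the unrelated pair shares none and scores exactly zero. This also shows the limitation that motivates semantic embeddings: TF-IDF sees similarity only through shared vocabulary, so two READMEs describing the same idea in different words would look unrelated.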

JSON API: http://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HTLinh0604%2FProgramming_Project_Clustering
PURL: pkg:github/HTLinh0604/Programming_Project_Clustering

Stars: 1
Forks: 1
Open issues: 0

License: None
Language: Jupyter Notebook
Size: 2.34 MB
Dependencies parsed at: Pending

Created at: 18 days ago
Updated at: 9 days ago
Pushed at: 9 days ago
Last synced at: 9 days ago

Topics: bert, clustering, dbscan, hdbscan, kmeans, machine-learning, python, spectral-clustering, tf-idf
