GitHub topics: hdbscan
gagolews/genieclust
Genie: Fast and Robust Hierarchical Clustering with Noise Point Detection - in Python and R
Language: C++ - Size: 79.2 MB - Last synced at: about 17 hours ago - Pushed at: 4 days ago - Stars: 62 - Forks: 11

mhahsler/dbscan
Density Based Clustering of Applications with Noise (DBSCAN) and Related Algorithms - R package
Language: C++ - Size: 9.39 MB - Last synced at: 3 days ago - Pushed at: about 2 months ago - Stars: 328 - Forks: 64

arborx/ArborX
Performance-portable geometric search library
Language: C++ - Size: 4.94 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 204 - Forks: 43

yihong1120/Construction-Hazard-Detection
Enhances construction site safety using YOLO for object detection, identifying hazards like workers without helmets or safety vests, and proximity to machinery or vehicles. HDBSCAN clusters safety cone coordinates to create monitored zones. Post-processing algorithms improve detection accuracy.
Language: Python - Size: 48.6 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 291 - Forks: 25

frasertheking/umap
Repository for nonlinear dimensionality reduction of precipitation microphysics
Language: Jupyter Notebook - Size: 28.8 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0

doxakis/HdbscanSharp
HDBSCAN in C#
Language: C# - Size: 2.52 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 36 - Forks: 5

petabi/petal-clustering
DBSCAN, HDBSCAN, and OPTICS clustering algorithms.
Language: Rust - Size: 112 KB - Last synced at: 6 days ago - Pushed at: 21 days ago - Stars: 32 - Forks: 5

nanxstats/elden-ring-boss-clustering
Cluster analysis of Elden Ring bosses (pre-DLC)
Language: R - Size: 195 KB - Last synced at: 1 day ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

Tonathiu-Pina/Classify_galaxies_with_Unsupervised_and_Supervised_learning
Classification of galaxies with unsupervised and supervised learning
Language: Jupyter Notebook - Size: 76.2 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

daniel-furman/awesome-chatgpt-prompts-clustering
Text clustering: HDBSCAN is probably all you need.
Language: Jupyter Notebook - Size: 18.8 MB - Last synced at: 20 days ago - Pushed at: over 1 year ago - Stars: 19 - Forks: 2

NeuralClassifier/ParameterFree-HDBSCAN-Outlier-Detection
This repository includes the code for the manuscript "Unsupervised Parameter-free Outlier Detection using HDBSCAN* Outlier Profiles", published in IEEE BigData 2024
Language: Python - Size: 39.1 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 2 - Forks: 1

arj1211/cluster-links
Pipeline that extracts, cleans, embeds, and clusters web links into topical groups using text extraction, semantic keyword extraction, and unsupervised clustering
Language: Python - Size: 34.2 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

hansalemaos/cyhdbscan
Very fast hdbscan for Python - written in Cython/C++
Language: C++ - Size: 19.5 KB - Last synced at: 5 days ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

drob-xx/TopicTuner
HDBSCAN Tuning for BERTopic Models
Language: Python - Size: 82.6 MB - Last synced at: 4 days ago - Pushed at: almost 2 years ago - Stars: 45 - Forks: 1

namespaiva/pi-acidentes
Data Analysis of Traffic Accidents in the Municipality of Santos: Trends and Characteristics. A study of vehicle accidents in the city of Santos.
Language: Jupyter Notebook - Size: 486 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

nabeel-oz/qlik-py-tools
Data Science algorithms for Qlik implemented as a Python Server Side Extension (SSE).
Language: Python - Size: 132 MB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 187 - Forks: 87

aidendorian/Spotify-Song-Recommendation
Recommends songs from a dataset of 232K songs from Spotify. Uses HDBSCAN and a Siamese network. An ML project.
Language: Python - Size: 0 Bytes - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

sian0x0/Roud-Song-Clusters
Lyrics clustering
Language: Jupyter Notebook - Size: 84.6 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 4 - Forks: 0

pranava007/AI_ML_HDBSCAN_Clutering
HDBSCAN
Language: Jupyter Notebook - Size: 45.9 KB - Last synced at: 3 days ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

anurima-saha/Topic_Modelling_LDA_HDBSCAN
Using unsupervised learning to group Reddit text and identify major conspiracy theories using NLP, LDA, spaCy, SVD, SBERT embeddings and HDBSCAN.
Language: HTML - Size: 4.21 MB - Last synced at: 3 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

chris-santiago/bookmarks-topics
Using unsupervised learning and language modeling to cluster and reorganize web bookmarks.
Language: Jupyter Notebook - Size: 290 KB - Last synced at: 3 days ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

domingosdeeulariadumba/OnlineRetailSalesClustering
Sales clustering and evaluation for U.K.-based online retail company.
Language: Jupyter Notebook - Size: 0 Bytes - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

guglielmosanchini/ClustViz
Visualization of many Clustering Algorithms, via Notebook or GUI
Language: Jupyter Notebook - Size: 246 MB - Last synced at: 11 days ago - Pushed at: about 4 years ago - Stars: 22 - Forks: 14

FrancoBobadilla/NeuralMap
NeuralMap is a data analysis tool based on Self-Organizing Maps
Language: Python - Size: 4.99 MB - Last synced at: 11 days ago - Pushed at: almost 4 years ago - Stars: 9 - Forks: 2

kstrassheim/active-learning-with-deep-learning-for-nlp
We present our concept of a new type of active learning for deep learning with NLP text classification and experimentally demonstrate its performance against random sampling, as well as its runtime performance, on the Security Threat dataset from CySecAlert. These new active learning algorithms are based on Sentence-BERT and BERTopic clustering algorithms, which allow us to generate fixed-length tokens for whole sentences to make them comparable to each other. The tokens are then clustered using K-Means or HDBSCAN to get diverse clusters from which to pick the samples.
Language: Jupyter Notebook - Size: 7.51 MB - Last synced at: about 2 months ago - Pushed at: almost 3 years ago - Stars: 6 - Forks: 0

dlab-berkeley/Unsupervised-Learning-in-R
Workshop (6 hours): Clustering (HDBSCAN, LCA, Hopach), dimension reduction (UMAP, GLRM), and anomaly detection (isolation forests).
Language: R - Size: 472 KB - Last synced at: 29 days ago - Pushed at: almost 5 years ago - Stars: 47 - Forks: 12

dcarpintero/taxonomy-completion
Taxonomy Completion with Embedding Quantization and an LLM-based Pipeline: A Case Study in Computational Linguistics
Language: Jupyter Notebook - Size: 10.7 MB - Last synced at: about 2 months ago - Pushed at: 10 months ago - Stars: 3 - Forks: 0

DecafSunrise/SimpleTopicModel
Easily identifying themes in text
Language: Jupyter Notebook - Size: 1.37 MB - Last synced at: 7 days ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0

NehaPant14/Density-based-clustering
Density based clustering
Language: Jupyter Notebook - Size: 459 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 2 - Forks: 0

MuzzyB/Exploring-Cybersecurity-Data-Science
Exploring Cybersecurity Data Science: Dimensionality Reduction and Cluster Analysis
Language: Jupyter Notebook - Size: 50 MB - Last synced at: about 1 month ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

Palinody/FFCL
FFCL: Flexible and (probably not the) fast(est) c++ clustering library.
Language: C++ - Size: 1.53 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 1

arubiales/dbscan
Fast explication of DBSCAN and HDBSCAN
Language: Jupyter Notebook - Size: 438 KB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 1

gulabpatel/Machine-Learning
Regression, Classification, Clustering, Dimension-reduction, Anomaly detection
Language: Jupyter Notebook - Size: 23.5 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 6 - Forks: 2

abouhadid/Information-Retrieval
Language: Jupyter Notebook - Size: 416 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

rs-anderson/Clustering-Ward-Level-Poverty-Using-Satellite-Imagery
Combining satellite imagery and machine learning methods to cluster ward-level poverty in Gauteng, South Africa.
Language: Jupyter Notebook - Size: 85.7 MB - Last synced at: 3 months ago - Pushed at: over 4 years ago - Stars: 4 - Forks: 1

dmeoli/OnlineRetail
Data Mining project 2020/2021 @ University of Pisa
Language: Jupyter Notebook - Size: 235 MB - Last synced at: about 1 year ago - Pushed at: about 4 years ago - Stars: 11 - Forks: 4

Karthick47v2/efficient-hdbscan
Fast parallel implementation of HDBSCAN
Language: C - Size: 415 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 1

colurw/wiki_abstracts_NLP
Document-level semantic clustering. Unsupervised topic modelling.
Language: Python - Size: 1.26 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

srsawant34/G26_P7-Document_Clustering_Summarization_Visualization
Document Clustering, Summarisation and Visualisation on 20NewsGroup
Language: Jupyter Notebook - Size: 165 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 1

amoustakis/Supervised-and-Unsupervised-Machine-Learning-projects
Supervised Machine Learning (GNB, KNN, LR, MLP & SVM) on the philippines dataset and Unsupervised Machine Learning (k-means, HAC, GMM, DBSCAN, HDBSCAN & SOM) on the wingnut & h2mg_128_90 datasets
Language: Jupyter Notebook - Size: 786 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

edo-pasto/Parallel-Flexible-Clustering
The thesis presents the parallelisation of a state-of-the-art clustering algorithm, FISHDBC. This objective has been achieved by improving the main data structures and components of the algorithm: HNSW, MST and HDBSCAN. My contribution is based on a lock-free strategy, written entirely in Python.
Language: Python - Size: 5.81 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

jeongwhanchoi/TrackML-Particle-Tracking
High Energy Physics Particle Tracking in CERN Detectors
Language: Jupyter Notebook - Size: 6.04 MB - Last synced at: about 1 year ago - Pushed at: almost 7 years ago - Stars: 4 - Forks: 1

annm802/tech-and-the-economic-cycle
Using BERTopic to show the path of technological advancements in the different phases of the economic cycle (January 2005 to January 2023).
Language: Jupyter Notebook - Size: 83.3 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

hansalemaos/locatecolorcluster
Lightning-fast image color clustering with C-based RGB localization/Euclidean distance calculation. Supports DBSCAN/HDBSCAN, Shapely geometry.
Language: Python - Size: 86.9 KB - Last synced at: 18 days ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

XifeiNi/TrackML
High Energy Physics particle tracking in CERN detectors
Language: Python - Size: 14.4 MB - Last synced at: 6 months ago - Pushed at: almost 7 years ago - Stars: 6 - Forks: 1

edwardrha/Korean-NLP-Project
NLP on Korean news articles. Automatic topic extraction through dynamic clustering.
Language: Python - Size: 1.16 MB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 12 - Forks: 4

juste97/topic-modeling-pipeline
Pipeline leveraging UMAP and HDBSCAN with BERTopic for large datasets.
Language: Jupyter Notebook - Size: 87.8 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

kennedyCzar/EIGEN-FREQUENCY-CLUSTERING-USING-KMEANS-DBSCAN-PCA-HDBSCAN
EIGEN FREQUENCY CLUSTERING USING [KMEANS] [KMEANS & PCA ] [DBSCAN] [HDBSCAN]
Language: Python - Size: 253 KB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 6

shayneobrien/text-cluster
Offline and online (i.e., real-time) annotated clustering methods for text data.
Language: Jupyter Notebook - Size: 15.8 MB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 9 - Forks: 5

digamjain/Clustering-Geolocation-Data-Intelligently
My learning outcomes and follow-up of a well-instructed Coursera guided project by Ari Anastassiou.
Language: Jupyter Notebook - Size: 934 KB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 3 - Forks: 0

rufinag/GeoLocation-Clustering
Language: Jupyter Notebook - Size: 1.36 MB - Last synced at: over 1 year ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

yksnilowyrahcaz/Product_Reviews_Analysis
Using TFIDF, UMAP, and HDBSCAN to analyze product reviews
Language: Python - Size: 77.5 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

huyndao/sg1-topic-modeling
A fun Topic Modeling Project of the TV show Stargate SG1
Language: Jupyter Notebook - Size: 95.8 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

EtzionR/generate-Convex-Hull-SHP-from-HDBSCAN-clustering-probabilities
Defines a boundary around cluster centers in a given point-layer shapefile.
Language: Python - Size: 7.5 MB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 4 - Forks: 3

keerti2001/Density-Based-Place-Clusterig-Using-Geo-Social-Data
Implementation of Density-based clustering algorithms for Geo-social data
Language: Jupyter Notebook - Size: 22.7 MB - Last synced at: almost 2 years ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 1

NeuralClassifier/CORE-SG
Core Spanning Graph published in ICDE 2022
Language: Python - Size: 14.5 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

NeuralClassifier/HDBSCAN-OutlierDetect
Investigating different hierarchies in HDBSCAN* for outlier detection using GLOSH
Language: Jupyter Notebook - Size: 318 MB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 1

rohanmohapatra/hdbscan-cpp
Fast and Efficient Implementation of HDBSCAN in C++ using STL
Language: C++ - Size: 7.55 MB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 35 - Forks: 8

MariuszAndziak/Personality_and_Its_Transformations
Summary and knowledge distillation of Prof. Jordan Peterson's YouTube lectures on Personality and Its Transformations using different methods of information retrieval.
Language: HTML - Size: 45.3 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

dbrookeUAB/hdbscanR
Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN)
Language: R - Size: 53.7 KB - Last synced at: 2 months ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

pajaskowiak/dbcv
Density-Based Clustering Validation
Language: MATLAB - Size: 356 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 2 - Forks: 0

connor-mccarthy/nlp-visualization-of-statistical-learning-book
📙 End-to-end NLP and data visualization pipeline of the text from a machine learning textbook.
Language: HTML - Size: 1.19 MB - Last synced at: 2 months ago - Pushed at: about 4 years ago - Stars: 2 - Forks: 0

wangyiqiu/hdbscan
A Fast Parallel Algorithm for HDBSCAN* Clustering
Language: C++ - Size: 94.7 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 26 - Forks: 6

tharunchitipolu/Clustering-geolocation-data-with-python
We have taxi rank locations, and want to define key clusters of these taxis where we can build service stations for all taxis operating in that region.
Language: Jupyter Notebook - Size: 1.34 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 6 - Forks: 0

MiguelHeCa/tfm-nlp
Repository for the Final Project of the MIRI
Language: Jupyter Notebook - Size: 111 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

kochlisGit/Tensorflow-MNIST-State-Of-The-Art
Building High Performance Convolutional Neural Networks with TensorFlow
Language: Jupyter Notebook - Size: 32.9 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 1

guglielmosanchini/ClustVizGUI
GUI version of https://github.com/guglielmosanchini/ClustViz
Language: HTML - Size: 245 MB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 2

EtzionR/Clustering-by-Silhouette
Optimize clustering labels using Silhouette Score.
Language: Python - Size: 24.4 MB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 8 - Forks: 2

madkehl/Auto2Cluster
Not HTML; graphs are rendered in HTML and are large, hence the language tag. This is a Python repo. Contains code for selecting keywords, vectorizing them and compressing using an autoencoder, then clustering this compressed space. Not all data is currently available for public use.
Language: Jupyter Notebook - Size: 71.2 MB - Last synced at: 3 months ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

fredriko/metacurate-regularly
Finding the top news stories of 2022 among 54,000+ news on AI, ML, NLP, data science and related fields.
Language: HTML - Size: 10.8 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

AVoss84/invoice_topics
Topic modelling of invoice data
Language: HTML - Size: 11.8 MB - Last synced at: 5 days ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

ttavni/SemanticWordClouds
Making word clouds more interesting
Language: Python - Size: 8.21 MB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 5 - Forks: 4

ciCciC/MasterThesisPartialRDFschemaRetrieval
Master Thesis: Partial RDF Schema Retrieval
Language: Jupyter Notebook - Size: 19.1 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

antoniocavalcante/mustache
Language: CSS - Size: 92 MB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 14 - Forks: 1

afunTW/geo-separation
Using HDBSCAN and the Voronoi algorithm to create your own spatial polygon.
Language: Jupyter Notebook - Size: 2.32 MB - Last synced at: 21 days ago - Pushed at: over 5 years ago - Stars: 5 - Forks: 2

jeandsantos/italian_olive_oil
Clustering of Italian Olive Oils with their Fatty Acid Composition
Language: HTML - Size: 1.85 MB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

JoaoCampagnolo/Behav_clustering_thesis
Repository for my master's thesis project on unsupervised behavioral classification with 3D pose data from tethered Drosophila melanogaster.
Language: Jupyter Notebook - Size: 25.7 MB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 4 - Forks: 1

Timo9Madrid7/maliciousfl
Size: 2.83 GB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

kochlisGit/Data-Science-Algorithms
Implementation of statistics algorithms for Machine Learning & Data Mining. The algorithms were implemented with the scikit-learn library.
Language: Python - Size: 877 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 3

UpwardTrajectory/meander-maker
Find dense clusters for Theme-Walks or Topic Exploration with HDBSCAN and GoogleMaps API
Language: JavaScript - Size: 5.59 MB - Last synced at: 2 months ago - Pushed at: almost 6 years ago - Stars: 6 - Forks: 4

abhinav-chakravarty/clustering-geolocation-data-intelligently
Clustering Geolocation Data Intelligently in Python
Language: Jupyter Notebook - Size: 212 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

maha-prathamesh/Clustering-Geolocation-Data
Taking taxi rank location data for Johannesburg, South Africa, and clustering it geographically so that we can build service stations for all taxi ranks in each cluster.
Language: Jupyter Notebook - Size: 1.18 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

GabrielMissael/solution
Solution to the BBVA Contigo challenge, Hack BBVA 2021
Language: Python - Size: 28.1 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 1

ahenoch/Masterthesis
Results of the thesis for the M.Sc. Bioinformatics program at the Friedrich Schiller University Jena.
Language: Jupyter Notebook - Size: 865 MB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

rajtulluri/Taxi-rank-Geoclustering
Geo clustering of taxi rank locations to find optimal locations for service centers to be set up
Language: Jupyter Notebook - Size: 3.36 MB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 1

luthra2059/Clustering-Geolocation-Data-Intelligently-in-Python
Here we use a real-life taxi rank location dataset for the city of Johannesburg, South Africa. We try to pinpoint the locations to build service centers to accommodate as many taxis as possible with the help of clustering.
Language: Jupyter Notebook - Size: 2.5 MB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0
