An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: document-clustering

FrancescoPaoloL/LearningNLP

This repository contains what I'm learning about NLP

Language: Python - Size: 12.2 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 3 - Forks: 0

Hazim-HF/Unstructured-Data-Analysis

This repository focuses on methods for compiling, summarizing, and analyzing unstructured and semi-structured data, including text, images, and audio. The course covers algorithms and techniques for mining and exploring unstructured data using suitable tools and packages. Applications such as sentiment analysis, document clustering, and information

Language: HTML - Size: 12.2 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 0 - Forks: 0

AnFreTh/STREAM

A versatile Python package engineered for seamless topic modeling, topic evaluation, and topic visualization. Ideal for text analysis, natural language processing (NLP), and research in the social sciences, STREAM simplifies the extraction, interpretation, and visualization of topics from large, complex datasets.

Language: Python - Size: 228 MB - Last synced at: 25 days ago - Pushed at: 4 months ago - Stars: 38 - Forks: 9

taki0112/Vector_Similarity

Python and Java implementations of TS-SS, from the paper "A Hybrid Geometric Approach for Measuring Similarity Level Among Documents and Document Clustering"

Language: Python - Size: 1.81 MB - Last synced at: 12 days ago - Pushed at: over 5 years ago - Stars: 298 - Forks: 44

sidmishraw/scp

A data processing pipeline for text-mining on contents extracted from PDFs using Apriori and Simplicial Complex algorithms

Language: C++ - Size: 268 MB - Last synced at: 3 months ago - Pushed at: over 7 years ago - Stars: 3 - Forks: 2

sneha-rangole/D3js-Document-Cluster-Visualizer

This frontend application is part of the Document Clustering and Visualization project, designed to provide an interactive user interface for clustering documents. It enables users to visualize document similarities and explore clustering results dynamically.

Language: JavaScript - Size: 209 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 3 - Forks: 0

surajiyer/multi-view-clustering-ensemble

Multi-view document clustering via ensemble method [https://link.springer.com/article/10.1007/s10844-014-0307-6]

Language: Python - Size: 4.88 KB - Last synced at: 2 months ago - Pushed at: almost 5 years ago - Stars: 2 - Forks: 0

simondelarue/Graph-based_Novel_Clustering

Clustering novels using the graph structure of their characters' interactions

Language: Jupyter Notebook - Size: 13.8 MB - Last synced at: 11 months ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

rohanag03/Document-Clustering-Topic-Modeling

This project applies K-means and LDA to the Twenty Newsgroups dataset to group similar documents and discover underlying topics. Explore clustering and topic modeling techniques for organizing and understanding text data.

Language: Jupyter Notebook - Size: 18.1 MB - Last synced at: 7 days ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

atlijas/citizens_document_clustering

Language: Python - Size: 9.3 MB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 2

KaribDev/Fine-grained-Clustering

Agglomerative Clustering of articles

Language: Jupyter Notebook - Size: 7.47 MB - Last synced at: about 1 year ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

CynthiaKoopman/Short-Document-Clustering-NLP

Published Article - The Effect of Preprocessing on Short Document Clustering

Language: Jupyter Notebook - Size: 670 KB - Last synced at: over 1 year ago - Pushed at: almost 5 years ago - Stars: 3 - Forks: 2

div5yesh/information-retrieval

Explores information retrieval techniques.

Language: Python - Size: 835 KB - Last synced at: about 1 year ago - Pushed at: about 6 years ago - Stars: 3 - Forks: 2

sethuiyer/Document-Clusterer

Document clustering using PCA from scratch using numpy and scipy.

Language: Python - Size: 4.7 MB - Last synced at: about 1 month ago - Pushed at: almost 9 years ago - Stars: 3 - Forks: 6

ethanhezhao/MIGA

MIGA is a short text clustering/aggregation topic model that leverages document metadata

Language: MATLAB - Size: 400 KB - Last synced at: almost 2 years ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 0

probinso/IR-cluster-rank-demo (fork of dfm/flask-d3-hello-world)

Information Retrieval - Cluster Rank Demo Harness

Language: Python - Size: 1010 KB - Last synced at: almost 2 years ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 0

Siddharth1989/DocumentClusteringForCryptocurrencyInfoDocumentSet

This project implements document clustering with the EM (Expectation-Maximization) algorithm for a Cryptocurrency Information Document Set.

Language: Jupyter Notebook - Size: 1.89 MB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

HotelTango314/cs5293sp23-project2

The 3rd of 4 NLP Projects - this project clusters a corpus of culinary recipe texts. The cuisine of each recipe is known and each cluster is labeled with the majority cuisine in that cluster. New recipes are then introduced and clustered and labeled with the cuisine of the closest cluster.

Language: Python - Size: 1.73 MB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

SyedMuhammadFaheem/InformationRetrieval

This repo contains all the assignments, projects, and tasks for the Information Retrieval course at FAST NUCES, Spring 2023.

Language: Python - Size: 587 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

nunososorio/docxmatch

DocxMatch is a Streamlit app that analyzes the similarity between Word files.

Language: Python - Size: 43.9 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 2 - Forks: 1

FranzTscharf/DBPRO-DokCluster

Development of a Document Clustering System with carrot2 and elasticsearch

Language: JavaScript - Size: 166 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 2 - Forks: 1

kaustubhn/doc_clust

Document clustering with word vectors.

Language: Jupyter Notebook - Size: 14.6 KB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 2 - Forks: 5

ttavni/2D_Text_Clustering

Using word embeddings, TFIDF and text-hashing to cluster and visualise text documents

Language: Python - Size: 8.15 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 14 - Forks: 5

maxoodf/tgnews

Telegram Data Clustering Contest (Bossy Gnu's submission)

Language: C++ - Size: 41 KB - Last synced at: 2 months ago - Pushed at: over 4 years ago - Stars: 4 - Forks: 2

romanglo/multiple-writing-style-detector

This project implements a solution for detecting multiple writing styles in a text.

Language: Python - Size: 756 KB - Last synced at: over 2 years ago - Pushed at: almost 6 years ago - Stars: 8 - Forks: 2

Wittline/document-clustering

Agglomerative Hierarchical Document Clustering

Language: Python - Size: 8.79 KB - Last synced at: 3 months ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

arashshams/Food_Recipes_Document_Clustering

This repository hosts an unsupervised model for Document Clustering of food recipes.

Language: Jupyter Notebook - Size: 1.12 MB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

sorayutmild/Unsupervised-Thai-Document-Clustering-with-Sanook-news

An unsupervised model for clustering Thai news. It uses TF-IDF and SimCSE-WangchanBERTa, weighted by the number of named entities, as the vector representation, and k-means as the clustering model.

Language: Jupyter Notebook - Size: 54.6 MB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

lukacupic/PDF-Document-Management-and-Search-System

Bachelor's Thesis at FER, University of Zagreb, 2018.

Language: Java - Size: 56 MB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 1

vincent10400094/news-classification

Final project for the course "EE4037 Introduction to Digital Speech Processing" 2020 fall.

Language: Python - Size: 8.14 MB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 4 - Forks: 0

thisishardik/forum-posts-clustering

This project incorporates Hierarchical document clustering of the Kaggle forum posts using data from Meta Kaggle. Includes fine-tuned vectors using GoogleNews embeddings.

Language: Jupyter Notebook - Size: 6.78 MB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

steven-s/minhash-document-clusters

Minhash clustering of text documents

Language: Scala - Size: 33.2 KB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 4 - Forks: 1

SpringerNLP/Chapter5

Chapter 5: Embeddings

Language: Jupyter Notebook - Size: 190 MB - Last synced at: almost 2 years ago - Pushed at: almost 6 years ago - Stars: 8 - Forks: 4

bobye/acl2017_document_clustering

code for "Determining Gains Acquired from Word Embedding Quantitatively Using Discrete Distribution Clustering" ACL 2017

Language: Python - Size: 9.77 KB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 21 - Forks: 6

KiriteeGak/document-clustering-pso

Language: Python - Size: 17.6 KB - Last synced at: over 2 years ago - Pushed at: over 7 years ago - Stars: 1 - Forks: 3

adhiiisetiawan/document-clustering

Document clustering system for thesis documents using the Self-Organizing Maps algorithm

Language: Python - Size: 3.93 MB - Last synced at: 3 months ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 0

jaygshah/CSE-573-Final-Project-Document-Clustering-and-Visualization (fork of Kunal30/Document-Clustering-and-Visualization)

GitHub repo for CSE 573 project: Document Clustering and 3D Visualization

Language: HTML - Size: 74.6 MB - Last synced at: about 2 years ago - Pushed at: about 6 years ago - Stars: 1 - Forks: 2

Shashwat4K/Clustering-Documents

Cluster documents based on various similarity measures. The project is based on the 'Bag of Words' data from the UCI Machine Learning Repository.

Language: Jupyter Notebook - Size: 3.92 MB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 1

mbilalakmal/InformationRetreivalA3

Language: Python - Size: 21.5 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

metinsay/docluster

Open Source NLP Library

Language: Python - Size: 1.45 MB - Last synced at: about 2 years ago - Pushed at: almost 8 years ago - Stars: 2 - Forks: 2

opencasestudies/ocs-twitter-vaccination-text-mining

Text data analysis: differentiating anti- and pro-vaccination tweets

Language: HTML - Size: 10.5 MB - Last synced at: about 2 months ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 1

etcart/RIVet-C

Language: C - Size: 103 MB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 0

chrisPiemonte/bachelor-thesis

Bachelor's thesis about Web Graph Clustering with Word Embeddings

Language: TeX - Size: 31.7 MB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 2

hailiang-wang/apache-mahout (fork of apache/mahout)

mvn -Dhadoop2.version=2.5.0 -Dlucene.version=xxx -DskipTests clean install

Language: Java - Size: 59.4 MB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

hailiang-wang/AffinityPropagation (fork of jincheng9/AffinityPropagation)

C++ Implementation for Affinity Propagation

Language: C++ - Size: 215 KB - Last synced at: about 1 year ago - Pushed at: about 11 years ago - Stars: 0 - Forks: 1

cxd/text_svd

An example application of SVD to text.

Language: HTML - Size: 10.7 MB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

Related Keywords
document-clustering (46), clustering (11), word2vec (8), nlp (8), text-mining (6), python (5), machine-learning (5), natural-language-processing (4), nlp-machine-learning (4), lda (4), data-science (4), tf-idf (4), clustering-algorithm (4), unsupervised-learning (3), document-similarity (3), svd (2), k-means-clustering (2), kmeans-clustering (2), bachelor-thesis (2), cosine-similarity (2), information-retrieval (2), sentiment-analysis (2), topic-modeling (2), data-visualization (2), glove (2), plagiarism-detection (2), data-mining (2), huggingface-transformers (1), name-entity-recognition (1), unsupervised-machine-learning (1), streamlit (1), sentence-embeddings (1), thai-nlp (1), latent-semantic-analysis (1), document-classification (1), embeddings-word2vec (1), fine-tuning (1), forum-posts (1), google-news (1), word-docs (1), carrot (1), carrot2 (1), carrot2-plugin (1), dbpro-dokcluster (1), elasticsearch (1), kibana (1), linux (1), multilingual (1), wordvectors (1), computational-social-science (1), d3js (1), dimensionality-reduction (1), text-clustering (1), text-features (1), text-processing (1), umap (1), cpp (1), document-embedding (1), telegram (1), document-categorization (1), writing-styles-detection (1), food-recipes (1), heirarchical-clustering (1), tsne-plot (1), similarity-measures (1), uci-machine-learning (1), knn-classification (1), mutual-information (1), classification (1), language (1), numpy (1), education (1), regex (1), regular-expressions (1), rstats (1), sentiment (1), tidytext (1), twitter-api (1), vaccines (1), crawling (1), web-graph-clustering (1), web-mining (1), webgraph (1), qna (1), question-answering (1), cosine-distance (1), kaggle-dataset (1), natural (1), spectral-clustering (1), word2vec-algorithm (1), locality-sensitive-hashing (1), lsh (1), minhash (1), minhash-lsh-algorithm (1), glove-embeddings (1), sense2vec (1), word-embeddings (1), word-sense-disambiguation (1), word-similarity (1), d2-clustering (1)