Topic: "text-clustering"
jbesomi/texthero
Text preprocessing, representation and visualization from zero to hero.
Language: Python - Size: 22.1 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 2,904 - Forks: 240

xlang-ai/instructor-embedding
[ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings
Language: Python - Size: 170 MB - Last synced at: 23 days ago - Pushed at: 4 months ago - Stars: 1,933 - Forks: 146

murray-z/text_analysis_tools
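Chinese text analysis toolkit (including text classification, text clustering, text similarity, keyword extraction, key phrase extraction, sentiment analysis, text error correction, text summarization, topic keywords, synonyms and near-synonyms, and event triple extraction)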
Language: Python - Size: 9.98 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 533 - Forks: 114

RandyPen/TextCluster
Short text clustering preprocessing module
Language: Python - Size: 1.25 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 238 - Forks: 60

Edward1Chou/textClustering
Language: Jupyter Notebook - Size: 3.51 MB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 129 - Forks: 52

plkmo/NLP_Toolkit
Library of state-of-the-art models (PyTorch) for NLP tasks
Language: Python - Size: 4.3 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 108 - Forks: 27

sidphbot/Auto-Research
Generate a custom, detailed survey paper with topic-clustered sections and proper citations from a single query in just under 30 minutes.
Language: Python - Size: 429 KB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 57 - Forks: 7

LMU-Seminar-LLMs/TopicGPT
TopicGPT integrates the benefits of LLMs into topic modelling
Language: Python - Size: 14 MB - Last synced at: 3 days ago - Pushed at: 11 months ago - Stars: 25 - Forks: 3

KeremZaman/semantic-sh
semantic-sh is a SimHash implementation that detects and groups similar texts using word vectors and transformer-based language models (BERT).
Language: Python - Size: 40 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 23 - Forks: 3

Tikquuss/meta_XLM
Cross-lingual Language Model (XLM) pretraining and Model-Agnostic Meta-Learning (MAML) for fast adaptation of deep networks
Language: Jupyter Notebook - Size: 32.7 MB - Last synced at: 4 days ago - Pushed at: about 4 years ago - Stars: 20 - Forks: 4

trinker/clustext
Easy, fast clustering of texts
Language: R - Size: 732 KB - Last synced at: about 1 month ago - Pushed at: about 8 years ago - Stars: 18 - Forks: 3

ArikReuter/TopicGPT
TopicGPT integrates the benefits of LLMs into topic modelling
Language: Python - Size: 15.3 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 14 - Forks: 3

JayKumarr/OSDM
Code for the ACL conference paper "An Online Semantic-enhanced Dirichlet Model for Short Text Stream Clustering"
Language: Python - Size: 831 KB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 14 - Forks: 4

ttavni/2D_Text_Clustering
Using word embeddings, TFIDF and text-hashing to cluster and visualise text documents
Language: Python - Size: 8.15 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 14 - Forks: 5

1997alireza/QA-Clustering
Implementation of some algorithms for text clustering
Language: Python - Size: 75.2 KB - Last synced at: 12 months ago - Pushed at: over 6 years ago - Stars: 14 - Forks: 1

Navy10021/SLS
SLS : Neural Information Retrieval(IR)-based Semantic Search model
Language: Jupyter Notebook - Size: 1.83 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 13 - Forks: 4

pemagrg1/sentence-clustering
Sentence Clustering and visualization. Created Date: 25 Apr 2018
Language: Python - Size: 85.9 KB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 13 - Forks: 9

chrisPiemonte/url2vec
Graph clustering and Node embeddings with word2vec
Language: Python - Size: 159 MB - Last synced at: about 2 years ago - Pushed at: about 6 years ago - Stars: 13 - Forks: 7

durgeshsamariya/awesome-clustering-resources
Clustering related books and research papers.
Size: 41 KB - Last synced at: 5 days ago - Pushed at: over 4 years ago - Stars: 12 - Forks: 1

alaradirik/TR-NLP-workshop
2020 Açık Seminer - Turkish NLP workshop
Language: Jupyter Notebook - Size: 8.43 MB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 12 - Forks: 3

ThorstenDoherr/searchengine
heuristic matching of large databases by fuzzy criteria like addresses
Language: xBase - Size: 87.9 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 11 - Forks: 1

SpringerNLP/Chapter3
Chapter 3: Text and Speech Basics
Language: Jupyter Notebook - Size: 6.88 MB - Last synced at: over 1 year ago - Pushed at: almost 6 years ago - Stars: 11 - Forks: 6

eigenfoo/reddit-clusters
Understanding hateful subreddits through text clustering
Language: Python - Size: 80.4 MB - Last synced at: 22 days ago - Pushed at: over 6 years ago - Stars: 11 - Forks: 2

VIDA-NYU/domain_discovery_API 📦
The Domain Discovery Operations API formalizes the human domain discovery process by defining a set of operations that capture the essential tasks leading to domain discovery on the Web, as identified through interaction with Subject Matter Experts (SMEs).
Language: Python - Size: 1.23 MB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 8 - Forks: 4

Navy10021/Parallel_Clustering_based_TM
Parallel clustering-based Topic Modeling
Language: Python - Size: 3.28 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 7 - Forks: 1

sowmyagowri/Text-Clustering
Python Program for Text Clustering using Bisecting k-means
Language: Jupyter Notebook - Size: 2.88 MB - Last synced at: 12 months ago - Pushed at: over 7 years ago - Stars: 7 - Forks: 8

gulabpatel/Transformers
Language: Jupyter Notebook - Size: 931 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 6 - Forks: 2

pemagrg1/Magic-Of-TFIDF
TF-IDF is one of the most basic and simple topics in NLP, yet a lot can be done with TF-IDF alone. This repo collects the blog, TF-IDF basics, and things that can be done using TF-IDF.
Language: Jupyter Notebook - Size: 465 KB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 6 - Forks: 2

MNoorFawi/text-kmeans-clustering-with-python
simple text clustering using kmeans algorithm
Language: Python - Size: 3.91 KB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 6 - Forks: 3

Narius2030/Find-Similar-Vietnamese-Texts
This project builds a classification model for news topics. The goal is to automatically assign a suitable topic (class) to an arbitrary article. Two architectures are implemented: LSTM and hybrid models.
Language: Jupyter Notebook - Size: 264 MB - Last synced at: 26 days ago - Pushed at: 11 months ago - Stars: 5 - Forks: 0

alexiszamanidis/news_articles_text_mining
News Articles Text Classification and Clustering using Machine Learning in Python. Also, KNN implementation from scratch using max heap.
Language: Jupyter Notebook - Size: 30.9 MB - Last synced at: 4 days ago - Pushed at: over 4 years ago - Stars: 5 - Forks: 4

pokarats/gsdmm
Gibbs Sampling Dirichlet Multinomial Model (GSDMM) for Short-Text Clustering
Language: Python - Size: 4.51 MB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 5 - Forks: 1

scionoftech/TopicModeling_and_Text_Clustering
Topic Modeling and Text Cluster Analysis
Language: Jupyter Notebook - Size: 192 KB - Last synced at: 30 days ago - Pushed at: over 5 years ago - Stars: 5 - Forks: 1

Dennis1989/textClustPy
This is an implementation of the TextClust algorithm in Python 3.
Language: Python - Size: 165 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 4 - Forks: 2

sharmaroshan/Text-Clustering
A text clustering task: grouping 200 different texts related to games and sports into 2 or more clusters. A Zipf plot can also be used to determine how many useful clusters can be formed.
Language: Jupyter Notebook - Size: 495 KB - Last synced at: over 1 year ago - Pushed at: almost 6 years ago - Stars: 4 - Forks: 5

goodman1204/Discovering-Topic-Representative-Terms-for-Short-Text-Clustering
Source code for "Discovering Topic Representative Terms for Short Text Clustering (IEEE Access)"
Language: Python - Size: 3.86 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 3 - Forks: 0

borisfoko/Spark-Text-Clustering
Text clustering in spark with scala using LDA Model on a TF-IDF matrix
Language: Scala - Size: 52.8 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 3 - Forks: 0

tychen5/IR_TextMining
Information Retrieval project implementation
Language: Jupyter Notebook - Size: 8.96 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 3 - Forks: 0

AFAgarap/ag-news-ae-clustering
Using an Autoencoder to encode features for k-Means Clustering on the AG News Dataset
Language: Python - Size: 27.6 MB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 2 - Forks: 0

bhattbhavesh91/texthero-demo
Tutorial to demonstrate the power of Texthero which is a library used for Text preprocessing, representation and visualization from zero to hero.
Language: Jupyter Notebook - Size: 511 KB - Last synced at: about 2 months ago - Pushed at: almost 4 years ago - Stars: 2 - Forks: 0

souravs17031999/MicrosoftBing-search-query-prediction
Analysis and Visualizations for COVID-19 Bing search engine queries + Classifier pipeline for predicting country based on search query.
Language: HTML - Size: 11.8 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 0

tiansztiansz/python-data-science
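Bilibili channel AI日日新: irregularly updated examples of machine learning, deep learning, and data science tasks completed with Python frameworks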
Language: Jupyter Notebook - Size: 3.42 MB - Last synced at: about 11 hours ago - Pushed at: about 12 hours ago - Stars: 1 - Forks: 0

till-tietz/gsdmm
GSDMM Short Text Clustering via Dirichlet Mixture Models
Language: C++ - Size: 1.28 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 1

DrKenReid/Generalized-Analysis-of-Text-Data
A comprehensive toolkit for analyzing text data using various AI and NLP techniques, including topic modeling, sentiment analysis, and text classification, demonstrated on the 20 Newsgroups dataset.
Language: Jupyter Notebook - Size: 1.45 MB - Last synced at: 3 months ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0

jwchoi95/matsciexp
Official source codes for implementing "Quantitative Topic Analysis of Materials Science Literature Using Natural Language Processing"
Language: Python - Size: 3.13 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 1

binhetech/text_clustering
Text Clustering
Language: Python - Size: 16.6 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 0

JayKumarr/OSGM
Code for the paper "An Online Semantic-enhanced Graphical Model for Evolving Short Text Stream Clustering"
Language: Python - Size: 838 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

Savaw/MIR-project
Implementation of an information retrieval system, an error correction system, a crawler, and some classification and clustering algorithms on text data
Language: Jupyter Notebook - Size: 27.9 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 1

mwritescode/text-categorization-with-WEKA
This repository contains the data used for our paper 'Text categorization with WEKA: a survey'.
Size: 216 KB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 0

kimdanny/Olfactory-NLP
Language: Jupyter Notebook - Size: 21.7 MB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 1

intellisol/amazon_nlp
Amazon Fine Foods Review
Language: Jupyter Notebook - Size: 1.18 MB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 1

aqstack/Document-Clustering
Document Clustering using bisecting K-Means algorithm.
Language: Jupyter Notebook - Size: 2.33 MB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 0

bluella/Text-clusterization-overview
This project tests different text vectorization techniques as a basis for subsequent clustering.
Language: Python - Size: 7.33 MB - Last synced at: over 1 year ago - Pushed at: about 6 years ago - Stars: 1 - Forks: 0

olha-kaminska/sentence_similarity
NLP tools for sentence similarity, text classification, text clusterization etc.
Language: Jupyter Notebook - Size: 107 KB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 0

DannyMerkx/Text_Clustering
Ant colony optimisation algorithm for text clustering
Language: Matlab - Size: 1.03 MB - Last synced at: over 1 year ago - Pushed at: almost 8 years ago - Stars: 1 - Forks: 1

SkywardAI/hackathon-leaderboard
Automated Leaderboard System for Hackathon Evaluation Using Large Language Models
Language: JavaScript - Size: 238 KB - Last synced at: 28 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

pngo1997/Document-Clustering-using-K-Means
Performs unsupervised clustering on text documents.
Language: Jupyter Notebook - Size: 1.36 MB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

michabirklbauer/hgb_dse_text_mining 📦
Contents for the practical part of the lecture Text Mining
Language: Jupyter Notebook - Size: 60.1 MB - Last synced at: 21 days ago - Pushed at: 6 months ago - Stars: 0 - Forks: 3

milican04/Data-Mining-Poetry-Dataset
Language: Jupyter Notebook - Size: 7.29 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

Mohana-Murugan/NLP
NLP
Language: Jupyter Notebook - Size: 4.44 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

vectorkoz/my-nlp
Unsupervised learning. My Natural Language Processing project on Topic Extraction and Text Clustering.
Language: Jupyter Notebook - Size: 12.2 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

saaadiqh/NLP-Learning_Analytics
Developing Natural Language Processing tools to enhance Learning Analytics. Creating an automated dashboard that diagnoses strengths and weaknesses from educational data.
Language: Python - Size: 3.07 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

Viper373/Chengdu-Emotion
Text clustering and sentiment analysis of comments on the song "Chengdu" (《成都》) on NetEase Cloud Music
Language: Jupyter Notebook - Size: 21.9 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

RimTouny/Enhancing-Gutenberg-Book-Clustering-using-Advanced-NLP-Techniques
Text clustering, an unsupervised ML technique in NLP, groups similar texts based on content. Techniques like hierarchical, k-means, or density-based clustering categorize unstructured data, unveiling insights and patterns in diverse datasets. This exploration was part of the NLP course in my University of Ottawa master's program in 2023.
Language: Jupyter Notebook - Size: 1.81 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

LinggarM/Movie-Synopsis-Text-Clustering
Movie synopsis text clustering using K-Means and a TF-IDF vectorizer, deployed with the Flask framework
Language: Jupyter Notebook - Size: 49.1 MB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 1

melliottgithub/text-analytics
Text analytics projects using NLTK, Scikit-learn and others
Language: Jupyter Notebook - Size: 1.42 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Purushothaman-natarajan/NLP-TEXT-PROCESSING
This project offers advanced techniques in text preprocessing, word embeddings, and text classification. Explore methods like Word2Vec and GloVe, and master Multinomial Naive Bayes for accurate predictions. Dive into the world of text clustering and conquer challenges like unbalanced data.
Language: Jupyter Notebook - Size: 7.38 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

michabirklbauer/hgb_dse_text_mining_solutions 📦
Solutions for the practical part of the lecture Text Mining
Language: HTML - Size: 38.1 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

juman-j/text_clustering_Word2Vec_FastText
Text clustering. KNN and hierarchical algorithms.
Language: Jupyter Notebook - Size: 2.04 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

lucoliv23/Data-Analysis-with-Human-Generated-Text
Clustering of human generated text
Language: Jupyter Notebook - Size: 940 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

imis-lab/inpoint-ai-backend
The AI backend server for the inPOINT project.
Language: Python - Size: 105 KB - Last synced at: 11 months ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 1

Kairixir/URL-domain-name-clustering
Bachelor's thesis project with potential applications in Computer security
Language: Jupyter Notebook - Size: 41.7 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

RodolfoLSS/wine_analysis
Data analysis of a wine's dataset.
Language: Jupyter Notebook - Size: 270 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

elsdes3/text-clustering
Clustering stackexchange posts using text processing and unsupervised machine learning
Language: Jupyter Notebook - Size: 132 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

angel870326/Hierarchical-Clustering
hierarchical clustering for texts, scatter plots, and word clouds
Language: Jupyter Notebook - Size: 40.3 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

SaikatPhys/latent-space-text-clustering
Clustering sentences/text snippets by employing Sentence Embeddings (e.g., BERT, Universal Sentence Encoder etc) & various clustering algorithms like T-SNE, K-Means etc.
Language: Jupyter Notebook - Size: 62.5 KB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 1

97varun/restaurant-clustering
Cluster restaurants based on their menu items
Language: Jupyter Notebook - Size: 417 KB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

emeraldrains/tech_review (fork of BillyZhaohengLi/tech_review)
Survey of Text Clustering Methods
Size: 244 KB - Last synced at: 5 months ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

agus2121/Clustering--PPKM-Tweet
Indonesian text clustering about PPKM (Pemberlakuan Pembatasan Kegiatan Masyarakat)
Language: Jupyter Notebook - Size: 1.86 MB - Last synced at: 4 months ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

CoreDotToday/CoreDotText
Python Text Mining Library
Language: Python - Size: 71.3 KB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 1

CrispenGari/RE-python
💎 Regular expression in python.
Language: Jupyter Notebook - Size: 69.3 KB - Last synced at: about 1 month ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

fatihkykc/Text_clustering-classification
Heuristic implementations of text clustering and classification with k-means and naive Bayes
Language: Jupyter Notebook - Size: 6.74 MB - Last synced at: about 1 year ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

shrutibhutaiya/text-clustering
Text Clustering by K-Means
Language: Jupyter Notebook - Size: 249 KB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 0

ocramz/rational-kernels
Extending kernel methods to variable-length sequences
Language: Haskell - Size: 10.7 KB - Last synced at: about 2 months ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 1

fukuchancat/internship-coding-tasks
https://github.com/da-recruiting/internship-coding-tasks
Size: 11.2 MB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0
