Ecosyste.ms: Repos

An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: text-mining

ishijo/Taylor-Swift-Lyrics

Database (.txt and .csv) of all Taylor Swift song lyrics up to April 2023

Language: Jupyter Notebook - Size: 15.8 MB - Last synced: about 3 hours ago - Pushed: about 4 hours ago - Stars: 7 - Forks: 3

annajiat/2022-09-24-bracu-nlp

Language: HTML - Size: 2.08 MB - Last synced: about 10 hours ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

aphp/edsnlp

Modular, fast NLP framework, compatible with PyTorch and spaCy, offering tailored support for French clinical notes.

Language: Python - Size: 87 MB - Last synced: about 9 hours ago - Pushed: about 18 hours ago - Stars: 97 - Forks: 27

adbar/trafilatura

Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments

Language: Python - Size: 23.1 MB - Last synced: about 15 hours ago - Pushed: 3 days ago - Stars: 2,965 - Forks: 228

fitria-dwi/Hoax-Detection

This project aims to build a model that classifies an article as hoax or non-hoax. It also aims to identify the percentage of hoax and non-hoax articles.

Language: Jupyter Notebook - Size: 4.22 MB - Last synced: 1 day ago - Pushed: 3 days ago - Stars: 2 - Forks: 0

contefranz/OpTop

Optimal topic identification from a pool of Latent Dirichlet Allocation models

Language: R - Size: 327 KB - Last synced: 1 day ago - Pushed: over 2 years ago - Stars: 10 - Forks: 0

raniavirdas/Text-Mining-First-Project

My group's midterm project on text classification during a student exchange at Asia University, Taiwan. It uses five types of article titles from PubMed.

Language: Jupyter Notebook - Size: 28.1 MB - Last synced: 2 days ago - Pushed: 2 days ago - Stars: 0 - Forks: 0

mesolitica/malaysian-dataset

We gather Malaysian datasets! https://malaysian-dataset.readthedocs.io/

Language: Jupyter Notebook - Size: 1.33 GB - Last synced: 2 days ago - Pushed: 3 days ago - Stars: 284 - Forks: 102

jbesomi/texthero

Text preprocessing, representation and visualization from zero to hero.

Language: Python - Size: 22.1 MB - Last synced: 2 days ago - Pushed: 9 months ago - Stars: 2,869 - Forks: 238

FatimaUriarte/Python

Python files employed in my research

Language: Jupyter Notebook - Size: 3.66 MB - Last synced: 3 days ago - Pushed: 3 days ago - Stars: 0 - Forks: 0

Aayshashukla/SentimentAnalysis

Twitter Sentiment Analysis using Natural Language Processing (NLP)

Language: Jupyter Notebook - Size: 9.39 MB - Last synced: 3 days ago - Pushed: 4 days ago - Stars: 0 - Forks: 0

currentslab/extractnet

A fork of Dragnet that also extracts author, headline, date, and keywords from context, with built-in metadata extraction, all in one package

Language: HTML - Size: 421 MB - Last synced: about 15 hours ago - Pushed: 5 months ago - Stars: 181 - Forks: 19

DmitryRyumin/EMNLP-2023-Papers

EMNLP 2023 Papers: Explore cutting-edge research from EMNLP 2023, the premier conference for advancing empirical methods in natural language processing. Stay updated on the latest in machine learning, deep learning, and natural language processing with code included. ⭐ support NLP!

Language: Python - Size: 6.43 MB - Last synced: 3 days ago - Pushed: 3 days ago - Stars: 78 - Forks: 3

kernel-loophole/KG-graph

Knowledge graph from unstructured text

Language: Python - Size: 2.37 MB - Last synced: 3 days ago - Pushed: 3 days ago - Stars: 4 - Forks: 0

chiphuyen/lazynlp

Library to scrape and clean web pages to create massive datasets.

Language: Python - Size: 37.1 KB - Last synced: 2 days ago - Pushed: over 3 years ago - Stars: 2,150 - Forks: 310

saahilk1511/Web-Analytics-and-Mining

My code for CS 688 Web Analytics and Mining

Language: Jupyter Notebook - Size: 15.3 MB - Last synced: 3 days ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0

KomeijiForce/AutoPersona

A Chinese Llama-3 model that automatically extracts character persona information from noisy web page text

Language: Python - Size: 13.7 KB - Last synced: 3 days ago - Pushed: 4 days ago - Stars: 0 - Forks: 0

nilswende/nlp-toolbox

Reimplementation of the Hagen NLPToolbox

Language: Java - Size: 27.3 MB - Last synced: 4 days ago - Pushed: about 2 years ago - Stars: 2 - Forks: 0

HanXinzi-AI/awesome-python-machine-learning-resources

A collection of popular and practical machine learning and deep learning Python libraries and tools.

Size: 10.5 MB - Last synced: 4 days ago - Pushed: 5 days ago - Stars: 104 - Forks: 22

TeaZea/Gmail-Scraper_Word-Analysis

A small script that scrapes your gmail and creates a word analysis visualization from the contents of the queried emails.

Language: Jupyter Notebook - Size: 512 KB - Last synced: 4 days ago - Pushed: 4 days ago - Stars: 1 - Forks: 0

yogeshhk/MiningResume

Text Mining certain fields from a resume

Language: Jupyter Notebook - Size: 1.5 MB - Last synced: 4 days ago - Pushed: 4 days ago - Stars: 53 - Forks: 43

adbar/German-NLP

Curated list of open-access/open-source/off-the-shelf resources and tools developed with a particular focus on German

Size: 103 KB - Last synced: about 15 hours ago - Pushed: 5 days ago - Stars: 406 - Forks: 58

gesiscss/awesome-computational-social-science

A list of awesome resources for Computational Social Science

Language: R - Size: 170 KB - Last synced: 4 days ago - Pushed: 21 days ago - Stars: 465 - Forks: 59

JesusSalinas/master_upb

Text Analysis

Language: Python - Size: 705 KB - Last synced: 4 days ago - Pushed: 5 days ago - Stars: 0 - Forks: 0

trinker/textreadr

Tools to uniformly read in text data including semi-structured transcripts

Language: R - Size: 1.78 MB - Last synced: about 14 hours ago - Pushed: about 1 year ago - Stars: 73 - Forks: 5

lasigeBioTM/BENT

Biomedical Term Annotator

Language: Python - Size: 6.06 MB - Last synced: 4 days ago - Pushed: 5 days ago - Stars: 9 - Forks: 1

erikhoward/hgwellsr

An R data package of selected H. G. Wells novels to be used for NLP research.

Language: R - Size: 602 KB - Last synced: 5 days ago - Pushed: about 6 years ago - Stars: 4 - Forks: 0

erikhoward/grimmr

An R package for Fairy Tales by The Brothers Grimm

Language: R - Size: 4.88 KB - Last synced: 5 days ago - Pushed: about 6 years ago - Stars: 1 - Forks: 0

KMiNT21/html2sent

HTML2SENT modifies HTML to improve sentences tokenizer quality

Language: Python - Size: 44.9 KB - Last synced: 6 days ago - Pushed: almost 5 years ago - Stars: 8 - Forks: 2

klaudia-dikunow/tweets-classification

Language: Jupyter Notebook - Size: 960 KB - Last synced: 6 days ago - Pushed: over 3 years ago - Stars: 0 - Forks: 0

Chaymae-ipynb/Text-Mining-Projects

Language: Jupyter Notebook - Size: 30.3 KB - Last synced: 5 days ago - Pushed: 6 days ago - Stars: 0 - Forks: 0

rvhonorato/cazy-parser

A way to extract specific information from CAZy

Language: Python - Size: 120 KB - Last synced: 7 days ago - Pushed: 7 months ago - Stars: 12 - Forks: 8

gengoai/gengoai

Mono Repository for GengoAI projects

Language: Java - Size: 14.7 MB - Last synced: 7 days ago - Pushed: 7 days ago - Stars: 2 - Forks: 0

antoniooliveira03/Projects

Projects I worked on during my Bachelor's degree

Language: Jupyter Notebook - Size: 18.1 MB - Last synced: 7 days ago - Pushed: 7 days ago - Stars: 0 - Forks: 0

lisc-tools/lisc

Literature Scanner: Automated collection & analyses of the scientific literature.

Language: Python - Size: 6.21 MB - Last synced: 6 days ago - Pushed: about 1 month ago - Stars: 88 - Forks: 11

ncbi-nlp/PubMed-Best-Match

Machine-learning based pipeline relying on LambdaMART currently used in PubMed for relevance (Best Match) searches

Language: Python - Size: 2.63 MB - Last synced: 8 days ago - Pushed: about 6 years ago - Stars: 38 - Forks: 11

garygsw/twitter-crowd-flow-prediction

Crowd flow prediction model

Language: Python - Size: 644 MB - Last synced: 8 days ago - Pushed: about 6 years ago - Stars: 9 - Forks: 1

dimeji-kazeem/text-analytics Fork of oladimeji-kazeem/text-analytics

The ultimate solution for text summarization. Whether you're a student looking to condense lengthy research papers, a professional needing to digest complex reports, or just someone who wants to get to the essence of an article quickly, SummarizeMaster has got you covered.

Language: Python - Size: 3.19 MB - Last synced: 8 days ago - Pushed: 8 days ago - Stars: 0 - Forks: 0

caufieldjh/awesome-bioie

🧫 A curated list of resources relevant to doing Biomedical Information Extraction (including BioNLP)

Size: 595 KB - Last synced: 4 days ago - Pushed: about 1 year ago - Stars: 308 - Forks: 32

ArdentEmpiricist/text_analysis

Analyzes text stored as *.txt in a chosen file or directory (files in subdirectories are not read). Counts all words, then for every unique word searches for the words in its vicinity (±5 words).

Language: Rust - Size: 173 KB - Last synced: 8 days ago - Pushed: 8 days ago - Stars: 2 - Forks: 0

laugustyniak/awesome-sentiment-analysis

Repository with everything necessary for sentiment analysis and related areas

Size: 36.1 KB - Last synced: 2 days ago - Pushed: 6 months ago - Stars: 525 - Forks: 108

ranja-sarkar/document

PDF files can be read using various Python libraries, such as tabula and pdfplumber. Here I've defined a class that parses files from a directory and stores their information in an output file using pdfplumber.

Language: Jupyter Notebook - Size: 218 KB - Last synced: 9 days ago - Pushed: 9 days ago - Stars: 0 - Forks: 0

lfoppiano/document-qa

Scientific Document Insight Q/A

Language: Python - Size: 597 KB - Last synced: 10 days ago - Pushed: 10 days ago - Stars: 16 - Forks: 3

brainhack-school2020/koudyk_bhs_project

A Python package that visualizes the use of methods in citation networks over time.

Language: Jupyter Notebook - Size: 43.6 MB - Last synced: 10 days ago - Pushed: almost 4 years ago - Stars: 2 - Forks: 0

aminkhod/Search-engine

Practice to implement a simple news search engine

Language: Jupyter Notebook - Size: 258 KB - Last synced: 10 days ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0

aminkhod/PersainTextClusteringWithHazm

I use Persian-language texts requesting psychological advice. I clean the data and prepare it with the Hazm project, then cluster it using the Genetic_Kmeans algorithm and compare the results with the standard Kmeans and Birch algorithms.

Language: Jupyter Notebook - Size: 19 MB - Last synced: 10 days ago - Pushed: about 4 years ago - Stars: 3 - Forks: 0

disi-unibo-nlp/POIROT

POIROT: Phenomena Explanation from Text. Unsupervised learning of interpretable and statistically significant knowledge.

Language: Jupyter Notebook - Size: 52.6 MB - Last synced: 10 days ago - Pushed: over 1 year ago - Stars: 1 - Forks: 1

mcs07/ChemDataExtractor

Automatically extract chemical information from scientific documents

Language: Python - Size: 542 KB - Last synced: 7 days ago - Pushed: 10 months ago - Stars: 283 - Forks: 112

raminrahimzada/az-corpus-nlp

Dataset materials and NLP for the Azerbaijani language

Size: 807 MB - Last synced: 9 days ago - Pushed: 9 months ago - Stars: 9 - Forks: 4

alisafaya/txt-from-pdf

Extracting clean text from pdfs using pdfminer.six and pypdf.

Language: Python - Size: 23.4 KB - Last synced: 10 days ago - Pushed: 11 days ago - Stars: 1 - Forks: 0

PetrKorab/Arabica

Python package for exploratory text data analysis

Language: Python - Size: 102 MB - Last synced: 10 days ago - Pushed: 10 days ago - Stars: 58 - Forks: 14

pusztaipatrik/job-postings

Results of a Data analytics project at TH Wildau. Created with Orange data analytics tool, Data source: https://www.kaggle.com/datasets/PromptCloudHQ/us-jobs-on-monstercom

Size: 11.5 MB - Last synced: 11 days ago - Pushed: 11 days ago - Stars: 0 - Forks: 0

jakelever/biotext

Get a nicely-chunked local copy of the biomedical literature (to use for other projects)!

Language: Python - Size: 210 KB - Last synced: 11 days ago - Pushed: 11 days ago - Stars: 13 - Forks: 5

agusnieto77/ACEP

Análisis Computacional de Eventos de Protesta (ACEP). Computer-Aided Protest Event Analysis (CAPEA)

Language: R - Size: 106 MB - Last synced: 9 days ago - Pushed: 2 months ago - Stars: 8 - Forks: 2

MIT-LCP/bloatectomy

A Python package for removing duplicate text in clinical notes or other documents

Language: TeX - Size: 7.48 MB - Last synced: 9 days ago - Pushed: almost 4 years ago - Stars: 32 - Forks: 9

stepthom/text_mining_resources

Resources for learning about Text Mining and Natural Language Processing

Size: 707 KB - Last synced: 3 days ago - Pushed: over 1 year ago - Stars: 553 - Forks: 200

bookieio/breadability

Reworked https://www.readability.com/ parsing library (https://mercury.postlight.com/ is now a living alternative)

Language: HTML - Size: 604 KB - Last synced: 12 days ago - Pushed: 12 days ago - Stars: 203 - Forks: 26

kavgan/nlp-in-practice

Starter code to solve real world text data problems. Includes: Gensim Word2Vec, phrase embeddings, Text Classification with Logistic Regression, word count with pyspark, simple text preprocessing, pre-trained embeddings and more.

Language: Jupyter Notebook - Size: 91.8 MB - Last synced: 7 days ago - Pushed: over 3 years ago - Stars: 1,120 - Forks: 781

oroszgy/awesome-hungarian-nlp

A curated list of NLP resources for Hungarian

Size: 110 KB - Last synced: 2 days ago - Pushed: 7 months ago - Stars: 208 - Forks: 18

Kirscher/TextMining_Parcours_de_soin

Official repo of the paper "A novel methodological framework for the analysis of health trajectories and survival outcomes in heart failure patients" (ICLR 2024)

Language: HTML - Size: 39.3 MB - Last synced: 13 days ago - Pushed: 13 days ago - Stars: 1 - Forks: 1

kmk4842/opus2021

Translating financial lexicons to other languages via WordNet

Language: Python - Size: 851 KB - Last synced: 13 days ago - Pushed: 13 days ago - Stars: 0 - Forks: 0

LMU-Seminar-LLMs/TopicGPT

TopicGPT makes it possible to integrate the benefits of LLMs into topic modelling

Language: Python - Size: 14 MB - Last synced: 11 days ago - Pushed: 8 months ago - Stars: 15 - Forks: 1

nika-akin/EC-Web-Scrapping-and-Text-Mining

Documentation for crawling and parsing web page contents and analysing opinions on AI, with an overview of methods

Language: HTML - Size: 293 MB - Last synced: 13 days ago - Pushed: 13 days ago - Stars: 0 - Forks: 0

juliasilge/tidytext

Text mining using tidy tools ✨📄✨

Language: R - Size: 129 MB - Last synced: 6 days ago - Pushed: about 1 month ago - Stars: 1,159 - Forks: 181

sp1thas/ceid-thesis 📦

Thesis report and implementation for feature extraction based on geographical origin of the author

Language: Python - Size: 1.09 MB - Last synced: 14 days ago - Pushed: about 3 years ago - Stars: 2 - Forks: 0

sharmilathirumalai/TF-IDF

IR implemented by using TF-IDF method

Language: Java - Size: 10.7 MB - Last synced: 14 days ago - Pushed: over 4 years ago - Stars: 0 - Forks: 0

moamenibrahim/nlp-teaching

Testing files for Natural language processing course projects @university_of_oulu

Language: Python - Size: 5.84 MB - Last synced: 14 days ago - Pushed: 15 days ago - Stars: 0 - Forks: 1

qminer/qminer

Analytic platform for real-time large-scale streams containing structured and unstructured data.

Language: C++ - Size: 39.6 MB - Last synced: 12 days ago - Pushed: about 1 year ago - Stars: 219 - Forks: 57

navigating-stories/orange-story-navigator

Add-on to the Orange3 data mining toolkit with text processing widgets from the project Navigating Stories

Language: Python - Size: 8.24 MB - Last synced: 13 days ago - Pushed: 14 days ago - Stars: 0 - Forks: 2

mathsyouth/awesome-text-summarization Fork of lipiji/App-DL

A curated list of resources dedicated to text summarization

Size: 243 KB - Last synced: 3 days ago - Pushed: over 1 year ago - Stars: 1,531 - Forks: 267

mkk-1817/Youtube-Comments-Sentiment-Analysis

Size: 1.95 KB - Last synced: 14 days ago - Pushed: 14 days ago - Stars: 0 - Forks: 0

vmenger/deduce

Deduce: de-identification method for Dutch medical text

Language: Python - Size: 7.21 MB - Last synced: 4 days ago - Pushed: 22 days ago - Stars: 48 - Forks: 19

pkourdis/gateplugin-SUTime

GATE plugin to annotate documents with TIMEX3 tags using the SUTime library.

Language: Java - Size: 138 KB - Last synced: 15 days ago - Pushed: over 6 years ago - Stars: 2 - Forks: 0

graphbrain/graphbrain

Language, Knowledge, Cognition

Language: Python - Size: 103 MB - Last synced: 15 days ago - Pushed: 15 days ago - Stars: 564 - Forks: 62

sergioburdisso/pyss3

A Python package implementing a new interpretable machine learning model for text classification (with visualization tools for Explainable AI)

Language: Python - Size: 102 MB - Last synced: 9 days ago - Pushed: 9 months ago - Stars: 332 - Forks: 44

Lambda-3/DiscourseSimplification

Extension of the SentenceSimplification project

Language: Java - Size: 1.47 MB - Last synced: 14 days ago - Pushed: 15 days ago - Stars: 53 - Forks: 13

bilalhassankhan007/ML-Movie-recommendation-System

Movie recommendation system deployed on Heroku

Language: Jupyter Notebook - Size: 1.8 MB - Last synced: 16 days ago - Pushed: 16 days ago - Stars: 0 - Forks: 0

atse0612/Data-Science-Capstone

This is the repository for the final course in the Data Science Specialization on Coursera.

Language: HTML - Size: 607 KB - Last synced: 16 days ago - Pushed: over 6 years ago - Stars: 0 - Forks: 1

DivyaSharma0795/AppleVisionPro_Dataset

Sentiment analysis of Apple Vision Pro tweets using multiple models

Language: Jupyter Notebook - Size: 15.2 MB - Last synced: 16 days ago - Pushed: 17 days ago - Stars: 1 - Forks: 0

abhishek-kathuria/Reddit-Graph-Network

Identify the toxic and harmful subreddit groups for US elections dataset using Graph Data Structure and Data Mining

Language: Jupyter Notebook - Size: 3.22 MB - Last synced: 18 days ago - Pushed: about 2 years ago - Stars: 1 - Forks: 0

johnmatzakos/automatic-labeling-of-text-data

Algorithms For Automatic Labelling Of Text Data. A Text Mining project that studies Supervised and Semi-Supervised Learning on Twitter data.

Language: R - Size: 703 KB - Last synced: 19 days ago - Pushed: over 3 years ago - Stars: 1 - Forks: 0

johnmatzakos/detect-fake-news-machine-learning

Detecting fake news using a range of classic machine learning algorithms

Language: Jupyter Notebook - Size: 810 KB - Last synced: 19 days ago - Pushed: about 2 years ago - Stars: 0 - Forks: 0

greenelab/snorkeling

Extracting biomedical relationships from literature with Snorkel 🏊

Language: Jupyter Notebook - Size: 326 MB - Last synced: 19 days ago - Pushed: over 3 years ago - Stars: 58 - Forks: 17

hetio/medline

Computing term cooccurrence in MEDLINE

Language: Jupyter Notebook - Size: 139 MB - Last synced: 19 days ago - Pushed: about 3 years ago - Stars: 16 - Forks: 4

danielvartan/iramuteqlike

💬⛏️ Tools to reproduce the IRaMuTeQ software analyses in R

Language: R - Size: 2.78 MB - Last synced: 20 days ago - Pushed: 8 months ago - Stars: 7 - Forks: 1

s3nh/markov_chain

Simple Markov chain tweet generator

Language: Python - Size: 63.5 KB - Last synced: 20 days ago - Pushed: over 5 years ago - Stars: 1 - Forks: 0

storopoli/topic-modeling-workshop

Slides for Topic Modeling Workshop

Language: R - Size: 3.47 MB - Last synced: 20 days ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0

bayuik/nlp_tensorflow

Sentiment analysis with NLP and TensorFlow. Multiclass text classification

Language: Jupyter Notebook - Size: 3.91 MB - Last synced: 20 days ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

mgg39/AI-that-recognizes-fake-news

LSTM network designed to distinguish fake news from "true" news

Language: Python - Size: 8.79 KB - Last synced: 20 days ago - Pushed: over 3 years ago - Stars: 1 - Forks: 0

fuchsia-programming/scrape 📦

When you need those jobs hypersonic 🚀 scrape 🔪

Language: JavaScript - Size: 2.79 MB - Last synced: 20 days ago - Pushed: over 4 years ago - Stars: 10 - Forks: 3

omarsar/text_mining_lab_2017

Requirements for Text Mining Summer Course (Lab Session)

Language: Jupyter Notebook - Size: 14.3 MB - Last synced: 20 days ago - Pushed: almost 7 years ago - Stars: 5 - Forks: 4

omarsar/omarsar.github.io

My Blog - Research and Life Experience. 💬

Language: HTML - Size: 35.1 MB - Last synced: 20 days ago - Pushed: 7 months ago - Stars: 3 - Forks: 0

omarsar/friendly_nlp

Mini blog for notes and guides on Natural Language Processing (Open Notes)

Language: HTML - Size: 2.13 MB - Last synced: 20 days ago - Pushed: over 6 years ago - Stars: 3 - Forks: 1

omarsar/friendly_data_science

Material and resources for the "Friendly Data Science" YouTube series.

Language: Jupyter Notebook - Size: 2.84 MB - Last synced: 20 days ago - Pushed: over 6 years ago - Stars: 4 - Forks: 1

omarsar/dm_2018_hw_1

Holds instructions for assignment 1 of the Data Mining course

Language: Jupyter Notebook - Size: 10.7 KB - Last synced: 20 days ago - Pushed: over 5 years ago - Stars: 2 - Forks: 41

lfoppiano/SuperMat

Superconductors material dataset

Language: Jupyter Notebook - Size: 20.2 MB - Last synced: 15 days ago - Pushed: 6 months ago - Stars: 23 - Forks: 2

apoorvalal/manual-ngrams-newspapers

Code to construct Google Ngram-style diagrams using text scraped from PDFs of a newspaper archive

Language: Jupyter Notebook - Size: 119 MB - Last synced: 20 days ago - Pushed: over 3 years ago - Stars: 1 - Forks: 3

news-r/webhoserx 📦

📝 Feature extraction extension for webhoser

Language: R - Size: 30.3 KB - Last synced: 20 days ago - Pushed: almost 5 years ago - Stars: 2 - Forks: 1

news-r/phrasenets

Create Phrase Networks

Language: R - Size: 2.97 MB - Last synced: 20 days ago - Pushed: over 4 years ago - Stars: 3 - Forks: 0

JohnCoene/sacred

📖 Sacred texts in R

Language: R - Size: 39.1 MB - Last synced: 20 days ago - Pushed: about 5 years ago - Stars: 21 - Forks: 6

Related Keywords

text-mining (1,736), nlp (397), python (340), machine-learning (297), natural-language-processing (269), text-classification (230), r (221), sentiment-analysis (198), text-analysis (168), data-science (150), topic-modeling (133), data-mining (104), text-processing (102), nlp-machine-learning (83), deep-learning (79), python3 (67), nltk (64), classification (62), information-retrieval (55), text (49), clustering (49), data-visualization (48), tf-idf (47), twitter (47), rstats (42), data-analysis (40), wordcloud (37), word2vec (37), visualization (36), webscraping (35), named-entity-recognition (33), java (33), spacy (33), web-scraping (33), dataset (31), lda (31), keyword-extraction (30), jupyter-notebook (28), information-extraction (27), twitter-api (26), sentiment-classification (25), artificial-intelligence (24), pandas (24), naive-bayes-classifier (23), logistic-regression (22), latent-dirichlet-allocation (22), word-embeddings (21), bag-of-words (21), r-package (20), ai (20), scikit-learn (20), neural-network (20), gensim (19), text-analytics (19), unsupervised-learning (19), bioinformatics (18), pubmed (18), network-analysis (18), analysis (18), news (18), tidytext (18), random-forest (17), tweets (17), javascript (17), regex (17), data (17), tensorflow (17), sklearn (17), summarization (16), covid-19 (16), digital-humanities (16), search-engine (16), tokenization (16), crawler (16), machine-learning-algorithms (16), scraping (16), twitter-sentiment-analysis (15), social-media (15), numpy (15), corpus-linguistics (15), flask (15), corpus (15), ner (14), tokenizer (14), neural-networks (14), social-network-analysis (14), image-processing (14), sentiment (14), text-clustering (14), shiny (14), cosine-similarity (14), feature-extraction (14), pytorch (14), api (14), bert (14), natural-language-understanding (14), preprocessing (14), tidyverse (13), word-cloud (13), opinion-mining (13)