GitHub topics: text-mining

Repositories

deanmalmgren/textract

extract text from any document. no muss. no fuss.

Language: HTML - Size: 4.31 MB - Last synced at: about 13 hours ago - Pushed at: 5 months ago - Stars: 4,120 - Forks: 626

degenNovice/corpus-tfidf-analyzer

A Python tool for text analysis using TF-IDF, lemmatization, stopword filtering, and frequency visualization.

Language: Python - Size: 16.6 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

Ne0bliviscaris/Job-Search-Tool

Organizer for job searching across multiple sites. Fetches offers, tracks recruitment progress, and collects information about potential employers.

Language: Python - Size: 4.69 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 3 - Forks: 0

Paulanerus/TextExplorer

A tool designed for the exploration, analysis, and comparison of textual data variants.

Language: Kotlin - Size: 492 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 2 - Forks: 0

palladian/palladian

Palladian is a Java-based toolkit with functionality for text processing, classification, information extraction, and data retrieval from the Web.

Language: Java - Size: 274 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 38 - Forks: 10

gesiscss/awesome-computational-social-science

A list of awesome resources for Computational Social Science

Language: R - Size: 209 KB - Last synced at: about 10 hours ago - Pushed at: about 1 month ago - Stars: 678 - Forks: 84

LatiefDataVisionary/text-mining-and-natural-language-processing-college-task

Language: Jupyter Notebook - Size: 12.6 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

notready155/whatsapp-chat-analysis

This project involves analyzing WhatsApp chat data to extract valuable insights. Using Python and various libraries like Pandas and Matplotlib, the project processes and visualizes chat statistics such as message frequency, most active participants, and sentiment analysis.

Size: 1.95 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 2 - Forks: 0

Neplex/ArchiTXT

ArchiTXT is an open source Python library that transforms unstructured text into structured, searchable, and AI-ready data. It enables automated database generation and seamless data integration.

Language: Python - Size: 5.21 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 3 - Forks: 0

keon/awesome-nlp

📖 A curated list of resources dedicated to Natural Language Processing (NLP)

Size: 541 KB - Last synced at: 4 days ago - Pushed at: over 1 year ago - Stars: 17,145 - Forks: 2,607

jinhangjiang/textregress

TextRegress is a Python package designed to help researchers perform advanced regression analysis on long-form text data.

Language: Python - Size: 82 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 6 - Forks: 1

vmenger/deduce

Deduce: de-identification method for Dutch medical text

Language: Python - Size: 7.2 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 56 - Forks: 23

kevv1m/tikara

The metadata and text content extractor for almost every file type.

Size: 1000 Bytes - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

caiohutis/Steam-Game-Review-Analysis-Using-NLP-and-Clustering

This project uses Natural Language Processing (NLP) and Machine Learning techniques to analyze user reviews of top-selling games on the Steam platform. The goal is to detect bug-related reviews using keyword filtering, assess user sentiment (positive, neutral, negative), and group similar games using clustering methods.

Language: Jupyter Notebook - Size: 1.98 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

ujjwalkarn/DataScienceR

a curated list of R tutorials for Data Science, NLP and Machine Learning

Language: R - Size: 15.7 MB - Last synced at: about 9 hours ago - Pushed at: about 2 years ago - Stars: 2,042 - Forks: 891

biolab/orange3-text

🍊 📄 Text Mining add-on for Orange3

Language: Python - Size: 46.5 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 132 - Forks: 86

george-gca/ai_papers_cleaner

Extract text from paper PDFs and abstracts, and remove uninformative words.

Language: Python - Size: 397 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 5 - Forks: 0

deweylab/MetaSRA-pipeline

MetaSRA: normalized sample-specific metadata for the Sequence Read Archive

Language: Python - Size: 27.5 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 43 - Forks: 14

blueprints-for-text-analytics-python/blueprints-text

Jupyter notebooks for our O'Reilly book "Blueprints for Text Analysis Using Python"

Language: Jupyter Notebook - Size: 164 MB - Last synced at: 6 days ago - Pushed at: about 1 year ago - Stars: 256 - Forks: 147

mesejo/trrex

Efficient string matching with regular expressions

Language: Python - Size: 440 KB - Last synced at: about 1 hour ago - Pushed at: 6 days ago - Stars: 143 - Forks: 6

Lilykos/pyphonetics

A Python 3 phonetics library.

Language: Python - Size: 21.5 KB - Last synced at: about 2 hours ago - Pushed at: about 5 years ago - Stars: 132 - Forks: 20

SoaresAlisson/sto

Operations on strings and other utilities

Language: R - Size: 645 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

Saeidhoseinipour/ELBMcoclust

We unified several latent block models by proposing a flexible ELBM, which is extended to SELBM to address sparsity by revealing a diagonal structure in sparse datasets. This yields more homogeneous co-clusters and therefore useful, ready-to-use, and easy-to-interpret results.

Language: Python - Size: 19.4 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

PetrKorab/Arabica

Python package for text mining of time-series data

Language: Python - Size: 102 MB - Last synced at: 9 days ago - Pushed at: 10 days ago - Stars: 73 - Forks: 14

hiDaDeng/cntext

Text analysis package supporting multiple methods, including word count, readability, document similarity, sentiment analysis, Word2Vec/GloVe, and Large Language Models (LLMs).

Language: Python - Size: 64 MB - Last synced at: 9 days ago - Pushed at: 10 days ago - Stars: 338 - Forks: 30

SoaresAlisson/txtnet

{txtnet} a package to build graphs from text

Language: HTML - Size: 2.89 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 0 - Forks: 0

mesolitica/malaysian-dataset

We gather Malaysian datasets! https://malaysian-dataset.readthedocs.io/

Language: Jupyter Notebook - Size: 1.36 GB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 315 - Forks: 111

juliasilge/tidytext

Text mining using tidy tools ✨📄✨

Language: R - Size: 129 MB - Last synced at: 8 days ago - Pushed at: about 1 year ago - Stars: 1,185 - Forks: 182

docwire/docwire

DocWire SDK: Award-winning modern data processing in C++20. SourceForge Community Choice & Microsoft support. AI-driven processing. Supports nearly 100 data formats, including email boxes and OCR. Boost efficiency in text extraction, web data extraction, data mining, document analysis. Offline processing is possible for security and confidentiality

Language: C++ - Size: 35.8 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 83 - Forks: 18

JasonKessler/scattertext

Beautiful visualizations of how language differs among document types.

Language: Python - Size: 39.4 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 2,294 - Forks: 291

stepthom/text_mining_resources

Resources for learning about Text Mining and Natural Language Processing

Size: 707 KB - Last synced at: 4 days ago - Pushed at: about 2 years ago - Stars: 577 - Forks: 199

Hords01/Data_Mining

TF-IDF Calculation

Language: Python - Size: 37 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0

ropensci/jstor

Import journal data from DfR (JSTOR)

Language: R - Size: 6.14 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 47 - Forks: 10

ArdentEmpiricist/text_analysis

Analyzes text stored as *.txt in a chosen file or directory; files in subdirectories are not read. Counts all words, then searches the vicinity (±5 words) of every unique word.

Language: Rust - Size: 213 KB - Last synced at: 7 days ago - Pushed at: 12 days ago - Stars: 5 - Forks: 1

inaridiy/webforai

The best HTML-to-Markdown library: ESM-native, with useful utilities, simple, lightweight, and of epic quality.

Language: TypeScript - Size: 3.5 MB - Last synced at: 4 days ago - Pushed at: about 1 month ago - Stars: 65 - Forks: 5

openaire/iis

Information Inference Service of the OpenAIRE system

Language: Java - Size: 71.8 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 20 - Forks: 11

adbar/trafilatura

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

Language: Python - Size: 33.8 MB - Last synced at: 13 days ago - Pushed at: about 2 months ago - Stars: 4,170 - Forks: 290

cindyoff/AI-detection-system

Supervised learning model built to detect AI-generated text

Language: Python - Size: 445 KB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 0 - Forks: 0

ssrishtix/IMDB-Sentiment

A comparative case study on stemming vs lemmatization using IMDb movie reviews, focusing on NLP preprocessing and vocabulary analysis.

Language: Jupyter Notebook - Size: 0 Bytes - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 0 - Forks: 0

MatoYing/TextMining

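A fairly comprehensive text mining workflow. It covers building a sentiment analysis model with machine learning and text mining techniques; determining sentiment orientation through polarity judgment and intensity calculation; using word frequency and TF-IDF to surface the key points in positive and negative texts; and using text mining algorithms to find the topics on which user discussion on the platform concentrates.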

Language: Jupyter Notebook - Size: 26.8 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 31 - Forks: 2

stdlib-js/nlp-lda

Latent Dirichlet Allocation via collapsed Gibbs sampling.

Language: JavaScript - Size: 2.58 MB - Last synced at: 11 days ago - Pushed at: 15 days ago - Stars: 9 - Forks: 0

mcs07/ChemDataExtractor

Automatically extract chemical information from scientific documents

Language: Python - Size: 542 KB - Last synced at: 1 day ago - Pushed at: almost 2 years ago - Stars: 325 - Forks: 119

aphp/edsnlp

Modular, fast NLP framework, compatible with Pytorch and spaCy, offering tailored support for French clinical notes.

Language: Python - Size: 121 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 123 - Forks: 31

urtx13/Four-Phase-seed

This repository contains the seed-frozen version (seed=1405) of the original statistical pipeline described in Cho 2025a. All scripts, data and results have been made reproducible for verification and independent replication.

Language: Python - Size: 190 KB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 0 - Forks: 0

mtworth/sectext

Interface for text analytics of SEC 10-K filings

Language: Python - Size: 57.6 KB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 3 - Forks: 0

rmillikin/fast_km (fork of stewart-lab/fast_km)

A Containerized KinderMiner / Serial KinderMiner Server

Language: Python - Size: 16.3 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 0 - Forks: 0

stephenhky/PyShortTextCategorization

Various Algorithms for Short Text Mining

Language: Python - Size: 111 MB - Last synced at: 7 days ago - Pushed at: 17 days ago - Stars: 470 - Forks: 72

juliasilge/janeaustenr

An R Package for Jane Austen's Complete Novels 📙

Language: R - Size: 4.78 MB - Last synced at: 3 days ago - Pushed at: over 2 years ago - Stars: 96 - Forks: 22

hpham1295/Impact-Data-Mining

An approach to extracting and summarizing key infrastructure and community impact information from wind disaster reconnaissance reports using Zero-shot text classification with BART-large models, highlighted by keywords.

Language: Jupyter Notebook - Size: 85.5 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 0 - Forks: 0

jonaschn/awesome-topic-models

✨ Awesome - A curated list of amazing Topic Models (implementations, libraries, and resources)

Size: 53.7 KB - Last synced at: 4 days ago - Pushed at: almost 3 years ago - Stars: 94 - Forks: 8

oroszgy/awesome-hungarian-nlp

A curated list of NLP resources for Hungarian

Size: 125 KB - Last synced at: 16 days ago - Pushed at: 28 days ago - Stars: 245 - Forks: 18

notesjor/CorpusExplorer.Terminal.Console

Allows other programs/programming languages to access analyses/data of CorpusExplorer v2.0

Language: C# - Size: 668 KB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 7 - Forks: 0

alirezatheh/perke

A keyphrase extractor for Persian

Language: Python - Size: 143 KB - Last synced at: 18 days ago - Pushed at: about 1 month ago - Stars: 69 - Forks: 8

ko-ichi-h/khcoder

KH Coder: for Quantitative Content Analysis or Text Mining

Language: Perl - Size: 30.5 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 316 - Forks: 98

vinit714/Steam-Game-Review-Analysis-Using-NLP-and-Clustering

Language: Jupyter Notebook - Size: 0 Bytes - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 0 - Forks: 0

AbrSantiago/corpus-tfidf-analyzer

A Python tool for text analysis using TF-IDF, lemmatization, stopword filtering, and frequency visualization.

Language: Python - Size: 14.6 KB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 0 - Forks: 0

cpsievert/LDAvis

R package for web-based interactive topic model visualization.

Language: JavaScript - Size: 24 MB - Last synced at: 2 days ago - Pushed at: over 1 year ago - Stars: 559 - Forks: 132

kottoization/SentimentAnalysisOnConsumentOpinions

NLP, text mining sentiment analysis on consumer opinions, using BERT and 2 ML models

Language: Jupyter Notebook - Size: 1.02 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 1 - Forks: 0

opensemanticsearch/open-semantic-search

Open Source research tool to search, browse, analyze and explore large document collections by Semantic Search Engine and Open Source Text Mining & Text Analytics platform (Integrates ETL for document processing, OCR for images & PDF, named entity recognition for persons, organizations & locations, metadata management by thesaurus & ontologies, search user interface & search apps for fulltext search, faceted search & knowledge graph)

Language: Shell - Size: 8.91 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 1,019 - Forks: 180

NicholasMamo/multiplex-plot

Multiplex: visualizations that tell stories—A Python library to create and annotate beautiful network graph visualizations, text visualizations and more.

Language: Python - Size: 94.2 MB - Last synced at: 14 days ago - Pushed at: over 2 years ago - Stars: 111 - Forks: 15

chiphuyen/lazynlp

Library to scrape and clean web pages to create massive datasets.

Language: Python - Size: 37.1 KB - Last synced at: 4 days ago - Pushed at: over 4 years ago - Stars: 2,184 - Forks: 311

shangjingbo1226/AutoPhrase

AutoPhrase: Automated Phrase Mining from Massive Text Corpora

Language: C++ - Size: 195 MB - Last synced at: 2 days ago - Pushed at: over 3 years ago - Stars: 1,184 - Forks: 278

Inkdecker/Inktyping

Free tool for text exploration: analyze your favorite books and practice writing.

Language: Python - Size: 68.8 MB - Last synced at: 22 days ago - Pushed at: 23 days ago - Stars: 0 - Forks: 0

trinker/qdap

Quantitative Discourse Analysis Package: Bridging the gap between qualitative data and quantitative analysis

Language: R - Size: 36.9 MB - Last synced at: 6 days ago - Pushed at: over 4 years ago - Stars: 177 - Forks: 44

giocomai/castarter

Content Analysis Starter Toolkit for the R programming language

Language: R - Size: 1.25 MB - Last synced at: 4 days ago - Pushed at: 24 days ago - Stars: 3 - Forks: 0

seinecle/nocodefunctions-web-app

The code base of the front-end of nocodefunctions.com

Language: CSS - Size: 37.7 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 39 - Forks: 7

TiesdeKok/Python_NLP_Tutorial

This repository provides everything to get started with Python for Text Mining / Natural Language Processing (NLP)

Language: Jupyter Notebook - Size: 443 KB - Last synced at: 22 days ago - Pushed at: almost 5 years ago - Stars: 125 - Forks: 66

SentometricsResearch/sentometrics

An integrated framework in R for textual sentiment time series aggregation and prediction

Language: R - Size: 438 MB - Last synced at: 15 days ago - Pushed at: about 1 month ago - Stars: 84 - Forks: 22

HanXinzi-AI/awesome-python-machine-learning-resources

A collection of awesome machine learning and deep learning Python libraries and tools.

Size: 11 MB - Last synced at: 5 days ago - Pushed at: 11 months ago - Stars: 166 - Forks: 25

dselivanov/text2vec

Fast vectorization, topic modeling, distances and GloVe word embeddings in R.

Language: R - Size: 46.2 MB - Last synced at: 6 days ago - Pushed at: 9 months ago - Stars: 862 - Forks: 133

danielvartan/iramuteqlike

💬⛏️ IRaMuTeQ Software Analyses in R

Language: R - Size: 3.37 MB - Last synced at: 12 days ago - Pushed at: 10 months ago - Stars: 7 - Forks: 2

sergioburdisso/pyss3

A Python package implementing a new interpretable machine learning model for text classification (with visualization tools for Explainable AI :octocat:)

Language: Python - Size: 102 MB - Last synced at: about 20 hours ago - Pushed at: 4 months ago - Stars: 341 - Forks: 44

rosette-api/java

Babel Street Analytics Client Library for Java

Language: Java - Size: 64.8 MB - Last synced at: 26 days ago - Pushed at: 26 days ago - Stars: 11 - Forks: 35

bnosac/udpipe

R package for Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing Based on the UDPipe Natural Language Processing Toolkit

Language: C++ - Size: 5.74 MB - Last synced at: 4 days ago - Pushed at: about 2 years ago - Stars: 214 - Forks: 33

massimoaria/tall

Text Analysis for aLL

Language: R - Size: 63.6 MB - Last synced at: 26 days ago - Pushed at: 26 days ago - Stars: 16 - Forks: 5

Lips7/Matcher

A high-performance matcher designed to solve LOGICAL and TEXT VARIATIONS problems in word matching, implemented in Rust.

Language: Rust - Size: 36.9 MB - Last synced at: 1 day ago - Pushed at: 27 days ago - Stars: 17 - Forks: 1

M-Serajian/MTB-Pipeline

MTB++ is software developed to predict antimicrobial resistance to 13 antibiotics and 3 families of antimicrobials.

Language: Python - Size: 16.3 MB - Last synced at: 28 days ago - Pushed at: 28 days ago - Stars: 2 - Forks: 1

sbkellogg/eci-588

ECI 588: Text Mining in Education is a graduate-level course for preparing education researchers and practitioners to use text as data for understanding and improving teaching and learning contexts.

Language: HTML - Size: 93.2 MB - Last synced at: 28 days ago - Pushed at: 29 days ago - Stars: 0 - Forks: 0

cosmoduende/r-holy-books-sentiment-data-analysis

What's the most positive or negative religion? Sentiment and Data Analysis of Holy Books with R. Analysis of religious dogmas by exploring their Holy Books (The Bible, The Quran, The Dhammapada, and The Book of Mormon) with R

Language: R - Size: 1.42 MB - Last synced at: 22 days ago - Pushed at: about 4 years ago - Stars: 3 - Forks: 4

ingmarboeschen/JATSdecoder

A text extraction and manipulation toolset for NISO-JATS coded XML files

Language: R - Size: 2.94 MB - Last synced at: 28 days ago - Pushed at: 30 days ago - Stars: 19 - Forks: 1

PearlLeeCode/2024-us-election-analysis

[Fundamentals of Artificial Intelligence] Analysis of the 2024 US presidential election through text mining 🗽

Language: Jupyter Notebook - Size: 67.3 MB - Last synced at: 30 days ago - Pushed at: 30 days ago - Stars: 1 - Forks: 0

Yingjie4Science/SDGdetector

A novel R package that can identify and visualize 17 Sustainable Development Goals and associated 169 Targets in text

Language: R - Size: 8.16 MB - Last synced at: 4 days ago - Pushed at: 8 months ago - Stars: 16 - Forks: 1

jbesomi/texthero

Text preprocessing, representation and visualization from zero to hero.

Language: Python - Size: 22.1 MB - Last synced at: 30 days ago - Pushed at: over 1 year ago - Stars: 2,904 - Forks: 240

nlppln/nlppln

NLP pipeline software using common workflow language

Language: Python - Size: 266 KB - Last synced at: 28 days ago - Pushed at: about 6 years ago - Stars: 33 - Forks: 3

caufieldjh/awesome-bioie

🧫 A curated list of resources relevant to doing Biomedical Information Extraction (including BioNLP)

Size: 588 KB - Last synced at: 12 days ago - Pushed at: 12 months ago - Stars: 371 - Forks: 33

apelullo/cobalt_health_wellness_platform_ops

Cobalt is a mental health and wellness platform created for Penn Medicine employees that serves as a hub for support services such as therapy, wellness coaching, topic- and population-specific group sessions, and a variety of self-help resources.

Language: Jupyter Notebook - Size: 194 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

juliasilge/tidy-text-mining

Manuscript of the book "Tidy Text Mining with R" by Julia Silge and David Robinson

Language: TeX - Size: 84.8 MB - Last synced at: 29 days ago - Pushed at: about 1 month ago - Stars: 1,338 - Forks: 802

avrtt/MobileEAST

Lightweight and fast scene text detection based on EAST architecture and MobileNet layers

Size: 3.48 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 3 - Forks: 1

SmartDataAnalytics/HORUS-NER

HORUS: A framework to boost NLP tasks

Language: Python - Size: 949 MB - Last synced at: 14 days ago - Pushed at: almost 5 years ago - Stars: 48 - Forks: 6

AnttiHaerkoenen/laadulliset

Working methods for qualitative data in historical research

Language: HTML - Size: 7.28 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

kavgan/nlp-in-practice

Starter code to solve real world text data problems. Includes: Gensim Word2Vec, phrase embeddings, Text Classification with Logistic Regression, word count with pyspark, simple text preprocessing, pre-trained embeddings and more.

Language: Jupyter Notebook - Size: 91.8 MB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 1,168 - Forks: 792

climate-ip/radar-nlp-animal-rescue

Real-time Animal Danger Alert Recognition (RADAR): NLP pipeline to detect urgent animal rescue signals from social media posts.

Language: Jupyter Notebook - Size: 15.6 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

jmartinezheras/2018-MachineLearning-Lectures-ESA

Machine Learning Lectures at the European Space Agency (ESA) in 2018

Language: Jupyter Notebook - Size: 58.3 MB - Last synced at: 19 days ago - Pushed at: over 1 year ago - Stars: 358 - Forks: 147

maxent-ai/lda2vec 📦

Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec from this paper https://arxiv.org/abs/1605.02019

Language: Jupyter Notebook - Size: 89.8 KB - Last synced at: about 1 month ago - Pushed at: about 6 years ago - Stars: 30 - Forks: 3

Related Keywords

text-mining (1,931), nlp (446), python (380), machine-learning (336), natural-language-processing (297), text-classification (245), r (233), sentiment-analysis (228), text-analysis (185), data-science (166), topic-modeling (152), text-processing (110), data-mining (107), nlp-machine-learning (91), deep-learning (88), nltk (72), python3 (72), classification (69), data-visualization (63), information-retrieval (57), tf-idf (56), clustering (55), data-analysis (54), text (52), twitter (50), wordcloud (44), rstats (42), visualization (41), webscraping (40), word2vec (39), web-scraping (39), named-entity-recognition (37), spacy (37), lda (36), java (34), dataset (33), keyword-extraction (33), naive-bayes-classifier (30), information-extraction (30), artificial-intelligence (29), jupyter-notebook (29), logistic-regression (28), pandas (28), sentiment-classification (27), latent-dirichlet-allocation (27), twitter-api (26), scikit-learn (25), random-forest (23), bag-of-words (22), neural-network (22), ai (22), text-analytics (22), analysis (21), machine-learning-algorithms (21), tensorflow (21), word-embeddings (21), digital-humanities (21), neural-networks (20), gensim (20), regex (20), ner (20), tokenization (20), r-package (20), data (20), network-analysis (20), news (19), unsupervised-learning (19), bioinformatics (19), tidytext (19), corpus (18), search-engine (18), social-media (18), scraping (18), crawler (18), javascript (18), pubmed (18), lemmatization (17), summarization (17), covid-19 (17), sentiment (17), sklearn (17), feature-extraction (16), numpy (16), pytorch (16), statistics (16), tokenizer (16), social-network-analysis (16), tweets (16), corpus-linguistics (16), tidyverse (15), keras (15), image-processing (15), twitter-sentiment-analysis (15), shiny (15), cosine-similarity (15), natural-language-understanding (15), bert (15), flask (15), exploratory-data-analysis (15), text-clustering (15)