An open API service providing repository metadata for many open source software ecosystems.

Topic: "text-mining"

keon/awesome-nlp

:book: A curated list of resources dedicated to Natural Language Processing (NLP)

Size: 541 KB - Last synced at: 9 days ago - Pushed at: over 1 year ago - Stars: 17,085 - Forks: 2,602

adbar/trafilatura

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

Language: Python - Size: 33.8 MB - Last synced at: 9 days ago - Pushed at: about 1 month ago - Stars: 4,118 - Forks: 288

deanmalmgren/textract

extract text from any document. no muss. no fuss.

Language: HTML - Size: 4.31 MB - Last synced at: about 18 hours ago - Pushed at: 5 months ago - Stars: 4,073 - Forks: 624

jbesomi/texthero

Text preprocessing, representation and visualization from zero to hero.

Language: Python - Size: 22.1 MB - Last synced at: 9 days ago - Pushed at: over 1 year ago - Stars: 2,904 - Forks: 240

JasonKessler/scattertext

Beautiful visualizations of how language differs among document types.

Language: Python - Size: 40.7 MB - Last synced at: 12 days ago - Pushed at: 7 months ago - Stars: 2,289 - Forks: 292

chiphuyen/lazynlp

Library to scrape and clean web pages to create massive datasets.

Language: Python - Size: 37.1 KB - Last synced at: about 23 hours ago - Pushed at: over 4 years ago - Stars: 2,184 - Forks: 311

ujjwalkarn/DataScienceR

a curated list of R tutorials for Data Science, NLP and Machine Learning

Language: R - Size: 15.7 MB - Last synced at: 5 days ago - Pushed at: about 2 years ago - Stars: 2,036 - Forks: 890

mathsyouth/awesome-text-summarization Fork of lipiji/App-DL

A curated list of resources dedicated to text summarization

Size: 243 KB - Last synced at: 9 days ago - Pushed at: over 2 years ago - Stars: 1,542 - Forks: 266

konlpy/konlpy

Python package for Korean natural language processing.

Language: Python - Size: 34.9 MB - Last synced at: 6 months ago - Pushed at: over 1 year ago - Stars: 1,416 - Forks: 333

juliasilge/tidy-text-mining

Manuscript of the book "Tidy Text Mining with R" by Julia Silge and David Robinson

Language: TeX - Size: 84.8 MB - Last synced at: 9 days ago - Pushed at: 14 days ago - Stars: 1,338 - Forks: 802

juliasilge/tidytext

Text mining using tidy tools :sparkles::page_facing_up::sparkles:

Language: R - Size: 129 MB - Last synced at: 6 days ago - Pushed at: about 1 year ago - Stars: 1,187 - Forks: 181

shangjingbo1226/AutoPhrase

AutoPhrase: Automated Phrase Mining from Massive Text Corpora

Language: C++ - Size: 195 MB - Last synced at: 6 months ago - Pushed at: about 3 years ago - Stars: 1,171 - Forks: 273

kavgan/nlp-in-practice

Starter code to solve real world text data problems. Includes: Gensim Word2Vec, phrase embeddings, Text Classification with Logistic Regression, word count with pyspark, simple text preprocessing, pre-trained embeddings and more.

Language: Jupyter Notebook - Size: 91.8 MB - Last synced at: 12 days ago - Pushed at: over 4 years ago - Stars: 1,168 - Forks: 792

csurfer/rake-nltk

Python implementation of the Rapid Automatic Keyword Extraction algorithm using NLTK.

Language: Python - Size: 477 KB - Last synced at: 7 days ago - Pushed at: over 2 years ago - Stars: 1,069 - Forks: 150

DemonDamon/FinnewsHunter

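Crawls historical news text for listed companies (individual stocks) from Sina Finance (新浪财经), NBD (每经网), JRJ (金融界), China Securities Network (中国证券网), and Securities Times (证券时报网); runs text analysis and extracts feature sets; trains classifiers such as SVM and random forest; and finally classifies and predicts on newly crawled news data.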

Language: Python - Size: 5.46 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1,033 - Forks: 273

opensemanticsearch/open-semantic-search

Open Source research tool to search, browse, analyze and explore large document collections by Semantic Search Engine and Open Source Text Mining & Text Analytics platform (Integrates ETL for document processing, OCR for images & PDF, named entity recognition for persons, organizations & locations, metadata management by thesaurus & ontologies, search user interface & search apps for fulltext search, faceted search & knowledge graph)

Language: Shell - Size: 8.91 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 1,019 - Forks: 180

gsh199449/spider

A configurable web spider with an easy-to-use web console

Language: Java - Size: 14.4 MB - Last synced at: about 1 year ago - Pushed at: over 6 years ago - Stars: 987 - Forks: 484

nlptown/nlp-notebooks

A collection of notebooks for Natural Language Processing from NLP Town

Language: Jupyter Notebook - Size: 94.8 MB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 884 - Forks: 358

dselivanov/text2vec

Fast vectorization, topic modeling, distances and GloVe word embeddings in R.

Language: R - Size: 46.2 MB - Last synced at: 4 days ago - Pushed at: 8 months ago - Stars: 862 - Forks: 133

bigartm/bigartm

Fast topic modeling platform

Language: C++ - Size: 16.8 MB - Last synced at: 13 days ago - Pushed at: over 1 year ago - Stars: 668 - Forks: 120

gesiscss/awesome-computational-social-science

A list of awesome resources for Computational Social Science

Language: R - Size: 209 KB - Last synced at: 6 days ago - Pushed at: 14 days ago - Stars: 664 - Forks: 84

graphbrain/graphbrain

Language, Knowledge, Cognition

Language: Python - Size: 103 MB - Last synced at: 13 days ago - Pushed at: about 2 months ago - Stars: 598 - Forks: 69

stepthom/text_mining_resources

Resources for learning about Text Mining and Natural Language Processing

Size: 707 KB - Last synced at: 9 days ago - Pushed at: about 2 years ago - Stars: 574 - Forks: 199

cpsievert/LDAvis

R package for web-based interactive topic model visualization.

Language: JavaScript - Size: 24 MB - Last synced at: 12 days ago - Pushed at: about 1 year ago - Stars: 558 - Forks: 132

laugustyniak/awesome-sentiment-analysis

Repository with everything necessary for sentiment analysis and related areas

Size: 36.1 KB - Last synced at: 8 days ago - Pushed at: over 1 year ago - Stars: 539 - Forks: 110

stephenhky/PyShortTextCategorization

Various Algorithms for Short Text Mining

Language: Python - Size: 111 MB - Last synced at: 7 days ago - Pushed at: 10 days ago - Stars: 470 - Forks: 72

nishitpatel01/Fake_News_Detection

Fake News Detection in Python

Language: Jupyter Notebook - Size: 12.6 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 453 - Forks: 307

adbar/German-NLP

Curated list of open-access/open-source/off-the-shelf resources and tools developed with a particular focus on German

Size: 144 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 440 - Forks: 63

kk7nc/RMDL

RMDL: Random Multimodel Deep Learning for Classification

Language: Python - Size: 223 MB - Last synced at: 7 days ago - Pushed at: almost 2 years ago - Stars: 430 - Forks: 122

airbnb/artificial-adversary

🗣️ Tool to generate adversarial text examples and test machine learning models against them

Language: Python - Size: 116 KB - Last synced at: 16 days ago - Pushed at: over 3 years ago - Stars: 402 - Forks: 57

bakrianoo/aravec

AraVec is a pre-trained distributed word representation (word embedding) open source project which aims to provide the Arabic NLP research community with free to use and powerful word embedding models.

Language: Jupyter Notebook - Size: 1.17 MB - Last synced at: about 1 month ago - Pushed at: about 4 years ago - Stars: 398 - Forks: 80

caufieldjh/awesome-bioie

🧫 A curated list of resources relevant to doing Biomedical Information Extraction (including BioNLP)

Size: 588 KB - Last synced at: 4 days ago - Pushed at: 11 months ago - Stars: 371 - Forks: 33

jmartinezheras/2018-MachineLearning-Lectures-ESA

Machine Learning Lectures at the European Space Agency (ESA) in 2018

Language: Jupyter Notebook - Size: 58.3 MB - Last synced at: 5 months ago - Pushed at: over 1 year ago - Stars: 353 - Forks: 145

sergioburdisso/pyss3

A Python package implementing a new interpretable machine learning model for text classification (with visualization tools for Explainable AI :octocat:)

Language: Python - Size: 102 MB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 342 - Forks: 44

lining0806/TextMining

Python text mining system (Research of Text Mining System)

Language: Python - Size: 3.79 MB - Last synced at: 13 days ago - Pushed at: about 7 years ago - Stars: 341 - Forks: 154

hiDaDeng/cntext

Text analysis package supporting multiple methods, including word count, readability, document similarity, sentiment analysis, Word2Vec/GloVe, and Large Language Models (LLMs).

Language: Python - Size: 63.5 MB - Last synced at: 8 days ago - Pushed at: 14 days ago - Stars: 329 - Forks: 30

mcs07/ChemDataExtractor

Automatically extract chemical information from scientific documents

Language: Python - Size: 542 KB - Last synced at: 16 days ago - Pushed at: over 1 year ago - Stars: 322 - Forks: 116

jalajthanaki/NLPython

This repository contains code for Natural Language Processing using the Python scripting language. All the code relates to my book, "Python Natural Language Processing".

Language: Jupyter Notebook - Size: 131 MB - Last synced at: 15 days ago - Pushed at: over 2 years ago - Stars: 322 - Forks: 207

ropensci-archive/rplos 📦

:warning: ARCHIVED :warning: R client for the PLoS Journals API

Language: R - Size: 4.82 MB - Last synced at: 5 months ago - Pushed at: over 2 years ago - Stars: 316 - Forks: 107

ko-ichi-h/khcoder

KH Coder: for Quantitative Content Analysis or Text Mining

Language: Perl - Size: 30.5 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 315 - Forks: 98

mesolitica/malaysian-dataset

We gather Malaysian dataset! https://malaysian-dataset.readthedocs.io/

Language: Jupyter Notebook - Size: 1.36 GB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 312 - Forks: 111

kk7nc/HDLTex

HDLTex: Hierarchical Deep Learning for Text Classification

Language: Python - Size: 32 MB - Last synced at: 12 days ago - Pushed at: 6 months ago - Stars: 272 - Forks: 65

vgrabovets/multi_rake

Multilingual Rapid Automatic Keyword Extraction (RAKE) for Python

Language: Python - Size: 123 KB - Last synced at: 3 months ago - Pushed at: almost 2 years ago - Stars: 270 - Forks: 37

dataqa/nlp-labelling

Labelling platform for text using weak supervision.

Language: JavaScript - Size: 4.97 MB - Last synced at: 26 days ago - Pushed at: almost 3 years ago - Stars: 260 - Forks: 18

blueprints-for-text-analytics-python/blueprints-text

Jupyter notebooks for our O'Reilly book "Blueprints for Text Analysis Using Python"

Language: Jupyter Notebook - Size: 164 MB - Last synced at: 3 days ago - Pushed at: about 1 year ago - Stars: 254 - Forks: 146

oroszgy/awesome-hungarian-nlp

A curated list of NLP resources for Hungarian

Size: 112 KB - Last synced at: 8 days ago - Pushed at: about 2 months ago - Stars: 244 - Forks: 18

neomatrix369/nlp_profiler

A simple NLP library allows profiling datasets with one or more text columns. When given a dataset and a column name containing text data, NLP Profiler will return either high-level insights or low-level/granular statistical information about the text in that column.

Language: Python - Size: 3.54 MB - Last synced at: 15 days ago - Pushed at: 11 months ago - Stars: 242 - Forks: 37

jphall663/GWU_data_mining

Materials for GWU DNSC 6279 and DNSC 6290.

Language: Jupyter Notebook - Size: 186 MB - Last synced at: 15 days ago - Pushed at: 10 months ago - Stars: 238 - Forks: 173

RandyPen/TextCluster

Short text clustering preprocessing module (Short text cluster)

Language: Python - Size: 1.25 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 238 - Forks: 60

qminer/qminer

Analytic platform for real-time large-scale streams containing structured and unstructured data.

Language: C++ - Size: 39.6 MB - Last synced at: 8 months ago - Pushed at: about 2 years ago - Stars: 218 - Forks: 57

bnosac/udpipe

R package for Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing Based on the UDPipe Natural Language Processing Toolkit

Language: C++ - Size: 5.74 MB - Last synced at: 16 days ago - Pushed at: about 2 years ago - Stars: 214 - Forks: 33

bookieio/breadability

Reworked https://www.readability.com/ parsing library (https://mercury.postlight.com/ is now a living alternative)

Language: HTML - Size: 604 KB - Last synced at: 12 days ago - Pushed at: 12 months ago - Stars: 204 - Forks: 25

giacbrd/ShallowLearn

An experiment about re-implementing supervised learning models based on shallow neural network approaches (e.g. fastText) with some additional exclusive features and nice API. Written in Python and fully compatible with Scikit-learn.

Language: Python - Size: 537 KB - Last synced at: 20 days ago - Pushed at: over 7 years ago - Stars: 198 - Forks: 29

ropensci/tokenizers

Fast, Consistent Tokenization of Natural Language Text

Language: R - Size: 1.24 MB - Last synced at: 6 days ago - Pushed at: about 1 year ago - Stars: 186 - Forks: 25

currentslab/extractnet

A fork of Dragnet that also extracts author, headline, date, and keywords from context, with built-in metadata extraction, all in one package

Language: HTML - Size: 421 MB - Last synced at: 11 months ago - Pushed at: over 1 year ago - Stars: 181 - Forks: 19

maxent-ai/converse

Conversational text Analysis using various NLP techniques

Language: Jupyter Notebook - Size: 154 KB - Last synced at: 20 days ago - Pushed at: almost 2 years ago - Stars: 181 - Forks: 19

trinker/qdap

Quantitative Discourse Analysis Package: Bridging the gap between qualitative data and quantitative analysis

Language: R - Size: 36.9 MB - Last synced at: 6 months ago - Pushed at: over 4 years ago - Stars: 175 - Forks: 44

luozhouyang/AutoPhraseX

Automated Phrase Mining from Massive Text Corpora in Python.

Language: Python - Size: 90.8 KB - Last synced at: 11 days ago - Pushed at: almost 4 years ago - Stars: 171 - Forks: 37

bijoyandas/Hands-On-Natural-Language-Processing-with-Python

This repository is for my Udemy students. You can find all the lecture code here, along with the files mentioned for reading. Feel free to clone it, and if you have any problem, just raise a question.

Language: Python - Size: 9.73 MB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 171 - Forks: 238

fendouai/Awesome-Text-Classification

Awesome-Text-Classification: projects, papers, tutorials.

Size: 7.81 KB - Last synced at: 11 days ago - Pushed at: over 7 years ago - Stars: 171 - Forks: 32

karolzak/support-tickets-classification

This case study shows how to create a model for text analysis and classification and deploy it as a web service in Azure cloud in order to automatically classify support tickets. This project is a proof of concept made by Microsoft (Commercial Software Engineering team) in collaboration with Endava http://endava.com/en

Language: Python - Size: 3.74 MB - Last synced at: 12 days ago - Pushed at: almost 3 years ago - Stars: 168 - Forks: 91

mkearney/textfeatures

👷‍♂️ A simple package for extracting useful features from character objects 👷‍♀️

Language: R - Size: 7.64 MB - Last synced at: 11 days ago - Pushed at: over 4 years ago - Stars: 167 - Forks: 17

huspacy/huspacy

HuSpaCy: industrial-strength Hungarian natural language processing

Language: Python - Size: 2.2 MB - Last synced at: 7 days ago - Pushed at: 6 months ago - Stars: 165 - Forks: 15

HanXinzi-AI/awesome-python-machine-learning-resources

A collection of awesome machine learning and deep learning Python libraries & tools.

Size: 11 MB - Last synced at: 10 days ago - Pushed at: 11 months ago - Stars: 163 - Forks: 25

assafmo/xioc

Extract indicators of compromise from text, including "escaped" ones.

Language: Go - Size: 64.5 KB - Last synced at: 13 days ago - Pushed at: about 5 years ago - Stars: 159 - Forks: 13

jakelever/kindred

A Python biomedical relation extraction package that uses a supervised approach (i.e. needs training data).

Language: Python - Size: 2.38 MB - Last synced at: 15 days ago - Pushed at: about 2 years ago - Stars: 157 - Forks: 30

EmilHvitfeldt/R-text-data

List of textual data sources to be used for text mining in R

Size: 9.71 MB - Last synced at: 13 days ago - Pushed at: over 3 years ago - Stars: 147 - Forks: 15

brandonrobertz/SparseLSH

A Locality Sensitive Hashing (LSH) library with an emphasis on large, high-dimensional datasets.

Language: Python - Size: 108 KB - Last synced at: 14 days ago - Pushed at: 8 months ago - Stars: 146 - Forks: 27

mesejo/trrex

Efficient string matching with regular expressions

Language: Python - Size: 443 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 143 - Forks: 6

hugochan/KATE

Code & data accompanying the KDD 2017 paper "KATE: K-Competitive Autoencoder for Text"

Language: Python - Size: 4.8 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 143 - Forks: 49

Planeshifter/text-miner

text mining utilities for Node.js

Language: JavaScript - Size: 245 KB - Last synced at: 11 days ago - Pushed at: about 2 years ago - Stars: 141 - Forks: 19

pemagrg1/Natural-Language-Processing-NLP-Roadmap

A simple RoadMap to Natural Language Processing(NLP)

Size: 48.8 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 134 - Forks: 17

biolab/orange3-text

🍊 :page_facing_up: Text Mining add-on for Orange3

Language: Python - Size: 46.5 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 131 - Forks: 88

Lilykos/pyphonetics

A Python 3 phonetics library.

Language: Python - Size: 21.5 KB - Last synced at: 12 days ago - Pushed at: about 5 years ago - Stars: 129 - Forks: 20

TiesdeKok/Python_NLP_Tutorial

This repository provides everything to get started with Python for Text Mining / Natural Language Processing (NLP)

Language: Jupyter Notebook - Size: 443 KB - Last synced at: 1 day ago - Pushed at: almost 5 years ago - Stars: 125 - Forks: 66

dperezrada/keywords2vec

Language: Jupyter Notebook - Size: 1.15 MB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 123 - Forks: 15

aphp/edsnlp

Modular, fast NLP framework, compatible with Pytorch and spaCy, offering tailored support for French clinical notes.

Language: Python - Size: 116 MB - Last synced at: 8 days ago - Pushed at: 13 days ago - Stars: 122 - Forks: 31

JosiahParry/genius

Easily access song lyrics from Genius in a tibble.

Language: HTML - Size: 428 KB - Last synced at: 5 days ago - Pushed at: over 3 years ago - Stars: 121 - Forks: 18

dipanjanS/learning-social-media-analytics-with-r

This repository contains code and bonus content which will be added from time to time for the book "Learning Social Media Analytics with R" by Packt

Language: R - Size: 2.55 MB - Last synced at: 5 days ago - Pushed at: almost 8 years ago - Stars: 119 - Forks: 80

CogComp/cogcomp-nlpy

CogComp's light-weight Python NLP annotators

Language: Python - Size: 331 KB - Last synced at: 10 months ago - Pushed at: about 6 years ago - Stars: 116 - Forks: 26

YaleDHLab/intertext 📦

Detect and visualize text reuse

Language: Python - Size: 3.11 MB - Last synced at: 5 months ago - Pushed at: 8 months ago - Stars: 115 - Forks: 10

gsurma/text_predictor

Char-level RNN LSTM text generator📄.

Language: Python - Size: 125 MB - Last synced at: 19 days ago - Pushed at: almost 4 years ago - Stars: 115 - Forks: 35

trinker/lexicon

A data package containing lexicons and dictionaries for text analysis

Language: R - Size: 9.17 MB - Last synced at: 11 days ago - Pushed at: over 3 years ago - Stars: 110 - Forks: 14

NicholasMamo/multiplex-plot

Multiplex: visualizations that tell stories—A Python library to create and annotate beautiful network graph visualizations, text visualizations and more.

Language: Python - Size: 94.2 MB - Last synced at: 6 months ago - Pushed at: over 2 years ago - Stars: 109 - Forks: 16

DmitryRyumin/EMNLP-2023-Papers

EMNLP 2023 Papers: Explore cutting-edge research from EMNLP 2023, the premier conference for advancing empirical methods in natural language processing. Stay updated on the latest in machine learning, deep learning, and natural language processing with code included. :star: support NLP!

Language: Python - Size: 6.43 MB - Last synced at: 8 days ago - Pushed at: 11 months ago - Stars: 107 - Forks: 7

lisc-tools/lisc

Literature Scanner: Automated collection & analyses of the scientific literature.

Language: Python - Size: 6.84 MB - Last synced at: 6 days ago - Pushed at: about 1 month ago - Stars: 106 - Forks: 12

lettier/lda-topic-modeling

A PureScript, browser-based implementation of LDA topic modeling.

Language: PureScript - Size: 109 KB - Last synced at: 1 day ago - Pushed at: about 7 years ago - Stars: 104 - Forks: 17

bnosac/ruimtehol

R package to Embed All the Things! using StarSpace

Language: C++ - Size: 39.3 MB - Last synced at: 13 days ago - Pushed at: about 1 year ago - Stars: 101 - Forks: 13

Jasonnor/tf-idf-python

Term frequency–inverse document frequency for Chinese novels/documents, implemented in Python.

Language: Python - Size: 14.5 MB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 99 - Forks: 36

juliasilge/janeaustenr

An R Package for Jane Austen's Complete Novels :orange_book:

Language: R - Size: 4.78 MB - Last synced at: 14 days ago - Pushed at: over 2 years ago - Stars: 95 - Forks: 22

luopeixiang/awesome-text-summarization

Text summarization starting from scratch.

Size: 5.86 KB - Last synced at: 10 days ago - Pushed at: 8 months ago - Stars: 94 - Forks: 17

jonaschn/awesome-topic-models

✨ Awesome - A curated list of amazing Topic Models (implementations, libraries, and resources)

Size: 53.7 KB - Last synced at: 9 days ago - Pushed at: almost 3 years ago - Stars: 93 - Forks: 7

fingeredman/teanaps

An open-source Python library for natural language processing and text analysis.

Language: Jupyter Notebook - Size: 62.5 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 92 - Forks: 11

SentometricsResearch/sentometrics

An integrated framework in R for textual sentiment time series aggregation and prediction

Language: R - Size: 438 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 83 - Forks: 22

NFeruch/reddit2text

The Python toolkit for converting Reddit threads into organized text data. Extract and process Reddit content with ease!

Language: Python - Size: 49.8 KB - Last synced at: 4 months ago - Pushed at: 8 months ago - Stars: 83 - Forks: 7

hiDaDeng/DaDengAndHisPython

[WeChat official account: 大邓和他的python (DaDeng and His Python)], Python syntax quick start: https://www.bilibili.com/video/av44384851, Python web crawler quick start: https://www.bilibili.com/video/av72010301, my contact email: [email protected]

Language: Jupyter Notebook - Size: 493 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 83 - Forks: 52

PacktPublishing/Hands-On-Python-Natural-Language-Processing

Language: Jupyter Notebook - Size: 116 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 82 - Forks: 77

docwire/docwire

DocWire SDK: Award-winning modern data processing in C++20. SourceForge Community Choice & Microsoft support. AI-driven processing. Supports nearly 100 data formats, including email boxes and OCR. Boost efficiency in text extraction, web data extraction, data mining, document analysis. Offline processing is possible for security and confidentiality

Language: C++ - Size: 35.8 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 81 - Forks: 18

ziqizhang/jate

NEWS: JATE2.0 Beta.11 Released, see details below.

Language: Java - Size: 286 MB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 81 - Forks: 29

AllenDang/PipeIt

PipeIt is a text transformation, conversion, cleansing and extraction tool.

Language: Go - Size: 349 KB - Last synced at: 6 days ago - Pushed at: over 3 years ago - Stars: 80 - Forks: 6

Related Topics
nlp (443), python (377), machine-learning (333), natural-language-processing (296), text-classification (245), r (233), sentiment-analysis (226), text-analysis (185), data-science (165), topic-modeling (151), text-processing (110), data-mining (107), nlp-machine-learning (91), deep-learning (88), python3 (72), nltk (71), classification (69), data-visualization (63), information-retrieval (57), data-analysis (54), clustering (54), tf-idf (54), text (52), twitter (50), wordcloud (44), rstats (42), visualization (41), webscraping (40), web-scraping (39), word2vec (38), named-entity-recognition (37), spacy (37), lda (36), java (34), keyword-extraction (33), dataset (33), naive-bayes-classifier (30), information-extraction (30), artificial-intelligence (29), jupyter-notebook (29), pandas (28), logistic-regression (28), sentiment-classification (27), latent-dirichlet-allocation (26), twitter-api (26), scikit-learn (24), text-analytics (23), ai (22), bag-of-words (22), random-forest (22), neural-network (22), analysis (21), tensorflow (21), word-embeddings (21), machine-learning-algorithms (21), tokenization (20), r-package (20), regex (20), digital-humanities (20), neural-networks (20), data (20), network-analysis (20), gensim (20), ner (20), tidytext (19), news (19), bioinformatics (19), unsupervised-learning (19), crawler (18), pubmed (18), scraping (18), social-media (18), corpus (18), sentiment (18), javascript (18), summarization (17), search-engine (17), covid-19 (17), sklearn (17), social-network-analysis (16), numpy (16), statistics (16), pytorch (16), corpus-linguistics (16), tokenizer (16), feature-extraction (16), tweets (16), text-summarization (15), lemmatization (15), exploratory-data-analysis (15), embeddings (15), cosine-similarity (15), twitter-sentiment-analysis (15), keras (15), text-extraction (15), shiny (15), naive-bayes (15), image-processing (15), text-clustering (15), flask (15)