Topic: "text-mining"
keon/awesome-nlp
:book: A curated list of resources dedicated to Natural Language Processing (NLP)
Size: 541 KB - Last synced at: 9 days ago - Pushed at: over 1 year ago - Stars: 17,085 - Forks: 2,602

adbar/trafilatura
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
Language: Python - Size: 33.8 MB - Last synced at: 9 days ago - Pushed at: about 1 month ago - Stars: 4,118 - Forks: 288

deanmalmgren/textract
Extract text from any document. No muss, no fuss.
Language: HTML - Size: 4.31 MB - Last synced at: about 18 hours ago - Pushed at: 5 months ago - Stars: 4,073 - Forks: 624

jbesomi/texthero
Text preprocessing, representation and visualization from zero to hero.
Language: Python - Size: 22.1 MB - Last synced at: 9 days ago - Pushed at: over 1 year ago - Stars: 2,904 - Forks: 240

JasonKessler/scattertext
Beautiful visualizations of how language differs among document types.
Language: Python - Size: 40.7 MB - Last synced at: 12 days ago - Pushed at: 7 months ago - Stars: 2,289 - Forks: 292

chiphuyen/lazynlp
Library to scrape and clean web pages to create massive datasets.
Language: Python - Size: 37.1 KB - Last synced at: about 23 hours ago - Pushed at: over 4 years ago - Stars: 2,184 - Forks: 311

ujjwalkarn/DataScienceR
A curated list of R tutorials for Data Science, NLP and Machine Learning
Language: R - Size: 15.7 MB - Last synced at: 5 days ago - Pushed at: about 2 years ago - Stars: 2,036 - Forks: 890

mathsyouth/awesome-text-summarization Fork of lipiji/App-DL
A curated list of resources dedicated to text summarization
Size: 243 KB - Last synced at: 9 days ago - Pushed at: over 2 years ago - Stars: 1,542 - Forks: 266

konlpy/konlpy
Python package for Korean natural language processing.
Language: Python - Size: 34.9 MB - Last synced at: 6 months ago - Pushed at: over 1 year ago - Stars: 1,416 - Forks: 333

juliasilge/tidy-text-mining
Manuscript of the book "Tidy Text Mining with R" by Julia Silge and David Robinson
Language: TeX - Size: 84.8 MB - Last synced at: 9 days ago - Pushed at: 14 days ago - Stars: 1,338 - Forks: 802

juliasilge/tidytext
Text mining using tidy tools :sparkles::page_facing_up::sparkles:
Language: R - Size: 129 MB - Last synced at: 6 days ago - Pushed at: about 1 year ago - Stars: 1,187 - Forks: 181

shangjingbo1226/AutoPhrase
AutoPhrase: Automated Phrase Mining from Massive Text Corpora
Language: C++ - Size: 195 MB - Last synced at: 6 months ago - Pushed at: about 3 years ago - Stars: 1,171 - Forks: 273

kavgan/nlp-in-practice
Starter code to solve real world text data problems. Includes: Gensim Word2Vec, phrase embeddings, Text Classification with Logistic Regression, word count with pyspark, simple text preprocessing, pre-trained embeddings and more.
Language: Jupyter Notebook - Size: 91.8 MB - Last synced at: 12 days ago - Pushed at: over 4 years ago - Stars: 1,168 - Forks: 792

csurfer/rake-nltk
Python implementation of the Rapid Automatic Keyword Extraction algorithm using NLTK.
Language: Python - Size: 477 KB - Last synced at: 7 days ago - Pushed at: over 2 years ago - Stars: 1,069 - Forks: 150

DemonDamon/FinnewsHunter
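Scrapes historical news text about listed companies (individual stocks) from Sina Finance, National Business Daily (NBD), JRJ, China Securities Network, and Securities Times Online; performs text analysis and extracts feature sets; trains classifiers such as SVM and random forests; and finally classifies and predicts on news data scraped in real time.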
Language: Python - Size: 5.46 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1,033 - Forks: 273

opensemanticsearch/open-semantic-search
Open-source research tool to search, browse, analyze, and explore large document collections, combining a semantic search engine with an open-source text mining and text analytics platform (integrates ETL for document processing, OCR for images and PDFs, named entity recognition for persons, organizations, and locations, metadata management via thesauri and ontologies, and a search UI and search apps for full-text search, faceted search, and knowledge graphs)
Language: Shell - Size: 8.91 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 1,019 - Forks: 180

gsh199449/spider
A configurable web spider with an easy-to-use web console
Language: Java - Size: 14.4 MB - Last synced at: about 1 year ago - Pushed at: over 6 years ago - Stars: 987 - Forks: 484

nlptown/nlp-notebooks
A collection of notebooks for Natural Language Processing from NLP Town
Language: Jupyter Notebook - Size: 94.8 MB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 884 - Forks: 358

dselivanov/text2vec
Fast vectorization, topic modeling, distances and GloVe word embeddings in R.
Language: R - Size: 46.2 MB - Last synced at: 4 days ago - Pushed at: 8 months ago - Stars: 862 - Forks: 133

bigartm/bigartm
Fast topic modeling platform
Language: C++ - Size: 16.8 MB - Last synced at: 13 days ago - Pushed at: over 1 year ago - Stars: 668 - Forks: 120

gesiscss/awesome-computational-social-science
A list of awesome resources for Computational Social Science
Language: R - Size: 209 KB - Last synced at: 6 days ago - Pushed at: 14 days ago - Stars: 664 - Forks: 84

graphbrain/graphbrain
Language, Knowledge, Cognition
Language: Python - Size: 103 MB - Last synced at: 13 days ago - Pushed at: about 2 months ago - Stars: 598 - Forks: 69

stepthom/text_mining_resources
Resources for learning about Text Mining and Natural Language Processing
Size: 707 KB - Last synced at: 9 days ago - Pushed at: about 2 years ago - Stars: 574 - Forks: 199

cpsievert/LDAvis
R package for web-based interactive topic model visualization.
Language: JavaScript - Size: 24 MB - Last synced at: 12 days ago - Pushed at: about 1 year ago - Stars: 558 - Forks: 132

laugustyniak/awesome-sentiment-analysis
Repository with everything necessary for sentiment analysis and related areas
Size: 36.1 KB - Last synced at: 8 days ago - Pushed at: over 1 year ago - Stars: 539 - Forks: 110

stephenhky/PyShortTextCategorization
Various Algorithms for Short Text Mining
Language: Python - Size: 111 MB - Last synced at: 7 days ago - Pushed at: 10 days ago - Stars: 470 - Forks: 72

nishitpatel01/Fake_News_Detection
Fake News Detection in Python
Language: Jupyter Notebook - Size: 12.6 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 453 - Forks: 307

adbar/German-NLP
Curated list of open-access/open-source/off-the-shelf resources and tools developed with a particular focus on German
Size: 144 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 440 - Forks: 63

kk7nc/RMDL
RMDL: Random Multimodel Deep Learning for Classification
Language: Python - Size: 223 MB - Last synced at: 7 days ago - Pushed at: almost 2 years ago - Stars: 430 - Forks: 122

airbnb/artificial-adversary
🗣️ Tool to generate adversarial text examples and test machine learning models against them
Language: Python - Size: 116 KB - Last synced at: 16 days ago - Pushed at: over 3 years ago - Stars: 402 - Forks: 57

bakrianoo/aravec
AraVec is a pre-trained distributed word representation (word embedding) open source project which aims to provide the Arabic NLP research community with free to use and powerful word embedding models.
Language: Jupyter Notebook - Size: 1.17 MB - Last synced at: about 1 month ago - Pushed at: about 4 years ago - Stars: 398 - Forks: 80

caufieldjh/awesome-bioie
🧫 A curated list of resources relevant to doing Biomedical Information Extraction (including BioNLP)
Size: 588 KB - Last synced at: 4 days ago - Pushed at: 11 months ago - Stars: 371 - Forks: 33

jmartinezheras/2018-MachineLearning-Lectures-ESA
Machine Learning Lectures at the European Space Agency (ESA) in 2018
Language: Jupyter Notebook - Size: 58.3 MB - Last synced at: 5 months ago - Pushed at: over 1 year ago - Stars: 353 - Forks: 145

sergioburdisso/pyss3
A Python package implementing a new interpretable machine learning model for text classification (with visualization tools for Explainable AI :octocat:)
Language: Python - Size: 102 MB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 342 - Forks: 44

lining0806/TextMining
Python text mining system (Research of Text Mining System)
Language: Python - Size: 3.79 MB - Last synced at: 13 days ago - Pushed at: about 7 years ago - Stars: 341 - Forks: 154

hiDaDeng/cntext
Text analysis package supporting multiple methods, including word count, readability, document similarity, sentiment analysis, Word2Vec/GloVe, and Large Language Models (LLMs).
Language: Python - Size: 63.5 MB - Last synced at: 8 days ago - Pushed at: 14 days ago - Stars: 329 - Forks: 30

mcs07/ChemDataExtractor
Automatically extract chemical information from scientific documents
Language: Python - Size: 542 KB - Last synced at: 16 days ago - Pushed at: over 1 year ago - Stars: 322 - Forks: 116

jalajthanaki/NLPython
This repository contains code for Natural Language Processing using the Python scripting language. All of the code accompanies my book "Python Natural Language Processing"
Language: Jupyter Notebook - Size: 131 MB - Last synced at: 15 days ago - Pushed at: over 2 years ago - Stars: 322 - Forks: 207

ropensci-archive/rplos 📦
:warning: ARCHIVED :warning: R client for the PLoS Journals API
Language: R - Size: 4.82 MB - Last synced at: 5 months ago - Pushed at: over 2 years ago - Stars: 316 - Forks: 107

ko-ichi-h/khcoder
KH Coder: for Quantitative Content Analysis or Text Mining
Language: Perl - Size: 30.5 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 315 - Forks: 98

mesolitica/malaysian-dataset
We gather Malaysian datasets! https://malaysian-dataset.readthedocs.io/
Language: Jupyter Notebook - Size: 1.36 GB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 312 - Forks: 111

kk7nc/HDLTex
HDLTex: Hierarchical Deep Learning for Text Classification
Language: Python - Size: 32 MB - Last synced at: 12 days ago - Pushed at: 6 months ago - Stars: 272 - Forks: 65

vgrabovets/multi_rake
Multilingual Rapid Automatic Keyword Extraction (RAKE) for Python
Language: Python - Size: 123 KB - Last synced at: 3 months ago - Pushed at: almost 2 years ago - Stars: 270 - Forks: 37

dataqa/nlp-labelling
Labelling platform for text using weak supervision.
Language: JavaScript - Size: 4.97 MB - Last synced at: 26 days ago - Pushed at: almost 3 years ago - Stars: 260 - Forks: 18

blueprints-for-text-analytics-python/blueprints-text
Jupyter notebooks for our O'Reilly book "Blueprints for Text Analysis Using Python"
Language: Jupyter Notebook - Size: 164 MB - Last synced at: 3 days ago - Pushed at: about 1 year ago - Stars: 254 - Forks: 146

oroszgy/awesome-hungarian-nlp
A curated list of NLP resources for Hungarian
Size: 112 KB - Last synced at: 8 days ago - Pushed at: about 2 months ago - Stars: 244 - Forks: 18

neomatrix369/nlp_profiler
A simple NLP library for profiling datasets with one or more text columns. Given a dataset and the name of a column containing text data, NLP Profiler returns either high-level insights or low-level/granular statistical information about the text in that column.
Language: Python - Size: 3.54 MB - Last synced at: 15 days ago - Pushed at: 11 months ago - Stars: 242 - Forks: 37

jphall663/GWU_data_mining
Materials for GWU DNSC 6279 and DNSC 6290.
Language: Jupyter Notebook - Size: 186 MB - Last synced at: 15 days ago - Pushed at: 10 months ago - Stars: 238 - Forks: 173

RandyPen/TextCluster
Short text clustering preprocessing module (Short text cluster)
Language: Python - Size: 1.25 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 238 - Forks: 60

qminer/qminer
Analytic platform for real-time large-scale streams containing structured and unstructured data.
Language: C++ - Size: 39.6 MB - Last synced at: 8 months ago - Pushed at: about 2 years ago - Stars: 218 - Forks: 57

bnosac/udpipe
R package for Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing Based on the UDPipe Natural Language Processing Toolkit
Language: C++ - Size: 5.74 MB - Last synced at: 16 days ago - Pushed at: about 2 years ago - Stars: 214 - Forks: 33

bookieio/breadability
Reworked version of the https://www.readability.com/ parsing library (https://mercury.postlight.com/ is now the living alternative)
Language: HTML - Size: 604 KB - Last synced at: 12 days ago - Pushed at: 12 months ago - Stars: 204 - Forks: 25

giacbrd/ShallowLearn
An experiment about re-implementing supervised learning models based on shallow neural network approaches (e.g. fastText) with some additional exclusive features and nice API. Written in Python and fully compatible with Scikit-learn.
Language: Python - Size: 537 KB - Last synced at: 20 days ago - Pushed at: over 7 years ago - Stars: 198 - Forks: 29

ropensci/tokenizers
Fast, Consistent Tokenization of Natural Language Text
Language: R - Size: 1.24 MB - Last synced at: 6 days ago - Pushed at: about 1 year ago - Stars: 186 - Forks: 25

currentslab/extractnet
A fork of Dragnet that also extracts author, headline, date, and keywords from context, with built-in metadata extraction, all in one package
Language: HTML - Size: 421 MB - Last synced at: 11 months ago - Pushed at: over 1 year ago - Stars: 181 - Forks: 19

maxent-ai/converse
Conversational text analysis using various NLP techniques
Language: Jupyter Notebook - Size: 154 KB - Last synced at: 20 days ago - Pushed at: almost 2 years ago - Stars: 181 - Forks: 19

trinker/qdap
Quantitative Discourse Analysis Package: Bridging the gap between qualitative data and quantitative analysis
Language: R - Size: 36.9 MB - Last synced at: 6 months ago - Pushed at: over 4 years ago - Stars: 175 - Forks: 44

luozhouyang/AutoPhraseX
Automated Phrase Mining from Massive Text Corpora in Python.
Language: Python - Size: 90.8 KB - Last synced at: 11 days ago - Pushed at: almost 4 years ago - Stars: 171 - Forks: 37

bijoyandas/Hands-On-Natural-Language-Processing-with-Python
This repository is for my Udemy students. It contains all the lecture code along with the files mentioned for reading. Feel free to clone it, and if you have any problems, just raise a question.
Language: Python - Size: 9.73 MB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 171 - Forks: 238

fendouai/Awesome-Text-Classification
Awesome-Text-Classification: projects, papers, tutorials.
Size: 7.81 KB - Last synced at: 11 days ago - Pushed at: over 7 years ago - Stars: 171 - Forks: 32

karolzak/support-tickets-classification
This case study shows how to create a model for text analysis and classification and deploy it as a web service in Azure cloud in order to automatically classify support tickets. This project is a proof of concept made by Microsoft (Commercial Software Engineering team) in collaboration with Endava http://endava.com/en
Language: Python - Size: 3.74 MB - Last synced at: 12 days ago - Pushed at: almost 3 years ago - Stars: 168 - Forks: 91

mkearney/textfeatures
👷‍♂️ A simple package for extracting useful features from character objects 👷‍♀️
Language: R - Size: 7.64 MB - Last synced at: 11 days ago - Pushed at: over 4 years ago - Stars: 167 - Forks: 17

huspacy/huspacy
HuSpaCy: industrial-strength Hungarian natural language processing
Language: Python - Size: 2.2 MB - Last synced at: 7 days ago - Pushed at: 6 months ago - Stars: 165 - Forks: 15

HanXinzi-AI/awesome-python-machine-learning-resources
A collection of popular, practical machine learning and deep learning Python libraries and tools.
Size: 11 MB - Last synced at: 10 days ago - Pushed at: 11 months ago - Stars: 163 - Forks: 25

assafmo/xioc
Extract indicators of compromise from text, including "escaped" ones.
Language: Go - Size: 64.5 KB - Last synced at: 13 days ago - Pushed at: about 5 years ago - Stars: 159 - Forks: 13

jakelever/kindred
A Python biomedical relation extraction package that uses a supervised approach (i.e. needs training data).
Language: Python - Size: 2.38 MB - Last synced at: 15 days ago - Pushed at: about 2 years ago - Stars: 157 - Forks: 30

EmilHvitfeldt/R-text-data
List of textual data sources to be used for text mining in R
Size: 9.71 MB - Last synced at: 13 days ago - Pushed at: over 3 years ago - Stars: 147 - Forks: 15

brandonrobertz/SparseLSH
A Locality Sensitive Hashing (LSH) library with an emphasis on large, high-dimensional datasets.
Language: Python - Size: 108 KB - Last synced at: 14 days ago - Pushed at: 8 months ago - Stars: 146 - Forks: 27

mesejo/trrex
Efficient string matching with regular expressions
Language: Python - Size: 443 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 143 - Forks: 6

hugochan/KATE
Code & data accompanying the KDD 2017 paper "KATE: K-Competitive Autoencoder for Text"
Language: Python - Size: 4.8 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 143 - Forks: 49

Planeshifter/text-miner
Text mining utilities for Node.js
Language: JavaScript - Size: 245 KB - Last synced at: 11 days ago - Pushed at: about 2 years ago - Stars: 141 - Forks: 19

pemagrg1/Natural-Language-Processing-NLP-Roadmap
A simple roadmap to Natural Language Processing (NLP)
Size: 48.8 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 134 - Forks: 17

biolab/orange3-text
🍊 :page_facing_up: Text Mining add-on for Orange3
Language: Python - Size: 46.5 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 131 - Forks: 88

Lilykos/pyphonetics
A Python 3 phonetics library.
Language: Python - Size: 21.5 KB - Last synced at: 12 days ago - Pushed at: about 5 years ago - Stars: 129 - Forks: 20

TiesdeKok/Python_NLP_Tutorial
This repository provides everything to get started with Python for Text Mining / Natural Language Processing (NLP)
Language: Jupyter Notebook - Size: 443 KB - Last synced at: 1 day ago - Pushed at: almost 5 years ago - Stars: 125 - Forks: 66

dperezrada/keywords2vec
Language: Jupyter Notebook - Size: 1.15 MB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 123 - Forks: 15

aphp/edsnlp
Modular, fast NLP framework, compatible with PyTorch and spaCy, offering tailored support for French clinical notes.
Language: Python - Size: 116 MB - Last synced at: 8 days ago - Pushed at: 13 days ago - Stars: 122 - Forks: 31

JosiahParry/genius
Easily access song lyrics from Genius in a tibble.
Language: HTML - Size: 428 KB - Last synced at: 5 days ago - Pushed at: over 3 years ago - Stars: 121 - Forks: 18

dipanjanS/learning-social-media-analytics-with-r
This repository contains code and bonus content which will be added from time to time for the book "Learning Social Media Analytics with R" by Packt
Language: R - Size: 2.55 MB - Last synced at: 5 days ago - Pushed at: almost 8 years ago - Stars: 119 - Forks: 80

CogComp/cogcomp-nlpy
CogComp's light-weight Python NLP annotators
Language: Python - Size: 331 KB - Last synced at: 10 months ago - Pushed at: about 6 years ago - Stars: 116 - Forks: 26

YaleDHLab/intertext 📦
Detect and visualize text reuse
Language: Python - Size: 3.11 MB - Last synced at: 5 months ago - Pushed at: 8 months ago - Stars: 115 - Forks: 10

gsurma/text_predictor
Character-level RNN (LSTM) text generator 📄.
Language: Python - Size: 125 MB - Last synced at: 19 days ago - Pushed at: almost 4 years ago - Stars: 115 - Forks: 35

trinker/lexicon
A data package containing lexicons and dictionaries for text analysis
Language: R - Size: 9.17 MB - Last synced at: 11 days ago - Pushed at: over 3 years ago - Stars: 110 - Forks: 14

NicholasMamo/multiplex-plot
Multiplex: visualizations that tell stories. A Python library to create and annotate beautiful network graph visualizations, text visualizations, and more.
Language: Python - Size: 94.2 MB - Last synced at: 6 months ago - Pushed at: over 2 years ago - Stars: 109 - Forks: 16

DmitryRyumin/EMNLP-2023-Papers
EMNLP 2023 Papers: Explore cutting-edge research from EMNLP 2023, the premier conference for advancing empirical methods in natural language processing. Stay updated on the latest in machine learning, deep learning, and natural language processing with code included. :star: support NLP!
Language: Python - Size: 6.43 MB - Last synced at: 8 days ago - Pushed at: 11 months ago - Stars: 107 - Forks: 7

lisc-tools/lisc
Literature Scanner: Automated collection & analyses of the scientific literature.
Language: Python - Size: 6.84 MB - Last synced at: 6 days ago - Pushed at: about 1 month ago - Stars: 106 - Forks: 12

lettier/lda-topic-modeling
A PureScript, browser-based implementation of LDA topic modeling.
Language: PureScript - Size: 109 KB - Last synced at: 1 day ago - Pushed at: about 7 years ago - Stars: 104 - Forks: 17

bnosac/ruimtehol
R package to Embed All the Things! using StarSpace
Language: C++ - Size: 39.3 MB - Last synced at: 13 days ago - Pushed at: about 1 year ago - Stars: 101 - Forks: 13

Jasonnor/tf-idf-python
Term frequency–inverse document frequency (TF-IDF) for Chinese novels/documents, implemented in Python.
Language: Python - Size: 14.5 MB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 99 - Forks: 36

juliasilge/janeaustenr
An R Package for Jane Austen's Complete Novels :orange_book:
Language: R - Size: 4.78 MB - Last synced at: 14 days ago - Pushed at: over 2 years ago - Stars: 95 - Forks: 22

luopeixiang/awesome-text-summarization
Text summarization starting from scratch.
Size: 5.86 KB - Last synced at: 10 days ago - Pushed at: 8 months ago - Stars: 94 - Forks: 17

jonaschn/awesome-topic-models
✨ Awesome - A curated list of amazing Topic Models (implementations, libraries, and resources)
Size: 53.7 KB - Last synced at: 9 days ago - Pushed at: almost 3 years ago - Stars: 93 - Forks: 7

fingeredman/teanaps
An open-source Python library for natural language processing and text analysis.
Language: Jupyter Notebook - Size: 62.5 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 92 - Forks: 11

SentometricsResearch/sentometrics
An integrated framework in R for textual sentiment time series aggregation and prediction
Language: R - Size: 438 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 83 - Forks: 22

NFeruch/reddit2text
The Python toolkit for converting Reddit threads into organized text data. Extract and process Reddit content with ease!
Language: Python - Size: 49.8 KB - Last synced at: 4 months ago - Pushed at: 8 months ago - Stars: 83 - Forks: 7

hiDaDeng/DaDengAndHisPython
WeChat official account: 大邓和他的python ("DaDeng and His Python"). Python syntax quick start: https://www.bilibili.com/video/av44384851; Python web scraping quick start: https://www.bilibili.com/video/av72010301
Language: Jupyter Notebook - Size: 493 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 83 - Forks: 52

PacktPublishing/Hands-On-Python-Natural-Language-Processing
Language: Jupyter Notebook - Size: 116 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 82 - Forks: 77

docwire/docwire
DocWire SDK: Award-winning modern data processing in C++20. SourceForge Community Choice & Microsoft support. AI-driven processing. Supports nearly 100 data formats, including email boxes and OCR. Boost efficiency in text extraction, web data extraction, data mining, document analysis. Offline processing is possible for security and confidentiality
Language: C++ - Size: 35.8 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 81 - Forks: 18

ziqizhang/jate
NEWS: JATE2.0 Beta.11 Released, see details below.
Language: Java - Size: 286 MB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 81 - Forks: 29

AllenDang/PipeIt
PipeIt is a text transformation, conversion, cleansing and extraction tool.
Language: Go - Size: 349 KB - Last synced at: 6 days ago - Pushed at: over 3 years ago - Stars: 80 - Forks: 6
