Ecosyste.ms: Repos
An open API service providing repository metadata for many open source software ecosystems.
GitHub topics: text-mining
ishijo/Taylor-Swift-Lyrics
Database (.txt and .csv) of all Taylor Swift song lyrics up to April 2023
Language: Jupyter Notebook - Size: 15.8 MB - Last synced: about 3 hours ago - Pushed: about 4 hours ago - Stars: 7 - Forks: 3
annajiat/2022-09-24-bracu-nlp
Language: HTML - Size: 2.08 MB - Last synced: about 10 hours ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0
aphp/edsnlp
Modular, fast NLP framework, compatible with PyTorch and spaCy, offering tailored support for French clinical notes.
Language: Python - Size: 87 MB - Last synced: about 9 hours ago - Pushed: about 18 hours ago - Stars: 97 - Forks: 27
adbar/trafilatura
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
Language: Python - Size: 23.1 MB - Last synced: about 15 hours ago - Pushed: 3 days ago - Stars: 2,965 - Forks: 228
fitria-dwi/Hoax-Detection
This project builds a model to predict whether an article is a hoax or not. It also estimates the percentage of hoax and non-hoax articles.
Language: Jupyter Notebook - Size: 4.22 MB - Last synced: 1 day ago - Pushed: 3 days ago - Stars: 2 - Forks: 0
contefranz/OpTop
Optimal topic identification from a pool of Latent Dirichlet Allocation models
Language: R - Size: 327 KB - Last synced: 1 day ago - Pushed: over 2 years ago - Stars: 10 - Forks: 0
raniavirdas/Text-Mining-First-Project
My group's midterm project on text classification, done during a student exchange at Asia University, Taiwan. It uses PubMed article names of five types.
Language: Jupyter Notebook - Size: 28.1 MB - Last synced: 2 days ago - Pushed: 2 days ago - Stars: 0 - Forks: 0
mesolitica/malaysian-dataset
We gather Malaysian datasets! https://malaysian-dataset.readthedocs.io/
Language: Jupyter Notebook - Size: 1.33 GB - Last synced: 2 days ago - Pushed: 3 days ago - Stars: 284 - Forks: 102
jbesomi/texthero
Text preprocessing, representation and visualization from zero to hero.
Language: Python - Size: 22.1 MB - Last synced: 2 days ago - Pushed: 9 months ago - Stars: 2,869 - Forks: 238
FatimaUriarte/Python
Python files employed in my research
Language: Jupyter Notebook - Size: 3.66 MB - Last synced: 3 days ago - Pushed: 3 days ago - Stars: 0 - Forks: 0
Aayshashukla/SentimentAnalysis
Twitter sentiment analysis using Natural Language Processing (NLP)
Language: Jupyter Notebook - Size: 9.39 MB - Last synced: 3 days ago - Pushed: 4 days ago - Stars: 0 - Forks: 0
currentslab/extractnet
A fork of Dragnet that also extracts the author, headline, date, and keywords from context, with built-in metadata extraction, all in one package
Language: HTML - Size: 421 MB - Last synced: about 15 hours ago - Pushed: 5 months ago - Stars: 181 - Forks: 19
DmitryRyumin/EMNLP-2023-Papers
EMNLP 2023 Papers: Explore cutting-edge research from EMNLP 2023, the premier conference for advancing empirical methods in natural language processing. Stay updated on the latest in machine learning, deep learning, and natural language processing, with code included. ⭐ Support NLP!
Language: Python - Size: 6.43 MB - Last synced: 3 days ago - Pushed: 3 days ago - Stars: 78 - Forks: 3
kernel-loophole/KG-graph
Knowledge graph from unstructured text
Language: Python - Size: 2.37 MB - Last synced: 3 days ago - Pushed: 3 days ago - Stars: 4 - Forks: 0
chiphuyen/lazynlp
Library to scrape and clean web pages to create massive datasets.
Language: Python - Size: 37.1 KB - Last synced: 2 days ago - Pushed: over 3 years ago - Stars: 2,150 - Forks: 310
saahilk1511/Web-Analytics-and-Mining
My code for CS 688 Web Analytics and Mining
Language: Jupyter Notebook - Size: 15.3 MB - Last synced: 3 days ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0
KomeijiForce/AutoPersona
A Chinese Llama-3 model that automatically extracts character persona information from noisy web page text
Language: Python - Size: 13.7 KB - Last synced: 3 days ago - Pushed: 4 days ago - Stars: 0 - Forks: 0
nilswende/nlp-toolbox
Reimplementation of the Hagen NLPToolbox
Language: Java - Size: 27.3 MB - Last synced: 4 days ago - Pushed: about 2 years ago - Stars: 2 - Forks: 0
HanXinzi-AI/awesome-python-machine-learning-resources
A collection of awesome (popular and practical) machine learning and deep learning Python libraries and tools
Size: 10.5 MB - Last synced: 4 days ago - Pushed: 5 days ago - Stars: 104 - Forks: 22
TeaZea/Gmail-Scraper_Word-Analysis
A small script that scrapes your Gmail and creates a word-analysis visualization from the contents of the queried emails.
Language: Jupyter Notebook - Size: 512 KB - Last synced: 4 days ago - Pushed: 4 days ago - Stars: 1 - Forks: 0
yogeshhk/MiningResume
Text mining specific fields from a resume
Language: Jupyter Notebook - Size: 1.5 MB - Last synced: 4 days ago - Pushed: 4 days ago - Stars: 53 - Forks: 43
adbar/German-NLP
Curated list of open-access/open-source/off-the-shelf resources and tools developed with a particular focus on German
Size: 103 KB - Last synced: about 15 hours ago - Pushed: 5 days ago - Stars: 406 - Forks: 58
gesiscss/awesome-computational-social-science
A list of awesome resources for Computational Social Science
Language: R - Size: 170 KB - Last synced: 4 days ago - Pushed: 21 days ago - Stars: 465 - Forks: 59
JesusSalinas/master_upb
Text Analysis
Language: Python - Size: 705 KB - Last synced: 4 days ago - Pushed: 5 days ago - Stars: 0 - Forks: 0
trinker/textreadr
Tools to uniformly read in text data including semi-structured transcripts
Language: R - Size: 1.78 MB - Last synced: about 14 hours ago - Pushed: about 1 year ago - Stars: 73 - Forks: 5
lasigeBioTM/BENT
Biomedical Term Annotator
Language: Python - Size: 6.06 MB - Last synced: 4 days ago - Pushed: 5 days ago - Stars: 9 - Forks: 1
erikhoward/hgwellsr
An R data package of selected H. G. Wells novels to be used for NLP research.
Language: R - Size: 602 KB - Last synced: 5 days ago - Pushed: about 6 years ago - Stars: 4 - Forks: 0
erikhoward/grimmr
An R package of fairy tales by the Brothers Grimm
Language: R - Size: 4.88 KB - Last synced: 5 days ago - Pushed: about 6 years ago - Stars: 1 - Forks: 0
KMiNT21/html2sent
HTML2SENT modifies HTML to improve sentence tokenizer quality
Language: Python - Size: 44.9 KB - Last synced: 6 days ago - Pushed: almost 5 years ago - Stars: 8 - Forks: 2
klaudia-dikunow/tweets-classification
Language: Jupyter Notebook - Size: 960 KB - Last synced: 6 days ago - Pushed: over 3 years ago - Stars: 0 - Forks: 0
Chaymae-ipynb/Text-Mining-Projects
Language: Jupyter Notebook - Size: 30.3 KB - Last synced: 5 days ago - Pushed: 6 days ago - Stars: 0 - Forks: 0
rvhonorato/cazy-parser
A way to extract specific information from CAZy
Language: Python - Size: 120 KB - Last synced: 7 days ago - Pushed: 7 months ago - Stars: 12 - Forks: 8
gengoai/gengoai
Mono Repository for GengoAI projects
Language: Java - Size: 14.7 MB - Last synced: 7 days ago - Pushed: 7 days ago - Stars: 2 - Forks: 0
antoniooliveira03/Projects
Projects I worked on during my bachelor's degree
Language: Jupyter Notebook - Size: 18.1 MB - Last synced: 7 days ago - Pushed: 7 days ago - Stars: 0 - Forks: 0
lisc-tools/lisc
Literature Scanner: Automated collection & analyses of the scientific literature.
Language: Python - Size: 6.21 MB - Last synced: 6 days ago - Pushed: about 1 month ago - Stars: 88 - Forks: 11
ncbi-nlp/PubMed-Best-Match
Machine-learning-based pipeline, relying on LambdaMART, currently used in PubMed for relevance (Best Match) searches
Language: Python - Size: 2.63 MB - Last synced: 8 days ago - Pushed: about 6 years ago - Stars: 38 - Forks: 11
garygsw/twitter-crowd-flow-prediction
Crowd flow prediction model
Language: Python - Size: 644 MB - Last synced: 8 days ago - Pushed: about 6 years ago - Stars: 9 - Forks: 1
dimeji-kazeem/text-analytics Fork of oladimeji-kazeem/text-analytics
The ultimate solution for text summarization. Whether you're a student looking to condense lengthy research papers, a professional needing to digest complex reports, or just someone who wants to get to the essence of an article quickly, SummarizeMaster has got you covered.
Language: Python - Size: 3.19 MB - Last synced: 8 days ago - Pushed: 8 days ago - Stars: 0 - Forks: 0
caufieldjh/awesome-bioie
🧫 A curated list of resources relevant to doing Biomedical Information Extraction (including BioNLP)
Size: 595 KB - Last synced: 4 days ago - Pushed: about 1 year ago - Stars: 308 - Forks: 32
ArdentEmpiricist/text_analysis
Analyzes text stored as *.txt in a chosen file or directory (files in subdirectories are not read). It counts all words, then searches the vicinity (±5 words) of every unique word.
Language: Rust - Size: 173 KB - Last synced: 8 days ago - Pushed: 8 days ago - Stars: 2 - Forks: 0
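The word-count and vicinity analysis described above can be sketched in Python as follows. This is an illustrative sketch, not code from the repository (which is written in Rust); the function names and the tokenization rule (lowercased runs of word characters and apostrophes) are assumptions.

```python
import re
from collections import Counter, defaultdict
from pathlib import Path


def tokenize(text):
    # Assumed rule: lowercase, then keep runs of word characters and apostrophes.
    return re.findall(r"[\w']+", text.lower())


def load_words(path):
    # Read one .txt file, or every .txt file directly inside a directory.
    # Path.glob("*.txt") is non-recursive, so subdirectories are not read.
    path = Path(path)
    files = [path] if path.is_file() else sorted(path.glob("*.txt"))
    words = []
    for f in files:
        words.extend(tokenize(f.read_text(encoding="utf-8")))
    return words


def vicinity_counts(words, window=5):
    # For every unique word, count the words appearing within +-window positions.
    neighbours = defaultdict(Counter)
    for i, word in enumerate(words):
        lo, hi = max(0, i - window), min(len(words), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                neighbours[word][words[j]] += 1
    return neighbours
```

Typical use: `words = load_words("corpus/")`, then `Counter(words).most_common(10)` for overall counts and `vicinity_counts(words)["data"]` for the neighbours of one word.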
laugustyniak/awesome-sentiment-analysis
Repository with everything necessary for sentiment analysis and related areas
Size: 36.1 KB - Last synced: 2 days ago - Pushed: 6 months ago - Stars: 525 - Forks: 108
ranja-sarkar/document
PDF files can be read with various Python libraries, e.g., tabula and pdfplumber. Here I define a class that parses the files in a directory with pdfplumber and stores their information in an output file.
Language: Jupyter Notebook - Size: 218 KB - Last synced: 9 days ago - Pushed: 9 days ago - Stars: 0 - Forks: 0
lfoppiano/document-qa
Scientific Document Insight Q/A
Language: Python - Size: 597 KB - Last synced: 10 days ago - Pushed: 10 days ago - Stars: 16 - Forks: 3
brainhack-school2020/koudyk_bhs_project
A Python package that creates a visualization of the use of methods in citation networks over time.
Language: Jupyter Notebook - Size: 43.6 MB - Last synced: 10 days ago - Pushed: almost 4 years ago - Stars: 2 - Forks: 0
aminkhod/Search-engine
Practice implementation of a simple news search engine
Language: Jupyter Notebook - Size: 258 KB - Last synced: 10 days ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0
aminkhod/PersainTextClusteringWithHazm
I use Persian-language texts requesting psychological advice. I clean the data and prepare it with the Hazm project, then cluster the texts using the Genetic_Kmeans algorithm and compare the results with standard K-means and BIRCH.
Language: Jupyter Notebook - Size: 19 MB - Last synced: 10 days ago - Pushed: about 4 years ago - Stars: 3 - Forks: 0
disi-unibo-nlp/POIROT
POIROT: Phenomena Explanation from Text. Unsupervised learning of interpretable and statistically significant knowledge.
Language: Jupyter Notebook - Size: 52.6 MB - Last synced: 10 days ago - Pushed: over 1 year ago - Stars: 1 - Forks: 1
mcs07/ChemDataExtractor
Automatically extract chemical information from scientific documents
Language: Python - Size: 542 KB - Last synced: 7 days ago - Pushed: 10 months ago - Stars: 283 - Forks: 112
raminrahimzada/az-corpus-nlp
Dataset materials for NLP in the Azerbaijani language
Size: 807 MB - Last synced: 9 days ago - Pushed: 9 months ago - Stars: 9 - Forks: 4
alisafaya/txt-from-pdf
Extracting clean text from pdfs using pdfminer.six and pypdf.
Language: Python - Size: 23.4 KB - Last synced: 10 days ago - Pushed: 11 days ago - Stars: 1 - Forks: 0
PetrKorab/Arabica
Python package for exploratory text data analysis
Language: Python - Size: 102 MB - Last synced: 10 days ago - Pushed: 10 days ago - Stars: 58 - Forks: 14
pusztaipatrik/job-postings
Results of a data analytics project at TH Wildau, created with the Orange data analytics tool. Data source: https://www.kaggle.com/datasets/PromptCloudHQ/us-jobs-on-monstercom
Size: 11.5 MB - Last synced: 11 days ago - Pushed: 11 days ago - Stars: 0 - Forks: 0
jakelever/biotext
Get a nicely-chunked local copy of the biomedical literature (to use for other projects)!
Language: Python - Size: 210 KB - Last synced: 11 days ago - Pushed: 11 days ago - Stars: 13 - Forks: 5
agusnieto77/ACEP
Análisis Computacional de Eventos de Protesta (ACEP). Computer-Aided Protest Event Analysis (CAPEA)
Language: R - Size: 106 MB - Last synced: 9 days ago - Pushed: 2 months ago - Stars: 8 - Forks: 2
MIT-LCP/bloatectomy
A Python package for removing duplicate text in clinical notes or other documents
Language: TeX - Size: 7.48 MB - Last synced: 9 days ago - Pushed: almost 4 years ago - Stars: 32 - Forks: 9
stepthom/text_mining_resources
Resources for learning about Text Mining and Natural Language Processing
Size: 707 KB - Last synced: 3 days ago - Pushed: over 1 year ago - Stars: 553 - Forks: 200
bookieio/breadability
Reworked version of the https://www.readability.com/ parsing library (https://mercury.postlight.com/ is now a living alternative)
Language: HTML - Size: 604 KB - Last synced: 12 days ago - Pushed: 12 days ago - Stars: 203 - Forks: 26
kavgan/nlp-in-practice
Starter code to solve real world text data problems. Includes: Gensim Word2Vec, phrase embeddings, Text Classification with Logistic Regression, word count with pyspark, simple text preprocessing, pre-trained embeddings and more.
Language: Jupyter Notebook - Size: 91.8 MB - Last synced: 7 days ago - Pushed: over 3 years ago - Stars: 1,120 - Forks: 781
oroszgy/awesome-hungarian-nlp
A curated list of NLP resources for Hungarian
Size: 110 KB - Last synced: 2 days ago - Pushed: 7 months ago - Stars: 208 - Forks: 18
Kirscher/TextMining_Parcours_de_soin
Official repo of the paper "A novel methodological framework for the analysis of health trajectories and survival outcomes in heart failure patients" (ICLR 2024)
Language: HTML - Size: 39.3 MB - Last synced: 13 days ago - Pushed: 13 days ago - Stars: 1 - Forks: 1
kmk4842/opus2021
Translating financial lexicons to other languages via WordNet
Language: Python - Size: 851 KB - Last synced: 13 days ago - Pushed: 13 days ago - Stars: 0 - Forks: 0
LMU-Seminar-LLMs/TopicGPT
TopicGPT integrates the benefits of LLMs into topic modelling
Language: Python - Size: 14 MB - Last synced: 11 days ago - Pushed: 8 months ago - Stars: 15 - Forks: 1
nika-akin/EC-Web-Scrapping-and-Text-Mining
Documentation for crawling web pages, parsing their contents, and analysing opinions on AI, with an overview of methods
Language: HTML - Size: 293 MB - Last synced: 13 days ago - Pushed: 13 days ago - Stars: 0 - Forks: 0
juliasilge/tidytext
Text mining using tidy tools ✨📄✨
Language: R - Size: 129 MB - Last synced: 6 days ago - Pushed: about 1 month ago - Stars: 1,159 - Forks: 181
sp1thas/ceid-thesis 📦
Thesis report and implementation for feature extraction based on geographical origin of the author
Language: Python - Size: 1.09 MB - Last synced: 14 days ago - Pushed: about 3 years ago - Stars: 2 - Forks: 0
sharmilathirumalai/TF-IDF
Information retrieval (IR) implemented using the TF-IDF method
Language: Java - Size: 10.7 MB - Last synced: 14 days ago - Pushed: over 4 years ago - Stars: 0 - Forks: 0
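TF-IDF retrieval as described above can be sketched in Python as below. This is a minimal illustration, not the repository's Java code; it assumes raw term counts for TF, the unsmoothed IDF = log(N / df), and a score equal to the sum of TF × IDF over the query terms. The function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercased alphanumeric tokens; real IR systems often also stem and drop stop words.
    return re.findall(r"\w+", text.lower())


def build_index(docs):
    # Term frequencies per document, plus inverse document frequencies over the corpus.
    tfs = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    # Iterating a Counter yields each term once, so this counts documents containing the term.
    df = Counter(term for tf in tfs for term in tf)
    idf = {term: math.log(n / count) for term, count in df.items()}
    return tfs, idf


def search(query, tfs, idf):
    # Score each document by the sum of tf * idf over the query terms, best first.
    terms = tokenize(query)
    scores = []
    for i, tf in enumerate(tfs):
        score = sum(tf[t] * idf.get(t, 0.0) for t in terms)
        if score > 0:
            scores.append((i, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

With this unsmoothed variant, a term that occurs in every document gets an IDF of zero and cannot affect the ranking; smoothed IDF formulas avoid that if it is undesirable.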
moamenibrahim/nlp-teaching
Testing files for Natural language processing course projects @university_of_oulu
Language: Python - Size: 5.84 MB - Last synced: 14 days ago - Pushed: 15 days ago - Stars: 0 - Forks: 1
qminer/qminer
Analytic platform for real-time large-scale streams containing structured and unstructured data.
Language: C++ - Size: 39.6 MB - Last synced: 12 days ago - Pushed: about 1 year ago - Stars: 219 - Forks: 57
navigating-stories/orange-story-navigator
Add-on to the Orange3 data mining toolkit with text processing widgets from the project Navigating Stories
Language: Python - Size: 8.24 MB - Last synced: 13 days ago - Pushed: 14 days ago - Stars: 0 - Forks: 2
mathsyouth/awesome-text-summarization Fork of lipiji/App-DL
A curated list of resources dedicated to text summarization
Size: 243 KB - Last synced: 3 days ago - Pushed: over 1 year ago - Stars: 1,531 - Forks: 267
mkk-1817/Youtube-Comments-Sentiment-Analysis
Size: 1.95 KB - Last synced: 14 days ago - Pushed: 14 days ago - Stars: 0 - Forks: 0
vmenger/deduce
Deduce: de-identification method for Dutch medical text
Language: Python - Size: 7.21 MB - Last synced: 4 days ago - Pushed: 22 days ago - Stars: 48 - Forks: 19
pkourdis/gateplugin-SUTime
GATE plugin to annotate documents with TIMEX3 tags using the SUTime library.
Language: Java - Size: 138 KB - Last synced: 15 days ago - Pushed: over 6 years ago - Stars: 2 - Forks: 0
graphbrain/graphbrain
Language, Knowledge, Cognition
Language: Python - Size: 103 MB - Last synced: 15 days ago - Pushed: 15 days ago - Stars: 564 - Forks: 62
sergioburdisso/pyss3
A Python package implementing a new interpretable machine learning model for text classification (with visualization tools for Explainable AI)
Language: Python - Size: 102 MB - Last synced: 9 days ago - Pushed: 9 months ago - Stars: 332 - Forks: 44
Lambda-3/DiscourseSimplification
Extension of the SentenceSimplification project
Language: Java - Size: 1.47 MB - Last synced: 14 days ago - Pushed: 15 days ago - Stars: 53 - Forks: 13
bilalhassankhan007/ML-Movie-recommendation-System
Movie recommendation system deployed on Heroku
Language: Jupyter Notebook - Size: 1.8 MB - Last synced: 16 days ago - Pushed: 16 days ago - Stars: 0 - Forks: 0
atse0612/Data-Science-Capstone
This is the repository for the final course in the Data Science Specialization on Coursera.
Language: HTML - Size: 607 KB - Last synced: 16 days ago - Pushed: over 6 years ago - Stars: 0 - Forks: 1
DivyaSharma0795/AppleVisionPro_Dataset
Sentiment analysis of Apple Vision Pro tweets using multiple models
Language: Jupyter Notebook - Size: 15.2 MB - Last synced: 16 days ago - Pushed: 17 days ago - Stars: 1 - Forks: 0
abhishek-kathuria/Reddit-Graph-Network
Identifies toxic and harmful subreddit groups in a US elections dataset using graph data structures and data mining
Language: Jupyter Notebook - Size: 3.22 MB - Last synced: 18 days ago - Pushed: about 2 years ago - Stars: 1 - Forks: 0
johnmatzakos/automatic-labeling-of-text-data
Algorithms For Automatic Labelling Of Text Data. A Text Mining project that studies Supervised and Semi-Supervised Learning on Twitter data.
Language: R - Size: 703 KB - Last synced: 19 days ago - Pushed: over 3 years ago - Stars: 1 - Forks: 0
johnmatzakos/detect-fake-news-machine-learning
Detecting fake news using a range of classic machine learning algorithms
Language: Jupyter Notebook - Size: 810 KB - Last synced: 19 days ago - Pushed: about 2 years ago - Stars: 0 - Forks: 0
greenelab/snorkeling
Extracting biomedical relationships from literature with Snorkel 🏊
Language: Jupyter Notebook - Size: 326 MB - Last synced: 19 days ago - Pushed: over 3 years ago - Stars: 58 - Forks: 17
hetio/medline
Computing term cooccurrence in MEDLINE
Language: Jupyter Notebook - Size: 139 MB - Last synced: 19 days ago - Pushed: about 3 years ago - Stars: 16 - Forks: 4
danielvartan/iramuteqlike
💬⛏️ Tools to reproduce the IRaMuTeQ software analyses in R
Language: R - Size: 2.78 MB - Last synced: 20 days ago - Pushed: 8 months ago - Stars: 7 - Forks: 1
s3nh/markov_chain
Simple Markov chain tweet generator
Language: Python - Size: 63.5 KB - Last synced: 20 days ago - Pushed: over 5 years ago - Stars: 1 - Forks: 0
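A word-level Markov chain tweet generator can be sketched as follows. This is an illustrative sketch, not the repository's code; the function names, the order-1 default, and the 280-character cap (Twitter's tweet length limit) are assumptions.

```python
import random
from collections import defaultdict

END = None  # sentinel marking the end of a source text


def build_chain(texts, order=1):
    # Map each state (a tuple of `order` consecutive words) to the words seen right after it.
    chain = defaultdict(list)
    starts = []
    for text in texts:
        words = text.split()
        if len(words) < order:
            continue
        starts.append(tuple(words[:order]))
        for i in range(len(words) - order + 1):
            state = tuple(words[i:i + order])
            nxt = words[i + order] if i + order < len(words) else END
            chain[state].append(nxt)
    return chain, starts


def generate(chain, starts, rng, max_chars=280):
    # Random walk from a start state; stop at END or before exceeding max_chars.
    state = rng.choice(starts)
    words = list(state)
    while True:
        nxt = rng.choice(chain[state])
        if nxt is END:
            break
        if len(" ".join(words + [nxt])) > max_chars:
            break
        words.append(nxt)
        state = tuple(words[-len(state):])
    return " ".join(words)
```

Because successors are stored with repetition, `rng.choice` picks each next word with probability proportional to how often it followed the current state in the training tweets. Passing an explicit `random.Random(seed)` keeps output reproducible.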
storopoli/topic-modeling-workshop
Slides for Topic Modeling Workshop
Language: R - Size: 3.47 MB - Last synced: 20 days ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0
bayuik/nlp_tensorflow
Sentiment analysis with NLP and TensorFlow: multiclass text classification
Language: Jupyter Notebook - Size: 3.91 MB - Last synced: 20 days ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0
mgg39/AI-that-recognizes-fake-news
LSTM network designed to distinguish fake news from "true" news
Language: Python - Size: 8.79 KB - Last synced: 20 days ago - Pushed: over 3 years ago - Stars: 1 - Forks: 0
fuchsia-programming/scrape 📦
When you need those jobs hypersonic 🚀 scrape 🔪
Language: JavaScript - Size: 2.79 MB - Last synced: 20 days ago - Pushed: over 4 years ago - Stars: 10 - Forks: 3
omarsar/text_mining_lab_2017
Requirements for Text Mining Summer Course (Lab Session)
Language: Jupyter Notebook - Size: 14.3 MB - Last synced: 20 days ago - Pushed: almost 7 years ago - Stars: 5 - Forks: 4
omarsar/omarsar.github.io
My Blog - Research and Life Experience. 💬
Language: HTML - Size: 35.1 MB - Last synced: 20 days ago - Pushed: 7 months ago - Stars: 3 - Forks: 0
omarsar/friendly_nlp
Mini blog for notes and guides on Natural Language Processing (Open Notes)
Language: HTML - Size: 2.13 MB - Last synced: 20 days ago - Pushed: over 6 years ago - Stars: 3 - Forks: 1
omarsar/friendly_data_science
Material and resources for the "Friendly Data Science" YouTube series.
Language: Jupyter Notebook - Size: 2.84 MB - Last synced: 20 days ago - Pushed: over 6 years ago - Stars: 4 - Forks: 1
omarsar/dm_2018_hw_1
Holds instructions for assignment 1 of the Data Mining course
Language: Jupyter Notebook - Size: 10.7 KB - Last synced: 20 days ago - Pushed: over 5 years ago - Stars: 2 - Forks: 41
lfoppiano/SuperMat
Superconductor materials dataset
Language: Jupyter Notebook - Size: 20.2 MB - Last synced: 15 days ago - Pushed: 6 months ago - Stars: 23 - Forks: 2
apoorvalal/manual-ngrams-newspapers
Code to construct Google Ngram-style diagrams using text scraped from PDFs of a newspaper archive
Language: Jupyter Notebook - Size: 119 MB - Last synced: 20 days ago - Pushed: over 3 years ago - Stars: 1 - Forks: 3
news-r/webhoserx 📦
📝 Feature extraction extension for webhoser
Language: R - Size: 30.3 KB - Last synced: 20 days ago - Pushed: almost 5 years ago - Stars: 2 - Forks: 1
news-r/phrasenets
Create Phrase Networks
Language: R - Size: 2.97 MB - Last synced: 20 days ago - Pushed: over 4 years ago - Stars: 3 - Forks: 0
JohnCoene/sacred
📖 Sacred texts in R
Language: R - Size: 39.1 MB - Last synced: 20 days ago - Pushed: about 5 years ago - Stars: 21 - Forks: 6