GitHub topics: text-mining

Repositories

kk7nc/HDLTex

HDLTex: Hierarchical Deep Learning for Text Classification

Language: Python - Size: 32 MB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 272 - Forks: 65

nluninja/text-mining-dataviz

Data Visualization and Text Mining course repository (UNICATT): notebook implementations of data analysis and machine learning applied to text content.

Language: Jupyter Notebook - Size: 127 MB - Last synced at: 21 days ago - Pushed at: 5 months ago - Stars: 6 - Forks: 0

lfoppiano/document-qa

Scientific Document Insight Q/A

Language: Python - Size: 635 KB - Last synced at: 3 days ago - Pushed at: about 1 month ago - Stars: 29 - Forks: 5

kongusen/Graphuison

A RAG-based framework for constructing scientific knowledge graphs.

Language: Python - Size: 172 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 4 - Forks: 0

lining0806/TextMining

Python text mining system (Research of Text Mining System)

Language: Python - Size: 3.79 MB - Last synced at: about 1 month ago - Pushed at: about 7 years ago - Stars: 341 - Forks: 154

klajosw/python

Python data analyst, integration, migration, quality

Language: Jupyter Notebook - Size: 42.9 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 2

huspacy/huspacy

HuSpaCy: industrial-strength Hungarian natural language processing

Language: Python - Size: 2.2 MB - Last synced at: 29 days ago - Pushed at: 7 months ago - Stars: 165 - Forks: 15

dhamodharanrk/dhamodharanrk.github.io

Welcome to my career portfolio

Language: HTML - Size: 1.61 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

navigating-stories/orange-story-navigator

Add-on to the Orange3 data mining toolkit with text processing widgets from the project Navigating Stories

Language: Python - Size: 14.7 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 4 - Forks: 1

carlosacchi/captiocrweb

This is the web interface for CaptiOCR, a real-time live captions screen text extraction tool. CaptiOCR allows you to capture, extract, and transform on-screen text instantly.

Language: HTML - Size: 729 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

notesjor/corpusexplorer2.0

Corpus linguistics has never been so easy...

Language: C# - Size: 32.5 MB - Last synced at: about 2 hours ago - Pushed at: 2 months ago - Stars: 23 - Forks: 3

Literary text analysis of the poem "El Cuervo" ("The Raven") by Edgar Allan Poe. A text mining project that extracts, cleans, and visualizes the content of the famous poem using Python, spaCy, visualizations, and Spanish-language linguistic processing.

Language: Python - Size: 171 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

Parakh-4/r-course-exercise

🏋️‍♂️ Exercise for the Course "An Introduction to the R Programming Language"

Language: R - Size: 4.2 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

gsurma/password_cracker

Char-level RNN LSTM password cracker 🔑🔓.

Size: 1.02 MB - Last synced at: about 1 month ago - Pushed at: almost 4 years ago - Stars: 56 - Forks: 16

CaritoRamos/text-mining-project-in-python

This project applies Text Mining techniques using Python (NLTK, spaCy, TextBlob) to analyze a book. It includes text cleaning, tokenization, sentiment analysis, and keyword extraction to uncover insights.

Language: Jupyter Notebook - Size: 2.9 MB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

Clement-LVD/codexplor

R package : assess & monitor programming projects with standardized metrics

Language: R - Size: 6.06 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

PasaOpasen/ContentDetector

Detect hard/soft skills from resumes in Russian

Language: Python - Size: 72.8 MB - Last synced at: 27 days ago - Pushed at: almost 2 years ago - Stars: 6 - Forks: 4

psychbruce/PsychWordVec

🔜 Integrative Toolbox of Word Embedding Research for Psychological Science.

Language: R - Size: 44.5 MB - Last synced at: 7 days ago - Pushed at: about 1 month ago - Stars: 22 - Forks: 1

san089/Big_Data_Project

Fake News Detection: feature extraction using vectorizers (Count Vectorizer, TFIDF Vectorizer, Hash Vectorizer), followed by an ensemble model that classifies whether the news is fake or not.

Language: Python - Size: 12.4 MB - Last synced at: about 1 month ago - Pushed at: about 5 years ago - Stars: 19 - Forks: 12

catlism/catlism.github.io

Companion website for "Corpus Approaches to Language in Social Media" - source and build versions

Language: HTML - Size: 45.1 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

juliasilge/learntidytext

Learn about text mining 📄 with tidy data principles

Language: CSS - Size: 6.22 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 46 - Forks: 9

caimeng2/seesus

A Python package that identifies 17 Sustainable Development Goals and their 169 Targets in text, and classifies into social, environmental, and economic sustainability.

Language: Python - Size: 827 KB - Last synced at: 9 days ago - Pushed at: 10 months ago - Stars: 8 - Forks: 2

nalimilan/R.TeMiS

R.TeMiS: R Text Mining Solution

Language: C - Size: 9.87 MB - Last synced at: 5 days ago - Pushed at: about 1 month ago - Stars: 28 - Forks: 6

SCAI-BIO/cv-extraction

Web tool for LLM-based CV extraction

Language: Python - Size: 70.3 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 1 - Forks: 0

Living-with-machines/T-Res

A Toponym Resolution Pipeline for Digitised Historical Newspapers

Language: Python - Size: 11.3 MB - Last synced at: 29 days ago - Pushed at: about 1 month ago - Stars: 8 - Forks: 1

DmitryRyumin/EMNLP-2023-Papers

EMNLP 2023 Papers: Explore cutting-edge research from EMNLP 2023, the premier conference for advancing empirical methods in natural language processing. Stay updated on the latest in machine learning, deep learning, and natural language processing with code included. :star: support NLP!

Language: Python - Size: 6.43 MB - Last synced at: 29 days ago - Pushed at: 12 months ago - Stars: 107 - Forks: 7

Makepad-fr/fbjs

Tooling that automates your Facebook interactions.

Language: TypeScript - Size: 588 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 62 - Forks: 25

graphbrain/graphbrain

Language, Knowledge, Cognition

Language: Python - Size: 103 MB - Last synced at: about 1 month ago - Pushed at: 2 months ago - Stars: 598 - Forks: 69

lisc-tools/lisc

Literature Scanner: Automated collection & analyses of the scientific literature.

Language: Python - Size: 6.85 MB - Last synced at: 9 days ago - Pushed at: 14 days ago - Stars: 106 - Forks: 12

kk7nc/RMDL

RMDL: Random Multimodel Deep Learning for Classification

Language: Python - Size: 223 MB - Last synced at: 29 days ago - Pushed at: almost 2 years ago - Stars: 430 - Forks: 122

luozhouyang/AutoPhraseX

Automated Phrase Mining from Massive Text Corpora in Python.

Language: Python - Size: 90.8 KB - Last synced at: about 22 hours ago - Pushed at: almost 4 years ago - Stars: 171 - Forks: 37

tax-8974/location-analyzer

The Location Data Analyzer is a Spring Boot application that offers insights on location data, such as counting locations by type, calculating average ratings, and identifying the most reviewed and incomplete entries. It features a simple frontend (HTML, CSS, JavaScript) and is deployed on Render.

Language: Java - Size: 0 Bytes - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

pbellot/ANF-TDM

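Code, data, and documentation for the workshop "Apprentissage automatique pour la classification textuelle" (machine learning for text classification), organized as part of the CNRS-INRAE Action Nationale de Formation (national training programme) "Exploration documentaire et extraction d'information" (document exploration and information extraction) in 2020-21.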

Language: Jupyter Notebook - Size: 57.3 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 3 - Forks: 1

LMU-Seminar-LLMs/TopicGPT

TopicGPT integrates the benefits of LLMs into topic modelling

Language: Python - Size: 14 MB - Last synced at: 4 days ago - Pushed at: 11 months ago - Stars: 25 - Forks: 3

arj1211/cluster-links

Pipeline that extracts, cleans, embeds, and clusters web links into topical groups using text extraction, semantic keyword extraction, and unsupervised clustering

Language: Python - Size: 34.2 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

CBravoR/AdvancedAnalyticsLabs

Analytics labs notebooks for Statistics and Business School students

Language: Jupyter Notebook - Size: 12.2 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 30 - Forks: 17

jphall663/GWU_data_mining

Materials for GWU DNSC 6279 and DNSC 6290.

Language: Jupyter Notebook - Size: 186 MB - Last synced at: about 1 month ago - Pushed at: 11 months ago - Stars: 238 - Forks: 173

sfu-discourse-lab/GenderGapTracker

Scrape news articles and analyze them using NLP to quantify the gender gap in Canadian mainstream media

Language: Python - Size: 9.34 MB - Last synced at: 7 days ago - Pushed at: about 1 year ago - Stars: 42 - Forks: 11

GGNoWayBack/cathodedataextractor

A document-level information extraction pipeline for layered cathode materials for sodium-ion batteries.

Language: Python - Size: 608 KB - Last synced at: 1 day ago - Pushed at: about 1 year ago - Stars: 8 - Forks: 2

stdlib-js/nlp-tokenize

Tokenize a string.

Language: JavaScript - Size: 834 KB - Last synced at: 29 days ago - Pushed at: 4 months ago - Stars: 3 - Forks: 0

BlueObelisk/oscar4

OSCAR (Open Source Chemistry Analysis Routines) is an open source extensible system for the automated annotation of chemistry in scientific articles.

Language: Java - Size: 125 MB - Last synced at: 28 days ago - Pushed at: 2 months ago - Stars: 31 - Forks: 4

fendouai/Awesome-Text-Classification

Awesome-Text-Classification: projects, papers, tutorials.

Size: 7.81 KB - Last synced at: about 16 hours ago - Pushed at: over 7 years ago - Stars: 171 - Forks: 32

trinker/lexicon

A data package containing lexicons and dictionaries for text analysis

Language: R - Size: 9.17 MB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 110 - Forks: 14

trinker/readability

Fast readability scores for text data

Language: R - Size: 175 KB - Last synced at: about 1 month ago - Pushed at: over 6 years ago - Stars: 22 - Forks: 4

trinker/textreadr

Tools to uniformly read in text data including semi-structured transcripts

Language: R - Size: 1.78 MB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 74 - Forks: 5

hrbrmstr/misinfo

📊 Tools to Perform ‘Misinformation’ Analysis on a Text Corpus (wrapper for methods in https://github.com/PDXBek/Misinformation)

Language: R - Size: 401 KB - Last synced at: about 1 month ago - Pushed at: about 7 years ago - Stars: 16 - Forks: 0

hrbrmstr/elpresidente

🇺🇸 Search and Extract Corpus Elements from 'The American Presidency Project'

Language: R - Size: 17.6 KB - Last synced at: about 1 month ago - Pushed at: about 7 years ago - Stars: 20 - Forks: 1

mkearney/textfeatures

👷‍♂️ A simple package for extracting useful features from character objects 👷‍♀️

Language: R - Size: 7.64 MB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 167 - Forks: 17

assafmo/xioc

Extract indicators of compromise from text, including "escaped" ones.

Language: Go - Size: 64.5 KB - Last synced at: about 1 month ago - Pushed at: about 5 years ago - Stars: 159 - Forks: 13

andrewtavis/kwx

BERT, LDA, and TFIDF based keyword extraction in Python

Language: Python - Size: 12.3 MB - Last synced at: 28 days ago - Pushed at: about 1 year ago - Stars: 72 - Forks: 10

narrnar/133FP

UCLA STATS 133 Final Project

Size: 5.44 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

gsurma/text_predictor

Char-level RNN LSTM text generator📄.

Language: Python - Size: 125 MB - Last synced at: about 1 month ago - Pushed at: almost 4 years ago - Stars: 115 - Forks: 35

bigartm/bigartm

Fast topic modeling platform

Language: C++ - Size: 16.8 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 668 - Forks: 120

RajnishProgrammer/NLTK-Textual-Analysis

NLP pipeline for text processing and feature extraction 🛠

Language: Python - Size: 350 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

mathsyouth/awesome-text-summarization (fork of lipiji/App-DL)

A curated list of resources dedicated to text summarization

Size: 243 KB - Last synced at: 18 days ago - Pushed at: over 2 years ago - Stars: 1,542 - Forks: 265

brandonleekramer/tidyorgs

A tidy package that detects and standardizes organizations in unstructured text data

Language: R - Size: 48.2 MB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 7 - Forks: 0

stewart-lab/fast_km

A Containerized KinderMiner / Serial KinderMiner Server

Language: Python - Size: 16.3 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 2

WLXie-Tony/Movie_Review_Analysis

A comprehensive pipeline for scraping, structuring, and analyzing IMDb movie reviews. This repository includes automated web scraping scripts, structured datasets, and advanced large language model (LLM)-based sentiment analysis to extract insights from user reviews.

Language: Python - Size: 120 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

luizsci42/Analise-de-sentimentos-pandemia-covid19

Repository used for the PIBIC 2020-2021 research plan with Prof. Dr. Hendrik Macedo. Its purpose is to create a dataset for training machine learning models on Ekman's 5 emotions and to analyze the predominant sentiments during the first 12 months of the COVID-19 pandemic.

Language: Jupyter Notebook - Size: 9.66 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0