GitHub topics: text-cleaning
adbar/trafilatura
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
Language: Python - Size: 33.8 MB - Last synced at: about 21 hours ago - Pushed at: about 1 month ago - Stars: 4,151 - Forks: 289

HandcartCactus/obsidian-remove-newlines
A plugin for Obsidian.md which removes newlines and blank lines from selected or pasted text.
Language: HTML - Size: 7.59 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 9 - Forks: 0

jfilter/clean-text
🧹 Python package for text cleaning
Language: Python - Size: 157 KB - Last synced at: 7 days ago - Pushed at: almost 2 years ago - Stars: 975 - Forks: 79

wisupai/e2m
E2M converts various file types (doc, docx, epub, html, htm, url, pdf, ppt, pptx, mp3, m4a) into Markdown. It’s easy to install, with dedicated parsers and converters, supporting custom configs. E2M offers an all-in-one, flexible, and open-source solution.
Language: Jupyter Notebook - Size: 21.7 MB - Last synced at: 8 days ago - Pushed at: 8 months ago - Stars: 1,065 - Forks: 53

blmoistawinde/HarvestText
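Text mining and preprocessing toolkit (text cleaning, new word discovery, sentiment analysis, entity recognition and linking, keyword extraction, knowledge extraction, syntactic parsing, etc.), using unsupervised or weakly supervised methods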
Language: Python - Size: 4.27 MB - Last synced at: 10 days ago - Pushed at: 11 months ago - Stars: 2,504 - Forks: 336

hscspring/pnlp
NLP pre-/post-processing tools.
Language: Python - Size: 106 KB - Last synced at: 13 days ago - Pushed at: 22 days ago - Stars: 29 - Forks: 6

reZach/grammarify
Grammarify is an npm package that safely cleans up text that has misspellings, improper capitalization, and lexical illusions, among other things.
Language: JavaScript - Size: 412 KB - Last synced at: 5 days ago - Pushed at: over 2 years ago - Stars: 73 - Forks: 10

rhnfzl/SqueakyCleanText
Clean your text for statistical ML and language models
Language: Python - Size: 1.04 MB - Last synced at: 5 days ago - Pushed at: 5 months ago - Stars: 4 - Forks: 0

johnjago/deformat
Remove extra whitespace from text.
Language: HTML - Size: 163 KB - Last synced at: 7 days ago - Pushed at: about 2 months ago - Stars: 6 - Forks: 1

sharejing/Takin
A Python toolkit for file processing, text cleaning and data splitting.
Language: Python - Size: 2.42 MB - Last synced at: 11 days ago - Pushed at: over 2 years ago - Stars: 32 - Forks: 7

trinker/textclean
Tools for cleaning and normalizing text data
Language: R - Size: 23.8 MB - Last synced at: 17 days ago - Pushed at: over 3 years ago - Stars: 248 - Forks: 26

Youssef155/Sentiment_Analysis
Sentiment Analysis For Restaurant Reviews
Language: Jupyter Notebook - Size: 83 KB - Last synced at: about 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

AndyTheFactory/article-extraction-dataset
Article title, authors, date and body extraction dataset.
Language: HTML - Size: 31.9 MB - Last synced at: 2 months ago - Pushed at: about 1 year ago - Stars: 6 - Forks: 1

mim-solutions/mim_nlp
A Python package with ready-to-use models for various NLP tasks and text preprocessing utilities. The implementation allows fine-tuning.
Language: Jupyter Notebook - Size: 413 KB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 2 - Forks: 0

dewanakl/aman
A simple profanity filter using regex.
Language: PHP - Size: 71.3 KB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 0 - Forks: 2

umapornp/textprepro
👀 Everything Everyway All At Once Text Preprocessing for Natural Language Processing.
Language: Python - Size: 1.3 MB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 2 - Forks: 0

mrqadeer/text_prettifier
Python library designed to clean and preprocess text data by removing unwanted elements such as HTML tags, URLs, numbers, special characters, emojis, contractions, and stopwords. It offers flexible functionality, including options to return text in lowercase and as a list of tokens.
Language: Python - Size: 13.7 KB - Last synced at: 29 days ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

alinapetukhova/textcl
Text preprocessing package for use in NLP tasks https://pypi.org/project/textcl/
Language: Python - Size: 913 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 11 - Forks: 4

Shanmukhi1920/Text_Classification
Developed an NLP system using Gradio and Hugging Face to classify disaster tweets with both machine learning (ML) and deep learning (DL) models.
Language: Jupyter Notebook - Size: 8.23 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

Ankur3107/nlp_preprocessing
Text preprocessing package that includes cleaning, tokenization, dataset preparation, etc.
Language: JavaScript - Size: 5.19 MB - Last synced at: 8 months ago - Pushed at: over 4 years ago - Stars: 17 - Forks: 7

Infinitode/ValX
ValX is an open-source Python package for text cleaning tasks, including profanity detection and removal. It now also includes sensitive information detection and removal.
Language: Python - Size: 36.1 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

bhattbhavesh91/clean-text-demo
Tutorial on Clean-Text which is a Python package for text cleaning
Language: Jupyter Notebook - Size: 19.5 KB - Last synced at: about 1 month ago - Pushed at: almost 4 years ago - Stars: 2 - Forks: 1

currentslab/extractnet
A fork of Dragnet that also extracts author, headline, date, and keywords from context, with built-in metadata extraction, all in one package
Language: HTML - Size: 421 MB - Last synced at: 11 months ago - Pushed at: over 1 year ago - Stars: 181 - Forks: 19

Gopalkholade/Language-Detection
Language-Detection
Language: Jupyter Notebook - Size: 1.56 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

NaquibAlam/NLP-with-Disaster-Tweets-Kaggle
Contains the code for this competition, https://www.kaggle.com/c/nlp-getting-started/, hosted on Kaggle
Size: 600 KB - Last synced at: 12 months ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 0

net-wizard/end2end-nlp
End-to-end NLP project with Python
Language: Jupyter Notebook - Size: 3.18 MB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

dataiku/dss-plugin-nlp-preparation
Dataiku DSS plugin to detect languages, correct misspellings, and clean text data 🧼
Language: Python - Size: 17.9 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 23 - Forks: 6

Aalaa4444/Text_Processing-and-Unique_Word_Extraction_fromHTML
Extract text content from an HTML page, process it, and extract unique words from the processed text. This notebook utilizes various text processing techniques including cleaning, normalization, tokenization, lemmatization or stemming, and stop words removal.
Language: Jupyter Notebook - Size: 12.7 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

enginestein/CleanPhi
A natural language processing framework to clean sentences and texts.
Language: Python - Size: 139 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 7 - Forks: 1

lprtk/pyTCTK
Python Text Cleaning ToolKit library (pyTCTK)
Language: Python - Size: 21.5 KB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 1 - Forks: 0

doorooful/CapstoneProject
Capstone Design Project (Senior at Seoultech, ITM)
Language: JavaScript - Size: 76.7 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

lprtk/nlp-amazon-customer-reviews
Sentiment analysis, text mining, topic modeling & sentiment prediction
Language: Jupyter Notebook - Size: 6.5 MB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 2

SaurabhPoman96/Resume_Screening_with_NLP
A recommender that suggests the right candidates to recruiters for a job application. The content is the candidates' personal information and their job preferences. Implements a recommender system using filtering techniques and natural language processing to recommend top jobs based on similarity.
Language: Jupyter Notebook - Size: 1.29 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

jradha11/sentiment-analysis-nlp
Sentiment Analysis of Restaurant Reviews using NLP
Language: Jupyter Notebook - Size: 59.6 KB - Last synced at: over 1 year ago - Pushed at: almost 5 years ago - Stars: 3 - Forks: 6

1994nikunj/nlp-toolkit-desktop-app
The code is a collection of NLP analyses, including text cleaning, most common words, n-grams generation, co-occurrence matrix generation, wordcloud generation, topic modeling (using Latent Dirichlet Allocation), and general text statistics.
Language: Python - Size: 251 KB - Last synced at: 5 months ago - Pushed at: about 2 years ago - Stars: 2 - Forks: 0

Aayushpatel007/topicrankpy
A Python package to get useful information from documents using TopicRank Algorithm.
Language: Python - Size: 72.3 KB - Last synced at: about 17 hours ago - Pushed at: almost 2 years ago - Stars: 16 - Forks: 3

SayamAlt/News-Category-Classification
A news category classification model using fine-tuned BERT that classifies news text into its respective category, i.e., Politics, Business, Technology, or Entertainment.
Language: Jupyter Notebook - Size: 3.69 MB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

odeibarredo/Text-Mining-LOTR-movies-dialogue-
Analysis of the dialogue from the Lord of the Rings movie trilogy.
Language: R - Size: 5.76 MB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 1

sharmaroshan/Text-Classification
A project assignment on classifying different texts using clustering techniques. It combines natural language processing and clustering, with K-means clustering used to implement the solution.
Language: HTML - Size: 88.9 KB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 1

cwwdaniel/invoice-text-classification
Semantic Enrichment, Data Augmentation and Deep Learning for Boosting Invoice Text Classification Performance: A Novel Natural Language Processing Strategy
Size: 68.4 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

hrushikesh-dhumal/nlp
Boilerplate natural language processing
Language: Jupyter Notebook - Size: 16.6 KB - Last synced at: over 1 year ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 1

ketchley/tdm-workshop
Workshop materials for 'Fundamentals of Text and Data Mining'
Size: 33 MB - Last synced at: over 1 year ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 1

YashSDholam/Tripadvisor-Hotel-Review-Sentiment-Analysis-using-LSTM-Neural-Network
In this project, I utilized the TripAdvisor Hotel Review dataset from Kaggle to perform sentiment analysis on hotel reviews. The main objective was to build a predictive model using LSTM (Long Short-Term Memory) neural networks to classify hotel reviews as positive or negative based on their textual content.
Language: Jupyter Notebook - Size: 6.48 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

I-Am-Timothy-Williams/RNN-in-NLP
Repo with basic start on Recurrent Neural Networks, Word2Vec, Doc2Vec, TFIDF vectors and NLP basics
Language: Python - Size: 364 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

ilos-vigil/scl-2020-product-detection
4th place (top 1%) solution for Shopee Code League 2020 - Product Detection
Language: Jupyter Notebook - Size: 13.3 MB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 7 - Forks: 7

ArbazAnalytics/Cleaning_Text_Data_NLP
Text cleaning steps in natural language processing; one of my college assignments.
Language: Jupyter Notebook - Size: 1.95 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

showmik/TidyText
🖹 Offline Text Cleaner and Formatter
Language: C# - Size: 293 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

MD-Ryhan/NLP-Preprocesing
This repository contains code for preprocessing natural language data for use in NLP applications.
Language: Jupyter Notebook - Size: 10.7 KB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

Shawn91/DocTor
A tabular/list/plain text cleaner
Language: JavaScript - Size: 2.22 MB - Last synced at: 6 days ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

ecomp-shONgit/text-normalisation
JS / Python 3 / PHP library for working with UTF-8 polytonic Greek and Latin
Language: JavaScript - Size: 330 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 10 - Forks: 1

sagepublishing/text_cleaning
Corpora and scripts for cleaning political science texts. Scripts are translated into transformations that support SAGE Texti.
Language: Python - Size: 30.4 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 5 - Forks: 1

Rumeysakeskin/Preprocessing-Turkish-Text-Data
Preprocessing Turkish text data with cleaning (punctuation, special, accented, and Unicode characters) and normalizing (numbers, abbreviations)
Language: Jupyter Notebook - Size: 14.6 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 2 - Forks: 0

Abhayparashar31/crazytext
A simple, easy-to-use text cleaning package for NLP built in Python. It can clean and analyze your text data in one line of code.
Language: Python - Size: 48.8 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

amansrivastava17/text-preprocess-python
Text preprocessing tools in Python.
Language: Python - Size: 39.1 KB - Last synced at: about 2 years ago - Pushed at: about 7 years ago - Stars: 24 - Forks: 7

ternaus/ternaus-cleantext
Cleans text as in the CLIP model
Language: Python - Size: 4.88 KB - Last synced at: 9 days ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 1

Bonniface/Text-CLeaning-And-Classification
Text classification is a widely used natural language processing task in different business problems. Given a statement or document, the task involves assigning to it an appropriate category from a pre-defined set of categories. The dataset of choice determines the set of categories. Text classification has applications in emotion classification, n
Language: Jupyter Notebook - Size: 8.34 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

fernandosola/textpp-ptbr
Common Text Pre-Processing for Portuguese
Language: Python - Size: 64.5 KB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 5 - Forks: 1

YongWookHa/kor-text-preprocess
Korean text data preprocess toolkit for NLP
Language: Python - Size: 39.1 KB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 16 - Forks: 2

seroetr/preprocess_seroetr
Preprocess Package for https://bit.ly/intro_nlp (Text cleaning and preprocessing example)
Language: Python - Size: 11.7 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

Jasani-Parth/Emotion-Detection-Form-Text
Language: Jupyter Notebook - Size: 83 KB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

Nikoletos-K/Offensive-Comment-Classifier
😈😇🗨️ Multiple ways to classify comments as insults or neutral
Language: Jupyter Notebook - Size: 22.4 MB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 0

ilos-vigil/indonesian-document-clustering
Indonesian News and Article Clustering with K-Means++
Language: Jupyter Notebook - Size: 43.3 MB - Last synced at: almost 2 years ago - Pushed at: almost 5 years ago - Stars: 4 - Forks: 0

krisograbek/text-preprocessing
Text preprocessing in Python. Libs include string, re, nltk, spacy, gensim, textblob, unidecode, autocorrect, pyspellchecker
Language: Jupyter Notebook - Size: 81.1 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 3 - Forks: 1

ilos-vigil/scl-2020-sentiment-analysis
12th place (top 4%) solution for Shopee Code League 2020 - Sentiment Analysis
Language: Jupyter Notebook - Size: 117 KB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 3

garthmortensen/past_present_future
Past, Present, Future work.
Language: Jupyter Notebook - Size: 45.1 MB - Last synced at: almost 2 years ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 0

chris-bbrs/pdf-merging-and-scraping
PDF merging and scraping for NLP use
Language: Jupyter Notebook - Size: 8.79 KB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

NijatZeynalov/Cleaning-Text-NLTK
Cleaning Text Manually and with NLTK.
Language: Jupyter Notebook - Size: 36.1 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

mengjie514/Twitter-Sentiment-Analysis-with-Python---Part-II
PhD project - part II
Language: Jupyter Notebook - Size: 20.5 KB - Last synced at: 5 months ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0
