Ecosyste.ms: Repos
An open API service providing repository metadata for many open source software ecosystems.
GitHub topics: text-cleaning
currentslab/extractnet
A fork of Dragnet that also extracts author, headline, date, and keywords from context, with built-in metadata extraction, all in one package
Language: HTML - Size: 421 MB - Last synced: about 5 hours ago - Pushed: 5 months ago - Stars: 174 - Forks: 18
blmoistawinde/HarvestText
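Text mining and preprocessing toolkit (text cleaning, new word discovery, sentiment analysis, entity recognition and linking, keyword extraction, knowledge extraction, syntactic parsing, etc.), using unsupervised or weakly supervised methods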
Language: Python - Size: 4.27 MB - Last synced: 3 days ago - Pushed: 3 days ago - Stars: 2,301 - Forks: 328
Youssef155/Sentiment_Analysis
Sentiment Analysis For Restaurant Reviews
Language: Jupyter Notebook - Size: 81.1 KB - Last synced: 3 days ago - Pushed: 4 days ago - Stars: 0 - Forks: 0
Gopalkholade/Language-Detection
Language-Detection
Language: Jupyter Notebook - Size: 1.56 MB - Last synced: 7 days ago - Pushed: 7 days ago - Stars: 0 - Forks: 0
sharejing/Takin
A Python toolkit for file processing, text cleaning and data splitting.
Language: Python - Size: 2.42 MB - Last synced: 5 days ago - Pushed: over 1 year ago - Stars: 26 - Forks: 6
mim-solutions/mim_nlp
A Python package with ready-to-use models for various NLP tasks and text preprocessing utilities. The implementation allows fine-tuning.
Language: Jupyter Notebook - Size: 408 KB - Last synced: 12 days ago - Pushed: 12 days ago - Stars: 2 - Forks: 0
bhattbhavesh91/clean-text-demo
Tutorial on Clean-Text which is a Python package for text cleaning
Language: Jupyter Notebook - Size: 19.5 KB - Last synced: 14 days ago - Pushed: almost 3 years ago - Stars: 1 - Forks: 1
NaquibAlam/NLP-with-Disaster-Tweets-Kaggle
Contains the code for this competition, https://www.kaggle.com/c/nlp-getting-started/, hosted on Kaggle
Size: 600 KB - Last synced: 22 days ago - Pushed: over 3 years ago - Stars: 1 - Forks: 0
net-wizard/end2end-nlp
End-to-end NLP project with Python
Language: Jupyter Notebook - Size: 3.18 MB - Last synced: 26 days ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0
adbar/trafilatura
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
Language: Python - Size: 23.2 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 2,688 - Forks: 205
Infinitode/ValX
ValX is an open-source Python package for text cleaning tasks, including profanity detection and removal. It now also includes sensitive information detection and removal.
Language: Python - Size: 36.1 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 0 - Forks: 0
dataiku/dss-plugin-nlp-preparation
Dataiku DSS plugin to detect languages, correct misspellings, and clean text data 🧼
Language: Python - Size: 17.9 MB - Last synced: 12 days ago - Pushed: 12 days ago - Stars: 23 - Forks: 8
trinker/textclean
Tools for cleaning and normalizing text data
Language: R - Size: 23.8 MB - Last synced: about 2 hours ago - Pushed: over 2 years ago - Stars: 238 - Forks: 26
reZach/grammarify
Grammarify is an npm package that safely cleans up text that has misspellings, improper capitalization, and lexical illusions, among other things.
Language: JavaScript - Size: 412 KB - Last synced: 14 days ago - Pushed: over 1 year ago - Stars: 65 - Forks: 8
jfilter/clean-text
🧹 Python package for text cleaning
Language: Python - Size: 157 KB - Last synced: about 1 month ago - Pushed: about 1 year ago - Stars: 915 - Forks: 77
Aalaa4444/Text_Processing-and-Unique_Word_Extraction_fromHTML
Extract text content from an HTML page, process it, and extract unique words from the processed text. This notebook utilizes various text processing techniques including cleaning, normalization, tokenization, lemmatization or stemming, and stop words removal.
Language: Jupyter Notebook - Size: 12.7 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 0 - Forks: 0
Ankur3107/nlp_preprocessing
Text preprocessing package that includes cleaning, tokenization, dataset preparation, etc.
Language: JavaScript - Size: 5.19 MB - Last synced: 7 days ago - Pushed: over 3 years ago - Stars: 16 - Forks: 7
johnjago/deformat
Clean up text copied from PDFs.
Language: HTML - Size: 141 KB - Last synced: 30 days ago - Pushed: 4 months ago - Stars: 6 - Forks: 0
AndyTheFactory/article-extraction-dataset
Article title, authors, date and body extraction dataset.
Language: HTML - Size: 31.9 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 1 - Forks: 0
hscspring/pnlp
NLP pre-/post-processing tools.
Language: Python - Size: 230 KB - Last synced: 18 days ago - Pushed: 4 months ago - Stars: 27 - Forks: 7
enginestein/CleanPhi
A natural language processing framework to clean sentences and texts.
Language: Python - Size: 139 KB - Last synced: 2 months ago - Pushed: 7 months ago - Stars: 7 - Forks: 1
lprtk/pyTCTK
Python Text Cleaning ToolKit library (pyTCTK)
Language: Python - Size: 21.5 KB - Last synced: 3 months ago - Pushed: almost 2 years ago - Stars: 1 - Forks: 0
doorooful/CapstoneProject
Capstone Design Project (Senior at Seoultech, ITM)
Language: JavaScript - Size: 76.7 MB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 0 - Forks: 0
lprtk/nlp-amazon-customer-reviews
Sentiment analysis, text mining, topic modeling & sentiment prediction
Language: Jupyter Notebook - Size: 6.5 MB - Last synced: 3 months ago - Pushed: almost 2 years ago - Stars: 2 - Forks: 2
SaurabhPoman96/Resume_Screening_with_NLP
A recommender system that recommends the right candidates to recruiters for a job application. The content is the candidates' personal information and their job preferences. It implements a recommender system using filtering techniques and natural language processing to recommend top jobs based on similarity.
Language: Jupyter Notebook - Size: 1.29 MB - Last synced: 5 months ago - Pushed: 5 months ago - Stars: 0 - Forks: 0
jradha11/sentiment-analysis-nlp
Sentiment Analysis of Restaurant Reviews using NLP
Language: Jupyter Notebook - Size: 59.6 KB - Last synced: 5 months ago - Pushed: almost 4 years ago - Stars: 3 - Forks: 6
1994nikunj/nlp-toolkit-desktop-app
The code is a collection of NLP analyses, including text cleaning, most common words, n-grams generation, co-occurrence matrix generation, wordcloud generation, topic modeling (using Latent Dirichlet Allocation), and general text statistics.
Language: Python - Size: 251 KB - Last synced: 5 days ago - Pushed: about 1 year ago - Stars: 2 - Forks: 0
Aayushpatel007/topicrankpy
A Python package to get useful information from documents using TopicRank Algorithm.
Language: Python - Size: 72.3 KB - Last synced: 15 days ago - Pushed: 10 months ago - Stars: 16 - Forks: 3
alinapetukhova/textcl
Text preprocessing package for use in NLP tasks https://pypi.org/project/textcl/
Language: Python - Size: 891 KB - Last synced: 2 months ago - Pushed: about 2 years ago - Stars: 10 - Forks: 4
odeibarredo/Text-Mining-LOTR-movies-dialogue-
Analysis of the dialogue from the Lord of the Rings movie trilogy.
Language: R - Size: 5.76 MB - Last synced: 7 months ago - Pushed: over 6 years ago - Stars: 0 - Forks: 1
sharmaroshan/Text-Classification
A project assignment in which I learned to classify different texts using clustering techniques. It combines natural language processing and clustering, and uses K-means clustering to solve the problem.
Language: HTML - Size: 88.9 KB - Last synced: 7 months ago - Pushed: over 4 years ago - Stars: 1 - Forks: 1
cwwdaniel/invoice-text-classification
Semantic Enrichment, Data Augmentation and Deep Learning for Boosting Invoice Text Classification Performance: A Novel Natural Language Processing Strategy
Size: 68.4 KB - Last synced: 8 months ago - Pushed: 8 months ago - Stars: 2 - Forks: 0
hrushikesh-dhumal/nlp
Boilerplate natural language processing
Language: Jupyter Notebook - Size: 16.6 KB - Last synced: 9 months ago - Pushed: almost 4 years ago - Stars: 1 - Forks: 1
ketchley/tdm-workshop
Workshop materials for 'Fundamentals of Text and Data Mining'
Size: 33 MB - Last synced: 9 months ago - Pushed: about 3 years ago - Stars: 0 - Forks: 1
YashSDholam/Tripadvisor-Hotel-Review-Sentiment-Analysis-using-LSTM-Neural-Network
In this project, I utilized the TripAdvisor Hotel Review dataset from Kaggle to perform sentiment analysis on hotel reviews. The main objective was to build a predictive model using LSTM (Long Short-Term Memory) neural networks to classify hotel reviews as positive or negative based on their textual content.
Language: Jupyter Notebook - Size: 6.48 MB - Last synced: 4 months ago - Pushed: 10 months ago - Stars: 0 - Forks: 0
I-Am-Timothy-Williams/RNN-in-NLP
Repo with basic start on Recurrent Neural Networks, Word2Vec, Doc2Vec, TFIDF vectors and NLP basics
Language: Python - Size: 364 KB - Last synced: 10 months ago - Pushed: 10 months ago - Stars: 0 - Forks: 0
umapornp/textprepro
👀 Everything Everyway All At Once Text Preprocessing for Natural Language Processing.
Language: Python - Size: 1.3 MB - Last synced: 3 days ago - Pushed: 10 months ago - Stars: 1 - Forks: 0
ilos-vigil/scl-2020-product-detection
4th place (top 1%) solution for Shopee Code League 2020 - Product Detection
Language: Jupyter Notebook - Size: 13.3 MB - Last synced: 10 months ago - Pushed: almost 4 years ago - Stars: 7 - Forks: 7
ArbazAnalytics/Cleaning_Text_Data_NLP
Text cleaning steps in Natural Language Processing | One of my college assignments
Language: Jupyter Notebook - Size: 1.95 KB - Last synced: 11 months ago - Pushed: 11 months ago - Stars: 0 - Forks: 0
showmik/TidyText
🖹 Offline Text Cleaner and Formatter
Language: C# - Size: 293 KB - Last synced: 11 months ago - Pushed: 11 months ago - Stars: 0 - Forks: 0
MD-Ryhan/NLP-Preprocesing
This repository contains code for preprocessing natural language data for use in NLP applications.
Language: Jupyter Notebook - Size: 10.7 KB - Last synced: about 1 year ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0
Shawn91/DocTor
A tabular/list/plain text cleaner
Language: JavaScript - Size: 2.22 MB - Last synced: 18 days ago - Pushed: over 1 year ago - Stars: 1 - Forks: 0
ecomp-shONgit/text-normalisation
JS / Python3 / PHP library for working with UTF-8 polytonic Greek and Latin
Language: JavaScript - Size: 330 KB - Last synced: 10 months ago - Pushed: 10 months ago - Stars: 10 - Forks: 1
sagepublishing/text_cleaning
Corpora and scripts for cleaning political science texts. Scripts are translated into transformations that support SAGE Texti.
Language: Python - Size: 30.4 MB - Last synced: about 1 year ago - Pushed: over 2 years ago - Stars: 5 - Forks: 1
Rumeysakeskin/Preprocessing-Turkish-Text-Data
Preprocessing Turkish text data with cleaning (punctuation, special, accented and Unicode characters) and normalizing (numbers, abbreviations)
Language: Jupyter Notebook - Size: 14.6 KB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 2 - Forks: 0
Abhayparashar31/crazytext
A simple, easy-to-use text cleaning package for NLP built in Python. It can clean and analyze your text data in one line of code.
Language: Python - Size: 48.8 KB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 1 - Forks: 0
amansrivastava17/text-preprocess-python
Text preprocessing tools in Python.
Language: Python - Size: 39.1 KB - Last synced: about 1 year ago - Pushed: about 6 years ago - Stars: 24 - Forks: 7
SayamAlt/News-Category-Classification
Successfully developed a news category classification model using fine-tuned BERT which can accurately classify any news text into its respective category i.e. Politics, Business, Technology and Entertainment.
Language: Jupyter Notebook - Size: 3.69 MB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0
ternaus/ternaus-cleantext
Cleans text as in the CLIP model
Language: Python - Size: 4.88 KB - Last synced: 15 days ago - Pushed: over 1 year ago - Stars: 2 - Forks: 1
Bonniface/Text-CLeaning-And-Classification
Text classification is a widely used natural language processing task in different business problems. Given a statement or document, the task involves assigning to it an appropriate category from a pre-defined set of categories. The dataset of choice determines the set of categories. Text classification has applications in emotion classification, n
Language: Jupyter Notebook - Size: 8.34 MB - Last synced: 12 months ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0
fernandosola/textpp-ptbr
Common Text Pre-Processing for Portuguese
Language: Python - Size: 64.5 KB - Last synced: about 1 year ago - Pushed: almost 5 years ago - Stars: 5 - Forks: 1
YongWookHa/kor-text-preprocess
Korean text data preprocess toolkit for NLP
Language: Python - Size: 39.1 KB - Last synced: about 1 year ago - Pushed: almost 5 years ago - Stars: 16 - Forks: 2
seroetr/preprocess_seroetr
Preprocess Package for https://bit.ly/intro_nlp (Text cleaning and preprocessing example)
Language: Python - Size: 11.7 KB - Last synced: about 1 year ago - Pushed: almost 2 years ago - Stars: 1 - Forks: 0
Jasani-Parth/Emotion-Detection-Form-Text
Language: Jupyter Notebook - Size: 83 KB - Last synced: about 1 year ago - Pushed: almost 2 years ago - Stars: 0 - Forks: 0
Nikoletos-K/Offensive-Comment-Classifier
😈😇🗨️ Multiple ways to classify comments as insults or neutral
Language: Jupyter Notebook - Size: 22.4 MB - Last synced: about 1 year ago - Pushed: about 3 years ago - Stars: 1 - Forks: 0
ilos-vigil/indonesian-document-clustering
Indonesian News and Article Clustering with K-Means++
Language: Jupyter Notebook - Size: 43.3 MB - Last synced: 11 months ago - Pushed: almost 4 years ago - Stars: 4 - Forks: 0
krisograbek/text-preprocessing
Text preprocessing in Python. Libs include string, re, nltk, spacy, gensim, textblob, unidecode, autocorrect, pyspellchecker
Language: Jupyter Notebook - Size: 81.1 KB - Last synced: about 1 year ago - Pushed: over 2 years ago - Stars: 3 - Forks: 1
ilos-vigil/scl-2020-sentiment-analysis
12th place (top 4%) solution for Shopee Code League 2020 - Sentiment Analysis
Language: Jupyter Notebook - Size: 117 KB - Last synced: 11 months ago - Pushed: almost 4 years ago - Stars: 1 - Forks: 3
garthmortensen/past_present_future
Past, Present, Future work.
Language: Jupyter Notebook - Size: 45.1 MB - Last synced: about 1 year ago - Pushed: almost 3 years ago - Stars: 1 - Forks: 0
chris-bbrs/pdf-merging-and-scraping
PDF merging and scraping for NLP use
Language: Jupyter Notebook - Size: 8.79 KB - Last synced: 12 months ago - Pushed: over 3 years ago - Stars: 0 - Forks: 0
NijatZeynalov/Cleaning-Text-NLTK
Cleaning Text Manually and with NLTK.
Language: Jupyter Notebook - Size: 36.1 KB - Last synced: about 1 year ago - Pushed: almost 4 years ago - Stars: 1 - Forks: 0