Ecosyste.ms: Repos

An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: text-cleaning

currentslab/extractnet

A fork of Dragnet that also extracts author, headline, date, and keywords from context, with built-in metadata extraction, all in one package

Language: HTML - Size: 421 MB - Last synced: about 5 hours ago - Pushed: 5 months ago - Stars: 174 - Forks: 18

blmoistawinde/HarvestText

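Text mining and preprocessing toolkit (text cleaning, new word discovery, sentiment analysis, entity recognition and linking, keyword extraction, knowledge extraction, syntactic parsing, etc.), using unsupervised or weakly supervised methods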

Language: Python - Size: 4.27 MB - Last synced: 3 days ago - Pushed: 3 days ago - Stars: 2,301 - Forks: 328

Youssef155/Sentiment_Analysis

Sentiment Analysis For Restaurant Reviews

Language: Jupyter Notebook - Size: 81.1 KB - Last synced: 3 days ago - Pushed: 4 days ago - Stars: 0 - Forks: 0

Gopalkholade/Language-Detection

Language-Detection

Language: Jupyter Notebook - Size: 1.56 MB - Last synced: 7 days ago - Pushed: 7 days ago - Stars: 0 - Forks: 0

sharejing/Takin

A Python toolkit for file processing, text cleaning, and data splitting.

Language: Python - Size: 2.42 MB - Last synced: 5 days ago - Pushed: over 1 year ago - Stars: 26 - Forks: 6

mim-solutions/mim_nlp

A Python package with ready-to-use models for various NLP tasks and text preprocessing utilities. The implementation allows fine-tuning.

Language: Jupyter Notebook - Size: 408 KB - Last synced: 12 days ago - Pushed: 12 days ago - Stars: 2 - Forks: 0

bhattbhavesh91/clean-text-demo

Tutorial on Clean-Text which is a Python package for text cleaning

Language: Jupyter Notebook - Size: 19.5 KB - Last synced: 14 days ago - Pushed: almost 3 years ago - Stars: 1 - Forks: 1

NaquibAlam/NLP-with-Disaster-Tweets-Kaggle

Contains the code for this competition, https://www.kaggle.com/c/nlp-getting-started/, hosted on Kaggle

Size: 600 KB - Last synced: 22 days ago - Pushed: over 3 years ago - Stars: 1 - Forks: 0

net-wizard/end2end-nlp

End-to-end NLP project with Python

Language: Jupyter Notebook - Size: 3.18 MB - Last synced: 26 days ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0

adbar/trafilatura

Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments

Language: Python - Size: 23.2 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 2,688 - Forks: 205

Infinitode/ValX

ValX is an open-source Python package for text cleaning tasks, including profanity detection and removal. It now also includes sensitive information detection and removal.

Language: Python - Size: 36.1 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 0 - Forks: 0

dataiku/dss-plugin-nlp-preparation

Dataiku DSS plugin to detect languages, correct misspellings, and clean text data 🧼

Language: Python - Size: 17.9 MB - Last synced: 12 days ago - Pushed: 12 days ago - Stars: 23 - Forks: 8

trinker/textclean

Tools for cleaning and normalizing text data

Language: R - Size: 23.8 MB - Last synced: about 2 hours ago - Pushed: over 2 years ago - Stars: 238 - Forks: 26

reZach/grammarify

Grammarify is an npm package that safely cleans up text that has misspellings, improper capitalization, and lexical illusions, among other things.

Language: JavaScript - Size: 412 KB - Last synced: 14 days ago - Pushed: over 1 year ago - Stars: 65 - Forks: 8

jfilter/clean-text

🧹 Python package for text cleaning

Language: Python - Size: 157 KB - Last synced: about 1 month ago - Pushed: about 1 year ago - Stars: 915 - Forks: 77

Aalaa4444/Text_Processing-and-Unique_Word_Extraction_fromHTML

Extract text content from an HTML page, process it, and extract unique words from the processed text. This notebook utilizes various text processing techniques including cleaning, normalization, tokenization, lemmatization or stemming, and stop words removal.

Language: Jupyter Notebook - Size: 12.7 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 0 - Forks: 0

Ankur3107/nlp_preprocessing

Text preprocessing package that includes cleaning, tokenization, dataset preparation, etc.

Language: JavaScript - Size: 5.19 MB - Last synced: 7 days ago - Pushed: over 3 years ago - Stars: 16 - Forks: 7

johnjago/deformat

Clean up text copied from PDFs.

Language: HTML - Size: 141 KB - Last synced: 30 days ago - Pushed: 4 months ago - Stars: 6 - Forks: 0

AndyTheFactory/article-extraction-dataset

Article title, authors, date and body extraction dataset.

Language: HTML - Size: 31.9 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 1 - Forks: 0

hscspring/pnlp

NLP pre-/post-processing tools.

Language: Python - Size: 230 KB - Last synced: 18 days ago - Pushed: 4 months ago - Stars: 27 - Forks: 7

enginestein/CleanPhi

A natural language processing framework to clean sentences and texts.

Language: Python - Size: 139 KB - Last synced: 2 months ago - Pushed: 7 months ago - Stars: 7 - Forks: 1

lprtk/pyTCTK

Python Text Cleaning ToolKit library (pyTCTK)

Language: Python - Size: 21.5 KB - Last synced: 3 months ago - Pushed: almost 2 years ago - Stars: 1 - Forks: 0

doorooful/CapstoneProject

Capstone Design Project (Senior at Seoultech, ITM)

Language: JavaScript - Size: 76.7 MB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 0 - Forks: 0

lprtk/nlp-amazon-customer-reviews

Sentiment analysis, text mining, topic modeling & sentiment prediction

Language: Jupyter Notebook - Size: 6.5 MB - Last synced: 3 months ago - Pushed: almost 2 years ago - Stars: 2 - Forks: 2

SaurabhPoman96/Resume_Screening_with_NLP

A recommender that suggests the right candidates to recruiters for a job application. The content is the candidates' personal information and their job preferences. Implementation of a recommender system using filtering techniques and natural language processing to recommend top jobs based on similarity.

Language: Jupyter Notebook - Size: 1.29 MB - Last synced: 5 months ago - Pushed: 5 months ago - Stars: 0 - Forks: 0

jradha11/sentiment-analysis-nlp

Sentiment Analysis of Restaurant Reviews using NLP

Language: Jupyter Notebook - Size: 59.6 KB - Last synced: 5 months ago - Pushed: almost 4 years ago - Stars: 3 - Forks: 6

1994nikunj/nlp-toolkit-desktop-app

The code is a collection of NLP analyses, including text cleaning, most common words, n-grams generation, co-occurrence matrix generation, wordcloud generation, topic modeling (using Latent Dirichlet Allocation), and general text statistics.

Language: Python - Size: 251 KB - Last synced: 5 days ago - Pushed: about 1 year ago - Stars: 2 - Forks: 0

Aayushpatel007/topicrankpy

A Python package to get useful information from documents using the TopicRank algorithm.

Language: Python - Size: 72.3 KB - Last synced: 15 days ago - Pushed: 10 months ago - Stars: 16 - Forks: 3

alinapetukhova/textcl

Text preprocessing package for use in NLP tasks https://pypi.org/project/textcl/

Language: Python - Size: 891 KB - Last synced: 2 months ago - Pushed: about 2 years ago - Stars: 10 - Forks: 4

odeibarredo/Text-Mining-LOTR-movies-dialogue-

Analysis of the dialogue from the Lord of the Rings movie trilogy.

Language: R - Size: 5.76 MB - Last synced: 7 months ago - Pushed: over 6 years ago - Stars: 0 - Forks: 1

sharmaroshan/Text-Classification

A project assignment in which I learned to classify different texts using clustering techniques. It combines natural language processing and clustering, using K-means clustering to implement the solution.

Language: HTML - Size: 88.9 KB - Last synced: 7 months ago - Pushed: over 4 years ago - Stars: 1 - Forks: 1

cwwdaniel/invoice-text-classification

Semantic Enrichment, Data Augmentation and Deep Learning for Boosting Invoice Text Classification Performance: A Novel Natural Language Processing Strategy

Size: 68.4 KB - Last synced: 8 months ago - Pushed: 8 months ago - Stars: 2 - Forks: 0

hrushikesh-dhumal/nlp

Boilerplate natural language processing

Language: Jupyter Notebook - Size: 16.6 KB - Last synced: 9 months ago - Pushed: almost 4 years ago - Stars: 1 - Forks: 1

ketchley/tdm-workshop

Workshop materials for 'Fundamentals of Text and Data Mining'

Size: 33 MB - Last synced: 9 months ago - Pushed: about 3 years ago - Stars: 0 - Forks: 1

YashSDholam/Tripadvisor-Hotel-Review-Sentiment-Analysis-using-LSTM-Neural-Network

In this project, I utilized the TripAdvisor Hotel Review dataset from Kaggle to perform sentiment analysis on hotel reviews. The main objective was to build a predictive model using LSTM (Long Short-Term Memory) neural networks to classify hotel reviews as positive or negative based on their textual content.

Language: Jupyter Notebook - Size: 6.48 MB - Last synced: 4 months ago - Pushed: 10 months ago - Stars: 0 - Forks: 0

I-Am-Timothy-Williams/RNN-in-NLP

Repo with basic start on Recurrent Neural Networks, Word2Vec, Doc2Vec, TFIDF vectors and NLP basics

Language: Python - Size: 364 KB - Last synced: 10 months ago - Pushed: 10 months ago - Stars: 0 - Forks: 0

umapornp/textprepro

👀 Everything Everyway All At Once Text Preprocessing for Natural Language Processing.

Language: Python - Size: 1.3 MB - Last synced: 3 days ago - Pushed: 10 months ago - Stars: 1 - Forks: 0

ilos-vigil/scl-2020-product-detection

4th place (top 1%) solution for Shopee Code League 2020 - Product Detection

Language: Jupyter Notebook - Size: 13.3 MB - Last synced: 10 months ago - Pushed: almost 4 years ago - Stars: 7 - Forks: 7

ArbazAnalytics/Cleaning_Text_Data_NLP

Performed text cleaning steps in natural language processing | Uploading one of my college assignments

Language: Jupyter Notebook - Size: 1.95 KB - Last synced: 11 months ago - Pushed: 11 months ago - Stars: 0 - Forks: 0

showmik/TidyText

🖹 Offline Text Cleaner and Formatter

Language: C# - Size: 293 KB - Last synced: 11 months ago - Pushed: 11 months ago - Stars: 0 - Forks: 0

MD-Ryhan/NLP-Preprocesing

This repository contains code for preprocessing natural language data for use in NLP applications.

Language: Jupyter Notebook - Size: 10.7 KB - Last synced: about 1 year ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0

Shawn91/DocTor

A tabular/list/plain text cleaner

Language: JavaScript - Size: 2.22 MB - Last synced: 18 days ago - Pushed: over 1 year ago - Stars: 1 - Forks: 0

ecomp-shONgit/text-normalisation

JS / Python3 / PHP library for working with UTF-8 polytonic Greek and Latin

Language: JavaScript - Size: 330 KB - Last synced: 10 months ago - Pushed: 10 months ago - Stars: 10 - Forks: 1

sagepublishing/text_cleaning

Corpora and scripts for cleaning political science texts. Scripts are translated into transformations that support SAGE Texti.

Language: Python - Size: 30.4 MB - Last synced: about 1 year ago - Pushed: over 2 years ago - Stars: 5 - Forks: 1

Rumeysakeskin/Preprocessing-Turkish-Text-Data

Preprocessing Turkish text data with cleaning (punctuation, special, accented, and Unicode characters) and normalizing (numbers, abbreviations)

Language: Jupyter Notebook - Size: 14.6 KB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 2 - Forks: 0

Abhayparashar31/crazytext

A simple, easy-to-use text cleaning package for NLP, built in Python. It can clean and analyze your text data in one line of code.

Language: Python - Size: 48.8 KB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 1 - Forks: 0

amansrivastava17/text-preprocess-python

Text preprocessing tools in Python.

Language: Python - Size: 39.1 KB - Last synced: about 1 year ago - Pushed: about 6 years ago - Stars: 24 - Forks: 7

SayamAlt/News-Category-Classification

Successfully developed a news category classification model using fine-tuned BERT which can accurately classify any news text into its respective category i.e. Politics, Business, Technology and Entertainment.

Language: Jupyter Notebook - Size: 3.69 MB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

ternaus/ternaus-cleantext

Cleans text as in the CLIP model

Language: Python - Size: 4.88 KB - Last synced: 15 days ago - Pushed: over 1 year ago - Stars: 2 - Forks: 1

Bonniface/Text-CLeaning-And-Classification

Text classification is a widely used natural language processing task in different business problems. Given a statement or document, the task involves assigning to it an appropriate category from a pre-defined set of categories. The dataset of choice determines the set of categories. Text classification has applications in emotion classification, n

Language: Jupyter Notebook - Size: 8.34 MB - Last synced: 12 months ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

fernandosola/textpp-ptbr

Common Text Pre-Processing for Portuguese

Language: Python - Size: 64.5 KB - Last synced: about 1 year ago - Pushed: almost 5 years ago - Stars: 5 - Forks: 1

YongWookHa/kor-text-preprocess

Korean text data preprocess toolkit for NLP

Language: Python - Size: 39.1 KB - Last synced: about 1 year ago - Pushed: almost 5 years ago - Stars: 16 - Forks: 2

seroetr/preprocess_seroetr

Preprocess Package for https://bit.ly/intro_nlp (Text cleaning and preprocessing example)

Language: Python - Size: 11.7 KB - Last synced: about 1 year ago - Pushed: almost 2 years ago - Stars: 1 - Forks: 0

Jasani-Parth/Emotion-Detection-Form-Text

Language: Jupyter Notebook - Size: 83 KB - Last synced: about 1 year ago - Pushed: almost 2 years ago - Stars: 0 - Forks: 0

Nikoletos-K/Offensive-Comment-Classifier

😈😇🗨️ Multiple ways to classify comments as insults or neutral

Language: Jupyter Notebook - Size: 22.4 MB - Last synced: about 1 year ago - Pushed: about 3 years ago - Stars: 1 - Forks: 0

ilos-vigil/indonesian-document-clustering

Indonesian News and Article Clustering with K-Means++

Language: Jupyter Notebook - Size: 43.3 MB - Last synced: 11 months ago - Pushed: almost 4 years ago - Stars: 4 - Forks: 0

krisograbek/text-preprocessing

Text preprocessing in Python. Libs include string, re, nltk, spacy, gensim, textblob, unidecode, autocorrect, pyspellchecker

Language: Jupyter Notebook - Size: 81.1 KB - Last synced: about 1 year ago - Pushed: over 2 years ago - Stars: 3 - Forks: 1

ilos-vigil/scl-2020-sentiment-analysis

12th place (top 4%) solution for Shopee Code League 2020 - Sentiment Analysis

Language: Jupyter Notebook - Size: 117 KB - Last synced: 11 months ago - Pushed: almost 4 years ago - Stars: 1 - Forks: 3

garthmortensen/past_present_future

Past, Present, Future work.

Language: Jupyter Notebook - Size: 45.1 MB - Last synced: about 1 year ago - Pushed: almost 3 years ago - Stars: 1 - Forks: 0

chris-bbrs/pdf-merging-and-scraping

PDF merging and scraping for NLP use

Language: Jupyter Notebook - Size: 8.79 KB - Last synced: 12 months ago - Pushed: over 3 years ago - Stars: 0 - Forks: 0

NijatZeynalov/Cleaning-Text-NLTK

Cleaning Text Manually and with NLTK.

Language: Jupyter Notebook - Size: 36.1 KB - Last synced: about 1 year ago - Pushed: almost 4 years ago - Stars: 1 - Forks: 0

Related Keywords
text-cleaning (61), nlp (30), natural-language-processing (13), python (13), text-preprocessing (12), text-processing (9), text-mining (9), machine-learning (7), text-classification (6), sentiment-analysis (5), text-extraction (5), text-normalization (5), web-scraping (4), deep-learning (4), preprocessing (4), tf-idf (3), lemmatization (3), nlp-machine-learning (3), nltk (3), data-visualization (3), text (3), news (3), text-cleaner (3), scraping (3), nlp-library (3), text-analysis (2), regex (2), data-science (2), datasets (2), text-tokenization (2), tokenization (2), user-generated-content (2), bert (2), feature-engineering (2), article-extractor (2), corpus (2), corpus-builder (2), corpus-tools (2), readability (2), html-to-markdown (2), html2text (2), news-aggregator (2), news-crawler (2), bag-of-words (2), stemming (2), bert-embeddings (2), exploratory-data-analysis (2), named-entity-recognition (2), data-preprocessing (2), jupyter-notebook (2), tfidf (2), topic-modeling (2), sentiment-classification (2), language-detection (2), ocr (1), historical-data (1), textrank (1), topicrank (1), outlier-detection (1), linear-svm (1), tdm (1), textanalysis (1), lstm-neural-networks (1), prediction (1), spacy (1), phone-parse (1), pagerank-python (1), doc2vec-word2vec (1), lstm (1), python3 (1), network-x (1), deduplication (1), glove-embeddings (1), bi-lstm (1), pandas (1), multi-class-classification (1), semantic-enrichment (1), short-text (1), numpy (1), text-data-augmentation (1), wordnet-library (1), python-script (1), jupyter-notebooks (1), wordcloud (1), digitalhumanities (1), lotr (1), lordoftherings (1), fundamentals (1), frequent-word (1), korean (1), mecab (1), tokenize (1), tool (1), emotion-detection (1), pandas-dataframe (1), seaborn-plots (1), semantic-analysis (1), gridsearchcv (1), naive-bayes-classifier (1), pos-tagging (1)