GitHub topics: text-preprocessing
adbar/trafilatura
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
Language: Python - Size: 33.8 MB - Last synced at: about 3 hours ago - Pushed at: about 2 months ago - Stars: 4,855 - Forks: 324
kaabilcoder/Fake-News-Detection
Fake News Detection web app built with Streamlit and Scikit-learn — analyzes news text using NLP and predicts whether it’s real or fake.
Language: Jupyter Notebook - Size: 40.8 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0
jbesomi/texthero
Text preprocessing, representation and visualization from zero to hero.
Language: Python - Size: 22.1 MB - Last synced at: 3 days ago - Pushed at: about 2 years ago - Stars: 2,908 - Forks: 239
Kor4yz/Classification-of-emails
Email classification with classic ML and modern NLP (LSTM/BERT): training, evaluation, benchmarks, reproducible pipeline, CLI and Streamlit demo.
Language: Jupyter Notebook - Size: 3.21 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0
jfilter/clean-text
🧹 Python package for text cleaning
Language: Python - Size: 157 KB - Last synced at: about 23 hours ago - Pushed at: over 2 years ago - Stars: 997 - Forks: 79
maryemchk/arabic-quote-classifier
Arabic-quote classifier: 24k web-scraped quotes → Arabic NLP pipeline → Decision-Tree model; 74% accuracy, 21 categories with perfect AUC (1). Repo + notebooks ready for AraBERT & API upgrades.
Language: Jupyter Notebook - Size: 7.51 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 0 - Forks: 0
tesserato/Inscribe
Markdown preprocessor that runs code fences
Language: Rust - Size: 16.3 MB - Last synced at: 12 days ago - Pushed at: 22 days ago - Stars: 13 - Forks: 0
OkabeRintaro10/MachineLearningProjects
Various machine learning projects involving various technologies
Language: Jupyter Notebook - Size: 1.2 GB - Last synced at: 16 days ago - Pushed at: 18 days ago - Stars: 0 - Forks: 0
ezgisubasi/turkish-tweets-sentiment-analysis
This sentiment analysis project determines whether the tweets posted in the Turkish language on Twitter are positive or negative.
Language: Jupyter Notebook - Size: 1.96 MB - Last synced at: 10 days ago - Pushed at: about 2 years ago - Stars: 61 - Forks: 13
CDSoft/ypp
Moved to Codeberg, this repo is just a (temporary) mirror -- Yet a PreProcessor
Language: Lua - Size: 170 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 12 - Forks: 1
SayamAlt/Financial-News-Sentiment-Analysis
Successfully developed a fine-tuned DistilBERT transformer model which predicts the overall sentiment of a piece of financial news with an accuracy of nearly 81.5%.
Language: Jupyter Notebook - Size: 745 KB - Last synced at: 17 days ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 1
jhlopesalves/CorpusAid
Automated text preprocessing pipeline for large corpora. Features customizable filters for diacritics, stop words, punctuation, and regex.
Language: Python - Size: 1.34 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0
arjuntanil/NLP-CADL-Activities
CADL Activities of NLP (PMC2421A).
Language: Jupyter Notebook - Size: 11.1 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0
danielhaim1/TitleCaser
A powerful utility for transforming text to title case with support for multiple style guides and extensive customization options.
Language: JavaScript - Size: 2.31 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 10 - Forks: 1
mihirparmar0913/Movie_Sentiment_Prediction
In this project I build a machine learning model that predicts whether a movie review is positive or negative.
Language: Jupyter Notebook - Size: 11.7 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0
ssciwr/mailcom
Recognize and pseudonymize named entities in emails
Language: Python - Size: 16.4 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 1
CDSoft/panda
Moved to Codeberg, this repo is just a (temporary) mirror -- Panda is a Pandoc Lua filter that works on Pandoc's internal AST. Panda is heavily inspired by [abp](http:/cdelord.fr/abp), reimplemented as a Pandoc Lua filter.
Language: Lua - Size: 274 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 53 - Forks: 5
lanl/T-ELF
Tensor Extraction of Latent Features (T-ELF). Within T-ELF's arsenal are non-negative matrix and tensor factorization solutions, equipped with automatic model determination (also known as the estimation of latent factors - rank) for accurate data modeling. Our software suite encompasses cutting-edge data pre-processing and post-processing modules.
Language: Python - Size: 49.4 MB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 20 - Forks: 7
codekush123/Fake_News_Detector
A machine learning–powered tool that classifies news articles as real or fake based on their content. This project uses basic machine learning techniques to clean and vectorize text, combined with supervised learning models to detect misinformation.
Language: Jupyter Notebook - Size: 2.92 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0
vabhishek6/PhantomTrace
High-speed Rust tool to detect & mask PCI/PII in logs, text & streams — secure, configurable, and built for compliance (PCI DSS, GDPR, HIPAA).
Language: Rust - Size: 69.3 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 2 - Forks: 0
VaghasiyaParesh/PhantomTrace
PhantomTrace 🛡️ detects and masks PCI, PII, GDPR and HIPAA-sensitive data in logs, files and streams; offers fast, configurable regex and ML detection, tokenization and audit-ready reports.
Language: Rust - Size: 48.8 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0
HastiGohel/IMDb-NLP-Preprocessing
This project demonstrates basic Natural Language Processing (NLP) techniques on the IMDb movie reviews dataset using Python. It covers text cleaning, tokenization, stopword removal, stemming, lemmatization, and creating a complete preprocessing pipeline.
Language: Jupyter Notebook - Size: 16.6 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0
SeoBuAs/2025_SW_Univ_Text_Challenge
2025 SW-Centered University Digital Competition: AI Track (Team Sangsangbugi)
Language: Python - Size: 6.1 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0
AdelAdool/News-Category-Classifier
News Category Classification using AG News dataset. Implements text preprocessing, TF-IDF vectorization, and trains Logistic Regression and a Neural Network to classify news into World, Sports, Business, and Sci/Tech categories. Includes data visualization and model evaluation.
Language: Python - Size: 1.49 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0
SayamAlt/Resume-Classification-using-fine-tuned-BERT
Successfully developed a resume classification model which can accurately classify the resume of any person into its corresponding job with a tremendously high accuracy of more than 99%.
Language: Jupyter Notebook - Size: 1.19 MB - Last synced at: 2 months ago - Pushed at: almost 3 years ago - Stars: 8 - Forks: 4
Losif01/text-preprocessing-to-transformers-NLP-notes
This repo contains my personal notes from the Stanford NLP course; I currently use it as a reference.
Size: 58.6 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0
JRick27/nenglish-stopwords
Curated Nenglish stopwords for chat analysis that help improve NLP tasks like sentiment analysis and keyword extraction. Perfect for preprocessing informal digital conversations. 🛠️📊
Size: 9.77 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0
AqilGardezi/Smart-Notes-Summary
This project is designed to generate concise and meaningful summaries from lengthy claim notes in healthcare. Using advanced natural language processing (NLP) techniques, it extracts key information, simplifies complex text, and helps streamline claim review processes.
Size: 8.79 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0
pantpujan017/nenglish-stopwords
Nepali stop words
Size: 11.7 KB - Last synced at: 17 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0
SayamAlt/Emotion-Detection-using-fine-tuned-BERT-Transformer
Successfully developed a fine-tuned BERT transformer model which performs emotion classification on any given piece of text, identifying a suitable human emotion based on the semantic meaning of the text.
Language: Jupyter Notebook - Size: 971 KB - Last synced at: 2 months ago - Pushed at: almost 3 years ago - Stars: 3 - Forks: 1
jangedoo/jange
Easy NLP in Python
Language: Python - Size: 2.06 MB - Last synced at: about 1 month ago - Pushed at: about 4 years ago - Stars: 18 - Forks: 4
varelaerick/Sentiment-Analyse-Deep-Learn-Amazon-App
5 - Personal Project - Sentiment Analysis + Deep Learning - Amazon App
Language: Jupyter Notebook - Size: 1.96 MB - Last synced at: 4 months ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0
devspidr/NLP-Tools
A collection of powerful Natural Language Processing (NLP) tools and scripts for tasks like text preprocessing, sentiment analysis, keyword extraction, and more — built with Python and popular NLP libraries.
Language: Python - Size: 35.2 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0
Willgnner-Santos/DPE-Legal-Doc-Classification-Pipeline
The results are drawn from experiments on the classification of legal documents using LLMs in a real-world institutional setting
Language: Jupyter Notebook - Size: 45.8 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0
theveryhim/Massive-text-processing
Cleaning, processing, and analysis of a dataset of papers using the PySpark (RDD) framework
Language: Jupyter Notebook - Size: 1.31 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0
SuneshSundarasami/Multilabel-Toxicity-Detection-Using-Classical-RNN-and-Transformer-Architectures
End-to-end ML workflow for multi-label toxic comment detection using NLP. Implements advanced text preprocessing, multi-label vectorization, and models (Logistic Regression, RNNs, Transformers). Includes scripts for data cleaning, training, and per-label metrics.
Language: Jupyter Notebook - Size: 17.7 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0
Avinraj01/SHL-Grammar-Scoring-Engine-for-Voice-Samples
This model predicts grammar scores (1–5) from audio files. It uses Whisper to transcribe speech to text, cleans the text, and extracts features with TF-IDF. A Random Forest Regressor is trained to learn grammar score patterns. Evaluation via Pearson Correlation showed good results.
Language: Jupyter Notebook - Size: 80.1 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0
kunalPisolkar24/NLP_Lab
Collection of practical codes for Savitribai Phule Pune University's Natural Language Processing Laboratory (410256).
Language: Jupyter Notebook - Size: 795 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0
AtheerAlzhrani/arabic_nlp
This repository contains projects focused on Arabic Natural Language Processing (NLP)
Language: Jupyter Notebook - Size: 433 KB - Last synced at: 8 days ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0
bilalhameed248/FAQ-Chat-Bot-Using-VertexAI
A generative AI-based FAQ Chat-Bot with a Flask Back-End, designed to operate within an organization's internal domain. - Jul 2023 - Oct 2023
Language: Jupyter Notebook - Size: 10.7 KB - Last synced at: 4 months ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0
SD7Campeon/Yelp-Sentiment-Analysis-with-Python-BS4-and-LLM
A scalable pipeline for automated extraction, preprocessing, and sentiment analysis of Yelp reviews. Uses advanced HTTP requests, HTML parsing, and text normalization (tokenization, stopword removal, lemmatization) to enable precise polarity and subjectivity analysis for consumer insights and business analytics.
Size: 11.7 KB - Last synced at: 16 days ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0
FarrelAD/Simple-TikTok-Post-Text-Mining
A simple case study to learn how to do text mining from TikTok post
Language: Python - Size: 1.29 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 2 - Forks: 0
Ad-Chekk/EchoAI
Web Content Analyzer with LLMs is a powerful tool for scraping, processing, and analyzing web content using advanced Machine Learning (ML) and Natural Language Processing (NLP) techniques. It leverages state-of-the-art models such as RoBERTa for extractive question answering, BART for summarization, and various other NLP models for tasks like senti
Language: Python - Size: 47.9 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0
MohamedMoubarakHussein/Auto-completing-text-using-n-grams-model
Text autocompletion system using N-gram language models with prefix-based filtering and add-k smoothing. Processes JSON datasets to provide probability-ranked word suggestions for interactive text completion.
Language: Jupyter Notebook - Size: 904 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0
mrqadeer/text_prettifier
Python library designed to clean and preprocess text data by removing unwanted elements such as HTML tags, URLs, numbers, special characters, emojis, contractions, and stopwords. It offers flexible functionality, including options to return text in lowercase and as a list of tokens.
Language: Python - Size: 26.4 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0
Shanmukhi1920/Text-Classification
Developed an NLP system using Gradio and Hugging Face to classify disaster tweets with both machine learning (ML) and deep learning (DL) models.
Language: Jupyter Notebook - Size: 8.23 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0
farhad-here/TextPrepX
A Multilingual Text Preprocessing Tool for English and Persian.
Language: Python - Size: 3.74 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0
yhaslan/EverySecondMatters-LLM-app
This repository contains my full-stack data science project for the JEDHA bootcamp and validation of Bloc 6 of the RNCP certificate.
Language: Jupyter Notebook - Size: 4.91 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0
MoustafaMohamed01/web-summarizer-ai
A Python tool to scrape and summarize website content using AI. Built with Selenium, BeautifulSoup, LLaMA 3.2, and Google's Gemini AI, this project extracts the main text from any website and generates a concise summary in markdown format. Perfect for quickly understanding long articles, blogs, or news pages.
Language: Jupyter Notebook - Size: 42 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0
SayamAlt/Symptoms-Disease-Text-Classification
Successfully developed a fine-tuned BERT transformer model which classifies symptoms to their corresponding diseases with an accuracy of up to 89%.
Language: Jupyter Notebook - Size: 860 KB - Last synced at: 6 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0
Raj-UtsaV/IMDB_Movies_Review
A sentiment analysis project using IMDb movie reviews with NLP and machine learning techniques to classify reviews as positive or negative.
Language: Jupyter Notebook - Size: 108 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0
Lipairui/textgo
Text preprocessing, representation, similarity calculation, text search and classification. Let's go and play with text!
Language: Python - Size: 532 KB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 45 - Forks: 3
Aashi2608/Natural-language-Processing
A Natural Language Processing (NLP) project that applies machine learning to detect fraud in vehicle insurance claims by analyzing textual data. Combines preprocessing, feature extraction, and classification models for intelligent claims analysis.
Size: 1.04 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0
SayamAlt/Language-Detection-using-fine-tuned-XLM-Roberta-Base-Transformer-Model
Successfully developed a language detection transformer model that can accurately recognize the language in which any given text is written.
Language: Jupyter Notebook - Size: 1.09 MB - Last synced at: 2 months ago - Pushed at: almost 3 years ago - Stars: 7 - Forks: 4
Ibraddah/SHL-Grammar-Scoring-Engine-for-Voice-Samples
This model predicts grammar scores (1–5) from audio files. It uses Whisper to transcribe speech to text, cleans the text, and extracts features with TF-IDF. A Random Forest Regressor is trained to learn grammar score patterns. Evaluation via Pearson Correlation showed good results.
Language: Jupyter Notebook - Size: 34.2 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0
xga0/DisasterTweetPrediction
Kaggle Competition: Real or Not? NLP with Disaster Tweets.
Language: Python - Size: 37.1 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0
SayamAlt/Abstractive-Text-Summarization-of-News-Articles
Successfully developed an encoder-decoder based sequence-to-sequence (Seq2Seq) model which can summarize the entire text of an Indian news summary into a short paragraph with a limited number of words.
Language: Jupyter Notebook - Size: 4.83 MB - Last synced at: 6 months ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 1
AniK4111/Netflix_Movies_And_TV_Shows_Clustering
Unsupervised Machine Learning project for Netflix Movies and TV Shows Clustering. The main goal of this project is to create a content-based recommender system that recommends top 10 shows to users based on their viewing history.
Size: 2.58 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0
vedavyas0105/Financial-Sentiment-Distillation
This project leverages knowledge distillation to create a lightweight yet powerful sentiment analysis model, tailored specifically for financial news data. Using a teacher-student approach, the project distills knowledge from a large FinBERT model into a compact DistilBERT-based student model, balancing performance and efficiency.
Language: Jupyter Notebook - Size: 919 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0
SayamAlt/Cyberbullying-Classification-using-fine-tuned-DistilBERT
Successfully fine-tuned a pretrained DistilBERT transformer model that can classify social media text data into one of 4 cyberbullying labels i.e. ethnicity/race, gender/sexual, religion and not cyberbullying with a remarkable accuracy of 99%.
Language: Jupyter Notebook - Size: 7.24 MB - Last synced at: 6 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0
venkat-0706/Sentimental-Analysis
Build a model to classify text as positive, negative, or neutral. Apply NLP techniques for preprocessing and machine learning for classification. Aim for accurate sentiment prediction on various text formats.
Language: Jupyter Notebook - Size: 280 KB - Last synced at: 7 months ago - Pushed at: about 1 year ago - Stars: 13 - Forks: 2
berknology/text-preprocessing
A Python package for text preprocessing tasks in natural language processing.
Language: Python - Size: 40 KB - Last synced at: about 1 month ago - Pushed at: about 3 years ago - Stars: 63 - Forks: 7
ArNAB-0053/Song-Identifier
It identifies songs and artists from lyric snippets using two distinct methods: a simple NLP-based approach and a BM25 (Best Match 25) approach.
Language: Jupyter Notebook - Size: 19.7 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0
evanch98/natural-language-processing-python
Natural Language Processing
Language: Jupyter Notebook - Size: 4.88 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0
pngo1997/N-gram-Language-Models
Builds N-gram language models and applies them to text generation.
Language: Jupyter Notebook - Size: 4.73 MB - Last synced at: 8 months ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0
swathisivaprabu/ML-Projects
This repository documents my journey in Machine Learning. Explored data preprocessing, feature engineering, and model training. Built models for classification, regression, and NLP tasks. Continuously learning and improving.
Language: Jupyter Notebook - Size: 166 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0
SayamAlt/Fake-News-Classification-using-fine-tuned-BERT
Successfully developed a text classification model to predict whether a given news text is fake or not by fine-tuning a pretrained BERT transformer model imported from Hugging Face.
Language: Jupyter Notebook - Size: 18 MB - Last synced at: 7 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 0
AndyTheFactory/article-extraction-dataset
Article title, authors, date and body extraction dataset.
Language: HTML - Size: 31.9 MB - Last synced at: 6 months ago - Pushed at: over 1 year ago - Stars: 6 - Forks: 1
pngo1997/Text-Processing-Tokenization
Simple text analysis and tokenization.
Language: Jupyter Notebook - Size: 185 KB - Last synced at: 8 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0
pngo1997/Word-Embeddings-Co-occurrence-SVD-GloVe
Explores word embeddings.
Language: Jupyter Notebook - Size: 149 KB - Last synced at: 8 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0
AjayKumar095/Natural_Language_Processing
Explore cutting-edge Natural Language Processing (NLP) techniques in this GitHub repository. Includes pre-trained models, custom NLP pipelines, text preprocessing tools, sentiment analysis, text classification, and more. Ideal for research, learning, and deploying NLP solutions in Python.
Language: Jupyter Notebook - Size: 2.32 MB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0
gkalocsai/metatrans
Transpiler engine
Language: Java - Size: 958 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0
SayamAlt/Mental-Health-Classification-using-fine-tuned-DistilBERT
Successfully established a multiclass text classification model by fine-tuning a pretrained DistilBERT transformer model to classify several distinct types of mental health statuses such as anxiety, stress, personality disorder, etc. with an accuracy of 77%.
Language: Jupyter Notebook - Size: 2.07 MB - Last synced at: 27 days ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0
DeepakMishra99/Natural_Language_Processing_Practice
Natural Language Processing
Language: Jupyter Notebook - Size: 354 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0
p208p2002/wikitext-table-parser
A WikiText table parser written in Rust.
Language: Rust - Size: 104 KB - Last synced at: 23 days ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0
SayamAlt/Luxury-Apparel-Product-Category-Classification-using-fine-tuned-DistilBERT
Successfully developed a multiclass text classification model by fine-tuning a pretrained DistilBERT transformer model to classify various distinct types of luxury apparel into their respective categories, i.e. pants, accessories, underwear, shoes, etc.
Language: Jupyter Notebook - Size: 3.7 MB - Last synced at: 26 days ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0
mim-solutions/mim_nlp
A Python package with ready-to-use models for various NLP tasks and text preprocessing utilities. The implementation allows fine-tuning.
Language: Jupyter Notebook - Size: 413 KB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0
erickmaiaa/nlp
Exploration of NLP concepts, including text preprocessing, language models, and practical applications like sentiment analysis, using tools like NLTK, spaCy, and transformers.
Language: Jupyter Notebook - Size: 156 KB - Last synced at: 9 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0
gaaniruddha/FIT5196-A1
This repository contains assignment #1, completed as part of "FIT5196 Data Wrangling", taught at Monash Uni in S2 2020.
Language: Jupyter Notebook - Size: 17.3 MB - Last synced at: 8 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0
Ailln/proces
🐨 text preprocess.
Language: Python - Size: 42 KB - Last synced at: 8 days ago - Pushed at: about 2 years ago - Stars: 5 - Forks: 0
adilrasheed139/AI-Powered-Resume-Screening-using-BERT
Successfully developed a resume classification model which can accurately classify the resume of any person into its corresponding job with a tremendously high accuracy of more than 99%.
Language: Jupyter Notebook - Size: 1.19 MB - Last synced at: 5 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 0
giocoal/reddit-tldr-summarizer-and-topic-modeling
Extreme Extractive Text Summarization and Topic Modeling (using LSA and LDA techniques) over Reddit Posts from TLDRHQ dataset.
Language: Python - Size: 52.5 MB - Last synced at: 8 months ago - Pushed at: almost 2 years ago - Stars: 7 - Forks: 1
omar-sherif9992/Dialect-LLM-Bachelor-Project
The aim of the Bachelor project is to develop a new approach to Arabic (Egyptian-Dialect) Sentiment Analysis, Forecasting and Topic Modeling using Machine Learning, Deep Learning and Transformers!
Language: Jupyter Notebook - Size: 16.6 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 7 - Forks: 0
lyeoni/prenlp
Preprocessing Library for Natural Language Processing
Language: Python - Size: 156 KB - Last synced at: 7 months ago - Pushed at: almost 3 years ago - Stars: 161 - Forks: 12
MusfiqDehan/data-preprocessors
🛠️ An easy-to-use tool for data preprocessing, especially text preprocessing
Language: Python - Size: 193 KB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 1
Jesly-Joji/Spam-Ham-Classifier
Used Naive Bayes Algorithm, NLP Text Preprocessing Techniques
Language: Jupyter Notebook - Size: 961 KB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0
umapornp/textprepro
👀 Everything Everyway All At Once Text Preprocessing for Natural Language Processing.
Language: Python - Size: 1.3 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0
vishnun0027/Sentiment-Analysis
Several ways to perform sentiment analysis on text data, with varying degrees of complexity and accuracy
Language: Jupyter Notebook - Size: 40 MB - Last synced at: 8 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0
kunalPisolkar24/IR_Lab
Collection of practical codes for Savitribai Phule Pune University's Information Retrieval Lab (410247).
Language: Jupyter Notebook - Size: 125 KB - Last synced at: 8 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0
Shubhamd1234/SMS_Spam_Detection_Model_Using_NLP
An NLP-based model designed to effectively identify and filter spam SMS messages. Uses the NLTK library, text preprocessing, TF-IDF, and other techniques.
Language: Jupyter Notebook - Size: 1.17 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0
catherinetweeks/text-preprocessing-articles
Preprocesses text from news articles.
Language: Python - Size: 5.86 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0
SeyedShahab-A/Topic-Modeling
A project on applying Latent Dirichlet Allocation (LDA) to uncover key topics influencing customer satisfaction and dissatisfaction
Language: R - Size: 2.93 KB - Last synced at: 11 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0
franklinen/Potential-Talents
NLP-based pipeline for talent discovery
Language: Jupyter Notebook - Size: 18.6 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0
iamluirio/echo-chambers-news-aggregators
We propose different measures that can quantitatively and qualitatively study characterization of echo chambers in news media aggregators across different users.
Language: Jupyter Notebook - Size: 6.96 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0
sebastianherman/bachelors
Bachelor's project
Language: Jupyter Notebook - Size: 17.3 MB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0
mrqadeer/internet_words_remover
Python module designed to replace common internet slang and abbreviations with their full forms, enhancing the readability of informal text. It efficiently cleans text data from chats, social media, and online communication. The module also supports tokenization and integrates seamlessly with pandas for batch processing of text in DataFrames.
Language: Python - Size: 31.3 KB - Last synced at: 8 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0
kayl26/TextRetrieval_SearchEngines
Assignments completed for CP423: Text Retrieval and Search Engines. Collaborated with Abigail Lee and Myisha Chaudhry
Language: Jupyter Notebook - Size: 9.93 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0
mevlutayilmaz/text-summarization
Text summarization in Python
Language: Python - Size: 16.6 KB - Last synced at: 5 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0
parthu34/Text-Processing-Application-Using-NLP-
Airline User Review Tweets – A Sentiment Analysis
Language: Python - Size: 4.06 MB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0
dbadeev/tweets
The goal of the project is sentiment analysis of tweets. For user messages from the test set, the task is to predict, as accurately as possible, whether the sentiment of a tweet is positive, negative, or neutral.
Language: Jupyter Notebook - Size: 3.64 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0