text-preprocessing | Topic | Ecosyste.ms: Repos

Topic: "text-preprocessing"

Aashi2608/Natural-language-Processing

A Natural Language Processing (NLP) project that applies machine learning to detect fraud in vehicle insurance claims by analyzing textual data. Combines preprocessing, feature extraction, and classification models for intelligent claims analysis.

Size: 1.04 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

Ibraddah/SHL-Grammar-Scoring-Engine-for-Voice-Samples

This model predicts grammar scores (1–5) from audio files. It uses Whisper to transcribe speech to text, cleans the text, and extracts features with TF-IDF. A Random Forest Regressor is trained to learn grammar score patterns. Evaluation via Pearson Correlation showed good results.

Language: Jupyter Notebook - Size: 34.2 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0
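
The SHL-Grammar-Scoring-Engine description above mentions TF-IDF features and evaluation by Pearson correlation. The following is a minimal standard-library sketch of those two pieces only. It is not the repository's code: the function names `tfidf` and `pearson` are hypothetical, the weighting is the plain `tf * log(N / df)` form (libraries such as scikit-learn apply extra smoothing and normalization), and the transcription and Random Forest steps are left out.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocabulary, rows) with plain tf * log(N / df) weights."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in tokenized for term in set(doc))
    vocab = sorted(df)
    rows = []
    for doc in tokenized:
        tf = Counter(doc)
        length = len(doc) or 1  # avoid division by zero for empty documents
        rows.append([(tf[t] / length) * math.log(n_docs / df[t]) for t in vocab])
    return vocab, rows


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


# Hypothetical transcripts and scores, for illustration only.
vocab, rows = tfidf(["she go home", "she goes home"])
r = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

Terms that occur in every document get a weight of zero under this formula, which is why only `go` and `goes` carry signal in the toy example.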

xga0/DisasterTweetPrediction

Kaggle Competition: Real or Not? NLP with Disaster Tweets.

Language: Python - Size: 37.1 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0

Willgnner-Santos/DPE-Legal-Doc-Classification-Pipeline

The results are drawn from experiments on the classification of legal documents using LLMs in a real-world institutional setting

Language: Jupyter Notebook - Size: 42 MB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 0 - Forks: 0

AniK4111/Netflix_Movies_And_TV_Shows_Clustering

Unsupervised Machine Learning project for Netflix Movies and TV Shows Clustering. The main goal of this project is to create a content-based recommender system that recommends top 10 shows to users based on their viewing history.

Size: 2.58 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

MoustafaMohamed01/web-summarizer-ai

A Python tool to scrape and summarize website content using AI. Built with Selenium, BeautifulSoup, and Google's Gemini AI, this project extracts the main text from any website and generates a concise summary in markdown format. Perfect for quickly understanding long articles, blogs, or news pages.

Language: Jupyter Notebook - Size: 12.7 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

vedavyas0105/Financial-Sentiment-Distillation

This project leverages knowledge distillation to create a lightweight yet powerful sentiment analysis model, tailored specifically for financial news data. Using a teacher-student approach, the project distills knowledge from a large FinBERT model into a compact DistilBERT-based student model, balancing performance and efficiency.

Language: Jupyter Notebook - Size: 919 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0
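
The teacher-student approach in the Financial-Sentiment-Distillation description above is commonly trained with a loss that mixes a softened teacher-student divergence with the usual cross-entropy on the true label. The sketch below shows that common formulation in plain Python. It is an assumption about the general technique, not the repository's implementation, and `distillation_loss`, `temperature`, and `alpha` are illustrative names.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """alpha * T^2 * KL(teacher || student) + (1 - alpha) * cross-entropy."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL divergence between the softened teacher and student distributions.
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Standard cross-entropy of the student against the hard label.
    ce = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


# Hypothetical logits for three sentiment classes.
loss = distillation_loss([1.0, 0.2, -0.5], [2.0, 0.5, -1.0], label=0)
```

A higher `temperature` flattens both distributions so the student also learns from the teacher's relative preferences among the wrong classes.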

ArNAB-0053/Song-Identifier

Identifies songs and artists from lyric snippets using two distinct methods: a simple NLP-based approach and a BM25 (Best Match 25) approach.

Language: Jupyter Notebook - Size: 19.7 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0
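
To illustrate the BM25 ranking named in the Song-Identifier description above, here is a minimal standard-library sketch of the usual Okapi BM25 scoring formula. It is not taken from the repository: `bm25_scores` is a hypothetical name, `k1=1.5` and `b=0.75` are commonly used defaults, tokenization is a plain whitespace split, and the lyric snippets are made-up sample data.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document in `docs` against `query` with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter()
    for doc in tokenized:
        df.update(set(doc))
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


lyrics = [
    "hello from the other side",
    "we will rock you",
    "hello darkness my old friend",
]
scores = bm25_scores("hello darkness", lyrics)
best = max(range(len(lyrics)), key=scores.__getitem__)  # index of the top match
```

The third snippet ranks first here because it matches both query terms, and the rarer term `darkness` contributes a larger IDF weight than `hello`.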

evanch98/natural-language-processing-python

Natural Language Processing

Language: Jupyter Notebook - Size: 4.88 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

swathisivaprabu/ML-Projects

This repository documents my journey in Machine Learning. Explored data preprocessing, feature engineering, and model training. Built models for classification, regression, and NLP tasks. Continuously learning and improving.

Language: Jupyter Notebook - Size: 166 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

pngo1997/Text-Processing-Tokenization

Simple text analysis and tokenization.

Language: Jupyter Notebook - Size: 185 KB - Last synced at: about 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

pngo1997/Word-Embeddings-Co-occurrence-SVD-GloVe

Explores word embeddings.

Language: Jupyter Notebook - Size: 149 KB - Last synced at: about 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

vinayakdasgupta/anvay

anvay is a Flask-based Bengali text processing and topic modeling tool that uses Latent Dirichlet Allocation (LDA) to extract topics from uploaded text files.

Language: HTML - Size: 3.86 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

gkalocsai/metatrans

Transpiler engine

Language: Java - Size: 958 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

SayamAlt/Mental-Health-Classification-using-fine-tuned-DistilBERT

Successfully established a multiclass text classification model by fine-tuning pretrained DistilBERT transformer model to classify several distinct types of mental health statuses such as anxiety, stress, personality disorder, etc. with an accuracy of 77%.

Language: Jupyter Notebook - Size: 2.07 MB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

SayamAlt/Luxury-Apparel-Product-Category-Classification-using-fine-tuned-DistilBERT

Successfully developed a multiclass text classification model by fine-tuning a pretrained DistilBERT transformer model to classify luxury apparel items into their respective categories, e.g. pants, accessories, underwear, shoes, etc.

Language: Jupyter Notebook - Size: 3.7 MB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

gaaniruddha/FIT5196-A1

This repository contains assignment #1, which was completed as part of "FIT5196 Data Wrangling", taught at Monash Uni in S2 2020.

Language: Jupyter Notebook - Size: 17.3 MB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

erickmaiaa/nlp

Exploration of NLP concepts, including text preprocessing, language models, and practical applications like sentiment analysis, using tools like NLTK, spaCy, and transformers.

Language: Jupyter Notebook - Size: 156 KB - Last synced at: 2 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

kunalPisolkar24/IR_Lab

Collection of practical code for Savitribai Phule Pune University's Information Retrieval Lab (410247).

Language: Jupyter Notebook - Size: 125 KB - Last synced at: about 2 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

Shubhamd1234/SMS_Spam_Detection_Model_Using_NLP

An NLP-based model designed to effectively identify and filter spam SMS messages. Uses the NLTK library, text preprocessing, TF-IDF, and other techniques.

Language: Jupyter Notebook - Size: 1.17 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

catherinetweeks/text-preprocessing-articles

Preprocesses text from news articles.

Language: Python - Size: 5.86 KB - Last synced at: 6 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

SeyedShahab-A/Topic-Modeling

A project on applying Latent Dirichlet Allocation (LDA) to uncover key topics influencing customer satisfaction and dissatisfaction.

Language: R - Size: 2.93 KB - Last synced at: 5 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

AtheerAlzhrani/arabic_nlp

This repository contains projects focused on Arabic Natural Language Processing (NLP)

Language: Jupyter Notebook - Size: 433 KB - Last synced at: 28 days ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

franklinen/Potential-Talents

NLP-based pipeline for talent discovery

Language: Jupyter Notebook - Size: 18.6 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

iamluirio/echo-chambers-news-aggregators

We propose different measures that can quantitatively and qualitatively study characterization of echo chambers in news media aggregators across different users.

Language: Jupyter Notebook - Size: 6.96 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

mrqadeer/text_prettifier

Python library designed to clean and preprocess text data by removing unwanted elements such as HTML tags, URLs, numbers, special characters, emojis, contractions, and stopwords. It offers flexible functionality, including options to return text in lowercase and as a list of tokens.

Language: Python - Size: 13.7 KB - Last synced at: 29 days ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0
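
The cleaning steps listed in the text_prettifier description above can be sketched with the standard `re` module. This is an illustrative sketch, not the library's API: `clean_text`, its parameters, and the tiny `STOPWORDS` set are hypothetical, contraction handling is omitted, and the special-character rule keeps ASCII letters only, so it would also strip accented or non-Latin text.

```python
import re

# Hypothetical, deliberately tiny stopword list for illustration.
STOPWORDS = {"a", "an", "the", "is", "and", "of", "to", "in"}


def clean_text(text, lowercase=True, as_tokens=False):
    """Strip HTML tags, URLs, numbers, special characters, and stopwords."""
    text = re.sub(r"<[^>]+>", " ", text)                # HTML tags
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"\d+", " ", text)                    # numbers
    text = re.sub(r"[^A-Za-z\s]", " ", text)            # punctuation, emojis, other symbols
    if lowercase:
        text = text.lower()
    tokens = [t for t in text.split() if t.lower() not in STOPWORDS]
    return tokens if as_tokens else " ".join(tokens)


cleaned = clean_text("<p>Visit https://example.com for 3 great deals!</p>")
tokens = clean_text("The price is 20 dollars", as_tokens=True)
```

Tags are removed before URLs so that a closing tag glued to the end of a link is not swallowed by the URL pattern.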

mevlutayilmaz/text-summarization

Text summarization in Python.

Language: Python - Size: 16.6 KB - Last synced at: 9 days ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

dbadeev/tweets

The goal of the project is sentiment analysis of tweets. For user messages from the test set, the task is to predict, as accurately as possible, whether the sentiment of a tweet is positive, negative, or neutral.

Language: Jupyter Notebook - Size: 3.64 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

SiddiquiZainab/Song-Lyrics-Generation-Model

A machine learning model for generating song lyrics using advanced neural network techniques. This model leverages Bi-directional LSTM to create coherent and creative lyrics in various styles.

Language: Jupyter Notebook - Size: 94.7 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

Shanmukhi1920/Text_Classification

Developed an NLP system using Gradio and Hugging Face to classify disaster tweets with both machine learning (ML) and deep learning (DL) models.

Language: Jupyter Notebook - Size: 8.23 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

Mohana-Murugan/NLP

NLP

Language: Jupyter Notebook - Size: 4.44 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

fwaskito/ta

Tugas akhir (final year project). This source code was used in the bachelor thesis “Public Sentiment Analysis of Mental Disorder based on Twitter Texts using Support Vector Machine”.

Language: Jupyter Notebook - Size: 4.92 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

minseok0809/text-line-converter

A program for text case conversion, single-line conversion, and special-character removal.

Language: TeX - Size: 89.9 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

vishnun0027/Sentiment-Analysis

Several ways to perform sentiment analysis on text data, with varying degrees of complexity and accuracy.

Language: Jupyter Notebook - Size: 40 MB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

mParthSaharanf/Comprehensive-Text-Extraction-and-Analysis-for-Article-Metrics

This project extracts and analyzes textual data from given URLs using BeautifulSoup and NLTK. It performs sentiment analysis, word complexity assessment, and calculates average word length, saving results in text and CSV formats.

Language: Python - Size: 38.3 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

michailidisa/sentiment-analysis-on-hotel-reviews

Classification of hotel reviews into positive and negative classes using sentiment analysis.

Size: 1.82 MB - Last synced at: 7 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

kayl26/TextRetrieval_SearchEngines

Assignments completed for CP423: Text Retrieval and Search Engines. Collaborated with Abigail Lee and Myisha Chaudhry

Language: Jupyter Notebook - Size: 9.93 MB - Last synced at: 8 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

abinashsahoo007/Project-Resume-Classification

The document classification solution should significantly reduce the manual human effort in the HRM. It should achieve a higher level of accuracy and automation with minimal human intervention.

Language: Jupyter Notebook - Size: 11.6 MB - Last synced at: 16 days ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

mrqadeer/internet_words_remover

Python module designed to replace common internet slang and abbreviations with their full forms, enhancing the readability of informal text. It efficiently cleans text data from chats, social media, and online communication. The module also supports tokenization and integrates seamlessly with pandas for batch processing of text in DataFrames.

Language: Python - Size: 31.3 KB - Last synced at: 29 days ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0
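
The slang replacement described for internet_words_remover above can be sketched as a dictionary lookup driven by one compiled regular expression. This is a hedged illustration rather than the module's real interface: `expand_slang` and the four-entry `SLANG` table are hypothetical.

```python
import re

# Hypothetical slang table; a real module would ship a much larger one.
SLANG = {
    "brb": "be right back",
    "idk": "I do not know",
    "imo": "in my opinion",
    "btw": "by the way",
}

# One alternation over all keys, matched on word boundaries, case-insensitively.
_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, SLANG)) + r")\b", re.IGNORECASE
)


def expand_slang(text):
    """Replace each known abbreviation with its full form."""
    return _PATTERN.sub(lambda m: SLANG[m.group(0).lower()], text)


expanded = expand_slang("BTW, idk if he is coming")
```

Because the function takes and returns a plain string, it could also be passed to pandas `Series.apply` to process a DataFrame column in bulk.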

vlada-pv/Prediction-Sociolinguistic-Data-Based-on-the-Diaries-Texts-of-the-Prozhito-Project

The repository contains notebooks created for collecting and preprocessing the corpus of diary entries and for experiments on creating models for predicting gender, age groups of authors and the time period of text creation.

Language: Jupyter Notebook - Size: 2.06 MB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

AmruhaAhmed/OIBSIP

Language: Jupyter Notebook - Size: 42.5 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

SayamAlt/Cyberbullying-Classification-using-fine-tuned-DistilBERT

Successfully fine-tuned a pretrained DistilBERT transformer model that can classify social media text data into one of 4 cyberbullying labels i.e. ethnicity/race, gender/sexual, religion and not cyberbullying with a remarkable accuracy of 99%.

Language: Jupyter Notebook - Size: 7.24 MB - Last synced at: 2 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

jasoncobra3/Natural_Language_Processing

Natural Language Processing (NLP) is a captivating field at the intersection of computer science and linguistics. It enables machines to understand, interpret, and respond to human language in a way that is both meaningful and useful. From chatbots to sentiment analysis, NLP applications are transforming industries and enhancing user experiences.

Language: Jupyter Notebook - Size: 7.97 MB - Last synced at: about 1 month ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

MS1034/document-classification-using-KNN

Document classification using the KNN algorithm, a graph-based approach, along with scraped data.

Language: Python - Size: 13.4 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

CSingh26/Project3-SentimentAnalysis

Sentiment Analysis using Machine Learning

Language: Python - Size: 769 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

pusztaipatrik/job-postings

Results of a Data analytics project at TH Wildau. Created with Orange data analytics tool, Data source: https://www.kaggle.com/datasets/PromptCloudHQ/us-jobs-on-monstercom

Size: 11.5 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

SayamAlt/English-to-Spanish-Language-Translation-using-Seq2Seq-and-Attention

Successfully established a Seq2Seq model with attention that performs English-to-Spanish translation with an accuracy of almost 97%.

Language: Jupyter Notebook - Size: 1.18 MB - Last synced at: 2 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

SayamAlt/Symptoms-Disease-Text-Classification

Successfully developed a fine-tuned BERT transformer model that classifies symptoms into their corresponding diseases with an accuracy of up to 89%.

Language: Jupyter Notebook - Size: 860 KB - Last synced at: 2 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

faraz-wq/Echostop

A plagiarism detection service

Size: 0 Bytes - Last synced at: 11 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

Theofilusarifin/Sentiment-Analysis-on-2019-Indonesia-Election

This project aims to analyze the sentiment of tweets related to the 2019 Indonesia Election. Sentiment analysis plays a crucial role in understanding public opinion and attitudes towards political events, providing valuable insights for decision-making and public discourse.

Language: Jupyter Notebook - Size: 1.45 MB - Last synced at: 12 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 1

Theofilusarifin/Text-Classification-for-Craigslist-Posts

This project aims to classify Craigslist posts into different categories based on their heading. It utilizes machine learning models to predict the category of a given heading within a selected city and section.

Language: Jupyter Notebook - Size: 49.9 MB - Last synced at: 12 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

LoyumM/Movie-recommendation

Recommend similar movies

Language: Jupyter Notebook - Size: 9.71 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

UmmeKulsumTumpa/NLP_Basic_Codes

This repository contains fundamental code and examples for Natural Language Processing (NLP) tasks for beginners like me.

Language: Python - Size: 4.88 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

anangkur/hoax-news-detection-using-tfidf

This repository is the result of final project research carried out to complete my education at Telkom University.

Language: Java - Size: 19.9 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

imane-ayouni/News-feed-classification-using-LSTM

A stacked LSTM to categorize textual news feeds.

Language: Jupyter Notebook - Size: 53.7 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

emrejilta/nlp-text-preprocessing

Text Preprocessing with NLTK and spaCy

Language: Jupyter Notebook - Size: 373 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

bilalhameed248/FAQ-Chat-Bot-Using-VertexAI

A generative AI-based FAQ Chat-Bot with a Flask Back-End, designed to operate within an organization's internal domain. - Jul 2023 - Oct 2023

Language: Jupyter Notebook - Size: 10.7 KB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

minseok0809/korean-sentence-segementation

AIHub Korean data preprocessing: Korean sentence segmentation.

Language: Jupyter Notebook - Size: 2.61 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

SayamAlt/E-Commerce-Text-Classification

Successfully established a machine learning model that can accurately classify an e-commerce product into one of four categories, namely "Books", "Clothing & Accessories", "Household" and "Electronics", based on the product's description.

Language: Jupyter Notebook - Size: 10.6 MB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

SpydazWebAI-NLP/Basic_Tokenizer2023

The Tokenizer is a versatile text processing library written in Visual Basic (VB.NET). It provides functionalities for tokenizing text into words, sentences, characters, and n-grams. The library is designed to be flexible, customizable, and easy to integrate into your VB.NET projects.

Language: Visual Basic .NET - Size: 1.06 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 1
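
The four tokenization levels named in the Basic_Tokenizer2023 description above (words, sentences, characters, n-grams) are sketched below. The library itself is written in VB.NET; this is a Python illustration of the concepts, and every function name here is hypothetical rather than part of that library.

```python
import re


def word_tokens(text):
    """Split text into runs of word characters."""
    return re.findall(r"\w+", text)


def sentence_tokens(text):
    """Split after ., !, or ? followed by whitespace (a naive rule)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def char_tokens(text):
    """Split text into individual characters."""
    return list(text)


def ngrams(tokens, n):
    """Return all contiguous n-token windows as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


text = "Hello world. How are you?"
words = word_tokens(text)
sentences = sentence_tokens(text)
bigrams = ngrams(words, 2)
```

The sentence rule is intentionally simple and would split incorrectly on abbreviations such as "e.g. this".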

ArpitaShrivas001/Sentiment-Analysis

Text preprocessing and sentiment analysis on an Airbnb customer feedback dataset.

Language: Python - Size: 18.6 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

prathmesh444/WhatsApp-Chat-Analyzer

This web app uses text preprocessing and exploratory data analysis to present interesting insights and patterns in the relationship between two or more individuals. Currently, it handles only English and Hinglish text.

Language: Python - Size: 333 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

BetikuOluwatobi/clustering_analysis_on_spotify_million_songs

A clustering analysis on the Spotify Million Dataset with KMeans algorithm

Language: Jupyter Notebook - Size: 22.6 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 1

YashSDholam/Tripadvisor-Hotel-Review-Sentiment-Analysis-using-LSTM-Neural-Network

In this project, I utilized the TripAdvisor Hotel Review dataset from Kaggle to perform sentiment analysis on hotel reviews. The main objective was to build a predictive model using LSTM (Long Short-Term Memory) neural networks to classify hotel reviews as positive or negative based on their textual content.

Language: Jupyter Notebook - Size: 6.48 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

putuwaw/text-preprocessing

Text Preprocessing in Python

Language: Python - Size: 18.6 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

Ayfred/MissionR-D

Language: Jupyter Notebook - Size: 2.93 MB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

MD-Ryhan/NLP-Preprocesing

This repository contains code for preprocessing natural language data for use in NLP applications.

Language: Jupyter Notebook - Size: 10.7 KB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

LeventSoykan/Movie_Recommendation_Using_NMF

Project to recommend movies using non-negative matrix factorization

Language: Jupyter Notebook - Size: 31.7 MB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

yarrap/Natural_Language_Processing

Language: Jupyter Notebook - Size: 481 KB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

saahilbhatia/text-preprocessing-resumes

Text preprocessing a set of resumes

Language: Jupyter Notebook - Size: 1.93 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

rohitgarud/NLP-data-preprocessing

An archive of data (text) preprocessing tools for NLP

Language: Jupyter Notebook - Size: 919 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

NehalGund/Product-Recommendation-System

Recommending similar product based on text features.

Language: Jupyter Notebook - Size: 6.44 MB - Last synced at: 21 days ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

shanuhalli/Assignment-Text-Mining

Perform sentiment analysis on Elon Musk tweets, extract reviews of any product from an e-commerce website such as Amazon, and perform emotion mining.

Language: Jupyter Notebook - Size: 2.06 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

vaibhavhaswani/GoText

GoText is a universal text extraction and preprocessing tool for Python that supports a wide variety of document formats.

Language: Python - Size: 66.4 KB - Last synced at: 9 months ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 1

SayamAlt/Quora-Duplicate-Question-Pairs-Identification

Successfully developed a machine learning model which can accurately detect whether any given pair of Quora questions are duplicate or not.

Language: Jupyter Notebook - Size: 1.88 MB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

SayamAlt/English-to-German-Translation-using-Seq2Seq

Successfully established a neural machine translation model using sequence to sequence modeling which can successfully translate English sentences to their corresponding German translations.

Language: Jupyter Notebook - Size: 626 KB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

schatzederwelt/toxic_comments_detection

Automatic detection of toxic comments.

Language: Jupyter Notebook - Size: 1.86 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

Bonniface/Text-CLeaning-And-Classification

Text classification is a widely used natural language processing task in different business problems. Given a statement or document, the task involves assigning to it an appropriate category from a pre-defined set of categories. The dataset of choice determines the set of categories. Text classification has applications in emotion classification, n

Language: Jupyter Notebook - Size: 8.34 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0