Ecosyste.ms: Repos
An open API service providing repository metadata for many open source software ecosystems.
GitHub / Aalaa4444 / Text_Processing-and-Unique_Word_Extraction_fromHTML
Extract text content from an HTML page, process it, and extract unique words from the processed text. This notebook applies several text-processing techniques, including cleaning, normalization, tokenization, lemmatization or stemming, and stop-word removal.
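The steps in the description can be sketched in a few lines of Python. This is a minimal, hedged sketch, not the notebook's own code. The repository topics mention `requests` and BeautifulSoup, but this sketch uses the standard library's `html.parser.HTMLParser` for text extraction so that it has no dependencies, and it takes an HTML string instead of fetching a URL. The names `html_to_text`, `unique_words` and the small `STOP_WORDS` set are hypothetical. A real run would normally use a fuller stop-word list, such as NLTK's.

```python
import re
from html.parser import HTMLParser

# Hypothetical, deliberately small stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "and", "are", "as", "at", "be", "by", "for", "from",
              "in", "is", "it", "of", "on", "or", "that", "the", "this", "to", "with"}


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Extract the visible text of an HTML document (a stdlib stand-in for BeautifulSoup)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def unique_words(text: str) -> list[str]:
    """Clean, normalize, tokenize, drop stop words, and return the sorted unique words."""
    text = text.lower()                                # normalization: case-fold
    text = re.sub(r"[^a-z\s]", " ", text)              # cleaning: keep ASCII letters only
    tokens = text.split()                              # tokenization: split on whitespace
    kept = [t for t in tokens if t not in STOP_WORDS]  # stop-word removal
    return sorted(set(kept))                           # unique words


page = ("<html><head><style>p {color: red}</style></head><body>"
        "<h1>The Cats</h1><p>Cats and dogs, and the cats!</p>"
        "<script>var x = 1;</script></body></html>")
print(unique_words(html_to_text(page)))  # ['cats', 'dogs']
```

The cleaning regex is a deliberate simplification. It also discards digits and accented letters, so a notebook that handles non-English text would need a broader character class.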
Stars: 0
Forks: 0
Open Issues: 0
License: None
Language: Jupyter Notebook
Repo Size: 12.7 KB
Dependencies: 0
Created: about 1 month ago
Updated: about 1 month ago
Last pushed: about 1 month ago
Last synced: about 1 month ago
Topics: beautifulsoup, data-extraction, extract-html, lemmatization, requests, stemming, stopwords-removal, text-cleaning, text-extraction, text-lemmatization, text-normalization, text-processing, text-tokenization, tokenization, tokenizer
No dependencies found
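The topics list both stemming and lemmatization, and the description says the notebook applies one or the other. The difference between the two can be shown without external dependencies. A stemmer strips suffixes by rule and can produce non-words, while a lemmatizer maps each word to a dictionary form. The sketch below is a toy. `crude_stem` strips a handful of English suffixes and is far simpler than the Porter algorithm. `LEMMAS` is a hypothetical hand-written table that stands in for a real lexicon such as WordNet. NLTK's `PorterStemmer` and `WordNetLemmatizer` are common practical choices, but this page does not say which tools the notebook uses.

```python
# Toy contrast between stemming and lemmatization (not the notebook's code).

# Hypothetical suffix rules, checked in order; far cruder than the Porter stemmer.
SUFFIXES = ("ingly", "edly", "ing", "ies", "ed", "es", "s")

# Hypothetical lookup table standing in for a real lexicon such as WordNet.
LEMMAS = {"studies": "study", "studying": "study", "ran": "run",
          "running": "run", "better": "good", "mice": "mouse", "cats": "cat"}


def crude_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def lemmatize(word: str) -> str:
    """Return the dictionary form if the table knows the word, else the word unchanged."""
    return LEMMAS.get(word, word)


for word in ["studies", "running", "mice", "better"]:
    print(word, crude_stem(word), lemmatize(word))
# studies stud study
# running runn run
# mice mice mouse
# better better good
```

The stems `stud` and `runn` show why stemmer output is not guaranteed to be a real word. Lemmatization avoids this problem, but it needs a lexicon and, for accurate results, part-of-speech information.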