Ecosyste.ms: Repos

An open API service providing repository metadata for many open source software ecosystems.

GitHub / Aalaa4444 / Text_Processing-and-Unique_Word_Extraction_fromHTML

Extracts the text content of an HTML page, processes it, and extracts the unique words from the processed text. The notebook applies several text processing techniques: cleaning, normalization, tokenization, lemmatization or stemming, and stop-word removal.
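The notebook's code is not shown on this page, so the following is only a minimal sketch of the kind of pipeline the description outlines. It makes several assumptions: it uses Python's standard-library `html.parser` in place of BeautifulSoup (which the topics list mentions), a tiny hand-written stop-word list, and a crude suffix stripper as a stand-in for a real stemmer or lemmatizer. The names `TextExtractor`, `html_to_text`, `naive_stem`, and `unique_words` are hypothetical.

```python
import re
from html.parser import HTMLParser

# Tiny illustrative stop-word list (an assumption; the notebook's list is not shown).
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "of", "to", "in", "it", "on"}


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def html_to_text(html):
    """Extraction step: return the text content of an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def naive_stem(word):
    """Crude suffix stripping; a stand-in for a real stemmer or lemmatizer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def unique_words(html):
    """Run the whole pipeline and return the sorted unique stems."""
    text = html_to_text(html).lower()         # normalization: lowercase
    tokens = re.findall(r"[a-z]+", text)      # cleaning + tokenization: letters only
    kept = [t for t in tokens if t not in STOP_WORDS]  # stop-word removal
    return sorted({naive_stem(t) for t in kept})       # stemming + deduplication


if __name__ == "__main__":
    sample = "<p>The parser parsed the pages.</p><script>var x = 1;</script>"
    print(unique_words(sample))
```

Stop words are removed before stemming here so that the stop-word list can contain ordinary word forms; a pipeline built on a real lemmatizer might order these steps differently.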

JSON API: https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Aalaa4444%2FText_Processing-and-Unique_Word_Extraction_fromHTML
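As a small illustration, the JSON API URL above can be rebuilt from the host name and the repository's full name by percent-encoding the `/` in the full name as `%2F`. The sketch below assumes the path pattern generalizes from this single URL; the helper names `repository_api_url` and `fetch_repository` are hypothetical, and the shape of the JSON response is not shown on this page, so the fetch helper simply returns whatever the service sends.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Root inferred from the URL shown on this page (an assumption).
API_ROOT = "https://repos.ecosyste.ms/api/v1"


def repository_api_url(host, full_name):
    """Build the repository URL, percent-encoding '/' in the full name."""
    encoded_host = quote(host, safe="")
    encoded_name = quote(full_name, safe="")
    return f"{API_ROOT}/hosts/{encoded_host}/repositories/{encoded_name}"


def fetch_repository(host, full_name, timeout=10):
    """Fetch and decode the JSON document; response fields are not assumed."""
    with urlopen(repository_api_url(host, full_name), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(repository_api_url(
        "GitHub",
        "Aalaa4444/Text_Processing-and-Unique_Word_Extraction_fromHTML",
    ))
```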

Stars: 0
Forks: 0
Open Issues: 0

License: None
Language: Jupyter Notebook
Repo Size: 12.7 KB
Dependencies: 0

Created: about 1 month ago
Updated: about 1 month ago
Last pushed: about 1 month ago
Last synced: about 1 month ago

Topics: beautifulsoup, data-extraction, extract-html, lemmatization, requests, stemming, stopwords-removal, text-cleaning, text-extraction, text-lemmatization, text-normalization, text-processing, text-tokenization, tokenization, tokenizer
