Ecosyste.ms: Repos
An open API service providing repository metadata for many open source software ecosystems.
GitHub topics: html-extraction
bookieio/breadability
Reworked https://www.readability.com/ parsing library (https://mercury.postlight.com/ is now the living alternative)
Language: HTML - Size: 604 KB - Last synced: 2 days ago - Pushed: about 1 month ago - Stars: 204 - Forks: 26
zanachka/extruct Fork of scrapinghub/extruct
Extract embedded metadata from HTML markup
Language: Python - Size: 984 KB - Last synced: 12 days ago - Pushed: 13 days ago - Stars: 1 - Forks: 0
miso-belica/sumy
Module for automatic summarization of text documents and HTML pages.
Language: Python - Size: 1.57 MB - Last synced: 27 days ago - Pushed: 27 days ago - Stars: 3,431 - Forks: 524
html-extract/hext
Domain-specific language for extracting structured data from HTML documents
Language: C++ - Size: 2.06 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 51 - Forks: 3
zanachka/dateparser Fork of scrapinghub/dateparser
Python parser for human-readable dates
Language: Python - Size: 5.03 MB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 1 - Forks: 0
zanachka/python-readability Fork of buriy/python-readability
Fast Python port of arc90's readability tool, updated to match the latest readability.js!
Language: Python - Size: 640 KB - Last synced: 12 months ago - Pushed: 12 months ago - Stars: 0 - Forks: 0
zanachka/price-parser Fork of scrapinghub/price-parser
Extract price amount and currency symbol from a raw text string
Language: Python - Size: 93.8 KB - Last synced: over 1 year ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0
zanachka/number-parser Fork of scrapinghub/number-parser
Parse numbers written in natural language
Language: Python - Size: 401 KB - Last synced: over 1 year ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0
zanachka/article-extraction-benchmark Fork of scrapinghub/article-extraction-benchmark
Article extraction benchmark: dataset and evaluation scripts
Language: Python - Size: 11.8 MB - Last synced: over 1 year ago - Pushed: almost 3 years ago - Stars: 0 - Forks: 0
zanachka/html-text Fork of TeamHG-Memex/html-text
Extract text from HTML
Language: HTML - Size: 128 KB - Last synced: over 1 year ago - Pushed: over 3 years ago - Stars: 0 - Forks: 0
zanachka/jusText Fork of miso-belica/jusText
Heuristic-based boilerplate removal tool
Language: Python - Size: 1010 KB - Last synced: over 1 year ago - Pushed: over 3 years ago - Stars: 0 - Forks: 0