Ecosyste.ms: Repos

An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: html-extraction

bookieio/breadability

Reworked https://www.readability.com/ parsing library (https://mercury.postlight.com/ is now the living alternative)

Language: HTML - Size: 604 KB - Last synced: 2 days ago - Pushed: about 1 month ago - Stars: 204 - Forks: 26

zanachka/extruct Fork of scrapinghub/extruct

Extract embedded metadata from HTML markup
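As a rough illustration of one kind of embedded metadata, the sketch below pulls JSON-LD blocks out of a page using only Python's standard library. It is not extruct's API (extruct also covers other syntaxes such as microdata), and the `JsonLdCollector` and `extract_jsonld` names are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the parsed bodies of <script type="application/ld+json"> elements."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; only JSON-LD scripts are of interest
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script contents arrive here as raw text while inside a JSON-LD block
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks instead of failing the whole page


def extract_jsonld(html):
    parser = JsonLdCollector()
    parser.feed(html)
    parser.close()
    return parser.blocks
```

Ordinary `<script>` elements are ignored, so only structured data declared as JSON-LD is returned.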

Language: Python - Size: 984 KB - Last synced: 12 days ago - Pushed: 13 days ago - Stars: 1 - Forks: 0

miso-belica/sumy

Module for automatic summarization of text documents and HTML pages.
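A minimal sketch of extractive summarization, assuming plain-text input: each sentence is scored by the average corpus frequency of its non-stop-words, and the top sentences are returned in their original order. This is a simple word-frequency heuristic, not sumy's API or any particular sumy summarizer; the stop-word list and the `summarize` function are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for the sketch
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "for", "on", "that", "this"}


def content_words(text):
    """Lowercased word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def summarize(text, sentence_count=2):
    # Naive sentence split on terminal punctuation followed by whitespace
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        return sum(freq[w] for w in tokens) / (len(tokens) or 1)

    # Rank sentence indices by score, then restore document order for the output
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:sentence_count])]
```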

Language: Python - Size: 1.57 MB - Last synced: 27 days ago - Pushed: 27 days ago - Stars: 3,431 - Forks: 524

html-extract/hext

Domain-specific language for extracting structured data from HTML documents

Language: C++ - Size: 2.06 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 51 - Forks: 3

zanachka/dateparser Fork of scrapinghub/dateparser

Python parser for human-readable dates
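To show the kind of input such a parser deals with, here is a small sketch that resolves English relative expressions like "2 days ago" against a reference time. It is not dateparser's API; `parse_relative` and the set of supported phrases are assumptions. Months and years are left out because they have no fixed length in seconds.

```python
import re
from datetime import datetime, timedelta

# Fixed-length units only (hypothetical subset chosen for the sketch)
UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400, "week": 604800}


def parse_relative(text, now):
    """Return `now` shifted back by phrases like '2 days ago', or None if unrecognised."""
    text = text.strip().lower()
    if text == "yesterday":
        return now - timedelta(days=1)
    match = re.fullmatch(r"(\d+|an?)\s+(second|minute|hour|day|week)s?\s+ago", text)
    if match is None:
        return None
    # "a day ago" / "an hour ago" mean a quantity of one
    amount = 1 if match.group(1) in ("a", "an") else int(match.group(1))
    return now - timedelta(seconds=amount * UNIT_SECONDS[match.group(2)])
```

Passing `now` explicitly keeps the function deterministic and easy to test.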

Language: Python - Size: 5.03 MB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 1 - Forks: 0

zanachka/python-readability Fork of buriy/python-readability

Fast Python port of arc90's readability tool, updated to match the latest readability.js!

Language: Python - Size: 640 KB - Last synced: 12 months ago - Pushed: 12 months ago - Stars: 0 - Forks: 0

zanachka/price-parser Fork of scrapinghub/price-parser

Extract price amount and currency symbol from a raw text string
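A minimal regex-based sketch of the same task, assuming a small hypothetical set of currency symbols and treating the last `.` or `,` as the decimal separator only when one or two digits follow it. It is not price-parser's API, which handles many more currencies and formats; `parse_price` is a hypothetical name.

```python
import re
from decimal import Decimal

# Hypothetical, deliberately small symbol set for the sketch
CURRENCY_SYMBOLS = "$€£¥"


def parse_price(text):
    """Return (amount as Decimal or None, currency symbol or None)."""
    currency_match = re.search(f"[{re.escape(CURRENCY_SYMBOLS)}]", text)
    currency = currency_match.group(0) if currency_match else None

    number_match = re.search(r"\d[\d.,]*", text)
    if number_match is None:
        return None, currency
    raw = number_match.group(0).rstrip(".,")

    last_sep = max(raw.rfind("."), raw.rfind(","))
    if last_sep != -1 and len(raw) - last_sep - 1 in (1, 2):
        # One or two trailing digits: treat the last separator as the decimal point
        integer_part = re.sub(r"[.,]", "", raw[:last_sep])
        amount = Decimal(f"{integer_part}.{raw[last_sep + 1:]}")
    else:
        # Otherwise every separator is assumed to be a thousands separator
        amount = Decimal(re.sub(r"[.,]", "", raw))
    return amount, currency
```

The two-digit rule is a heuristic: it reads "1.000 €" as one thousand but would misread a price written with three decimal places.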

Language: Python - Size: 93.8 KB - Last synced: over 1 year ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

zanachka/number-parser Fork of scrapinghub/number-parser

Parse numbers written in natural language
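A minimal sketch of the idea for English cardinal numbers only, accumulating values word by word and flushing at scale words such as "thousand". It is not number-parser's API; `parse_number_words` and the supported vocabulary are hypothetical.

```python
UNITS = {
    word: value
    for value, word in enumerate(
        "zero one two three four five six seven eight nine ten eleven twelve "
        "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split()
    )
}
TENS = {
    word: 10 * value
    for value, word in enumerate(
        "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2
    )
}
SCALES = {"thousand": 1_000, "million": 1_000_000}


def parse_number_words(text):
    """Convert phrases like 'two thousand three hundred forty-five' to an int."""
    total = 0    # value already closed off by a scale word
    current = 0  # value of the group still being built (below one thousand)
    for word in text.lower().replace("-", " ").split():
        if word == "and":
            continue  # "one hundred and five"
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current *= 100
        elif word in SCALES:
            total += current * SCALES[word]
            current = 0
        else:
            raise ValueError(f"unrecognised number word: {word!r}")
    return total + current
```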

Language: Python - Size: 401 KB - Last synced: over 1 year ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

zanachka/article-extraction-benchmark Fork of scrapinghub/article-extraction-benchmark

Article extraction benchmark: dataset and evaluation scripts
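Extraction quality is often scored by comparing the extracted text with a hand-labelled reference. The sketch below computes a bag-of-tokens F1 score; it is an assumed, simplified metric and not necessarily the one this benchmark's evaluation scripts use. `token_f1` is a hypothetical name.

```python
from collections import Counter


def token_f1(predicted, expected):
    """F1 over whitespace tokens, counting repeated tokens with multiplicity."""
    pred_tokens = predicted.lower().split()
    gold_tokens = expected.lower().split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a perfect match; only one empty scores zero
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

Precision penalises leftover boilerplate in the extraction, while recall penalises article text that was dropped.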

Language: Python - Size: 11.8 MB - Last synced: over 1 year ago - Pushed: almost 3 years ago - Stars: 0 - Forks: 0

zanachka/html-text Fork of TeamHG-Memex/html-text

Extract text from HTML
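A standard-library sketch of the same task: skip non-visible elements, break lines at common block-level tags and collapse runs of whitespace. It is not html-text's API, and the `SKIP_TAGS`/`BLOCK_TAGS` sets and `html_to_text` name are hypothetical choices for illustration.

```python
from html.parser import HTMLParser

# Elements whose text is never shown to the reader
SKIP_TAGS = {"script", "style", "head", "title", "noscript"}
# Elements that should start a new line in the output
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self._skip_depth = 0
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self._parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self._parts.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self._parts.append(data)

    def text(self):
        # Collapse whitespace inside each line and drop empty lines
        lines = (" ".join(line.split()) for line in "".join(self._parts).splitlines())
        return "\n".join(line for line in lines if line)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```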

Language: HTML - Size: 128 KB - Last synced: over 1 year ago - Pushed: over 3 years ago - Stars: 0 - Forks: 0

zanachka/jusText Fork of miso-belica/jusText

Heuristic-based boilerplate removal tool
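Heuristic boilerplate removal commonly classifies text blocks using cues such as block length, link density and stop-word density. The sketch below is a simplified, single-pass classifier loosely in that spirit; it is not jusText's actual algorithm, and the `Block` type, stop-word list and thresholds are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical, deliberately tiny English stop-word list
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it",
              "that", "for", "on", "with", "as", "was"}


@dataclass
class Block:
    text: str
    link_chars: int = 0  # characters of `text` that sit inside <a> elements


def is_content(block, min_length=70, max_link_density=0.2, min_stopword_density=0.3):
    words = block.text.split()
    # Short blocks (menus, buttons, bylines) are treated as boilerplate
    if not words or len(block.text) < min_length:
        return False
    # Blocks made mostly of links look like navigation or "related" lists
    if block.link_chars / len(block.text) > max_link_density:
        return False
    # Running prose tends to contain many function words
    stop = sum(1 for w in words if w.lower().strip(".,;:!?") in STOP_WORDS)
    return stop / len(words) >= min_stopword_density


def remove_boilerplate(blocks):
    return [b.text for b in blocks if is_content(b)]
```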

Language: Python - Size: 1010 KB - Last synced: over 1 year ago - Pushed: over 3 years ago - Stars: 0 - Forks: 0