An open API service providing repository metadata for many open source software ecosystems.

Topic: "html-extraction"

miso-belica/sumy

Module for automatic summarization of text documents and HTML pages.

Language: Python - Size: 1.57 MB - Last synced at: 6 months ago - Pushed at: 11 months ago - Stars: 3,518 - Forks: 530

bookieio/breadability

Reworked https://www.readability.com/ parsing library (https://mercury.postlight.com/ is now the living alternative)

Language: HTML - Size: 604 KB - Last synced at: 13 days ago - Pushed at: 12 months ago - Stars: 204 - Forks: 25

html-extract/hext

Domain-specific language for extracting structured data from HTML documents

Language: C++ - Size: 2.12 MB - Last synced at: 7 days ago - Pushed at: 26 days ago - Stars: 52 - Forks: 3

Whomrx666/Xtract-html

Xtract-html is a tool for extracting the HTML display code from a website, which you can also use for your own website.

Language: Python - Size: 283 KB - Last synced at: 20 days ago - Pushed at: 2 months ago - Stars: 5 - Forks: 1

Whomrx666/Xtract-htmlV2

Xtract-htmlV2 is a tool for getting the HTML code from any website you want; it is the successor to the previous version.

Language: Python - Size: 459 KB - Last synced at: 20 days ago - Pushed at: 2 months ago - Stars: 4 - Forks: 0

zanachka/extruct (fork of scrapinghub/extruct)

Extract embedded metadata from HTML markup

Language: Python - Size: 996 KB - Last synced at: 28 days ago - Pushed at: 29 days ago - Stars: 1 - Forks: 0

zanachka/dateparser (fork of scrapinghub/dateparser)

Python parser for human-readable dates

Language: Python - Size: 5.12 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

zanachka/price-parser (fork of scrapinghub/price-parser)

Extract price amount and currency symbol from a raw text string

Language: Python - Size: 121 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

zanachka/python-readability (fork of buriy/python-readability)

Fast Python port of arc90's readability tool, updated to match the latest readability.js!

Language: Python - Size: 660 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

zanachka/number-parser (fork of scrapinghub/number-parser)

Parse numbers written in natural language

Language: Python - Size: 421 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

zanachka/article-extraction-benchmark (fork of scrapinghub/article-extraction-benchmark)

Article extraction benchmark: dataset and evaluation scripts

Language: Python - Size: 11.8 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

zanachka/html-text (fork of TeamHG-Memex/html-text)

Extract text from HTML

Language: HTML - Size: 128 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

zanachka/jusText (fork of miso-belica/jusText)

Heuristic-based boilerplate removal tool

Language: Python - Size: 1010 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0