An open API service providing repository metadata for many open source software ecosystems.

Topic: "html-extractor"

miso-belica/sumy

Module for automatic summarization of text documents and HTML pages.

Language: Python - Size: 1.57 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 3,590 - Forks: 528

bookieio/breadability

Reworked https://www.readability.com/ parsing library (https://mercury.postlight.com/ is now a living alternative)

Language: HTML - Size: 604 KB - Last synced at: 10 days ago - Pushed at: about 1 year ago - Stars: 204 - Forks: 25

cdimascio/essence

Automatically extract the main text content (and more) from an HTML document

Language: Kotlin - Size: 1.93 MB - Last synced at: 2 months ago - Pushed at: almost 3 years ago - Stars: 117 - Forks: 16

cnyangkui/html-extractor

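An optimized general-purpose algorithm for extracting the main text of web pages, based on the line-block distribution function; implemented in Python.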

Language: Python - Size: 923 KB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 35 - Forks: 9

kwaziidev/textractor

Extracts the main text from HTML; intended for news-style web pages.

Language: Go - Size: 46.9 KB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 16 - Forks: 4

JanDC/css-from-html-extractor

PHP library that determines which CSS is used by HTML snippets.

Language: PHP - Size: 42 KB - Last synced at: 20 days ago - Pushed at: over 5 years ago - Stars: 9 - Forks: 2

Whomrx666/Xtract-html

Xtract-html is a tool for extracting the HTML display code from a website, which you can then reuse on your own website.

Language: Python - Size: 283 KB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 5 - Forks: 1

Whomrx666/Xtract-htmlV2

Xtract-htmlV2 is a tool for getting the HTML code from any website you choose. It is the successor to the previous version.

Language: Python - Size: 459 KB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 4 - Forks: 0

importcjj/go-readability (fork of go-shiori/go-readability)

Go package that cleans an HTML page for better readability.

Language: HTML - Size: 95.7 KB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 2 - Forks: 0

davidmillerpak/Media-Graper

Media Graper is an open source tool for Linux that extracts all the images, links, and videos from a web page.

Language: Shell - Size: 388 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

MorrisGlr/HEART

HTML‐to‐Anki Enhanced Human Explanation & Reasoning Tool (HEART). A Python CLI that leverages the OpenAI API to transform full UWorld vignettes into AI-enhanced Anki cards.

Language: Python - Size: 268 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0