Topic: "html-extractor"
miso-belica/sumy
Module for automatic summarization of text documents and HTML pages.
Language: Python - Size: 1.57 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 3,590 - Forks: 528

bookieio/breadability
Reworked https://www.readability.com/ parsing library (https://mercury.postlight.com/ is now the living alternative)
Language: HTML - Size: 604 KB - Last synced at: 10 days ago - Pushed at: about 1 year ago - Stars: 204 - Forks: 25

cdimascio/essence
Automatically extract the main text content (and more) from an HTML document
Language: Kotlin - Size: 1.93 MB - Last synced at: 2 months ago - Pushed at: almost 3 years ago - Stars: 117 - Forks: 16

cnyangkui/html-extractor
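An optimized general-purpose web page main-text extraction algorithm based on the line-block distribution function, implemented in Python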
Language: Python - Size: 923 KB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 35 - Forks: 9

kwaziidev/textractor
Extracts the main text from HTML; intended for news web pages
Language: Go - Size: 46.9 KB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 16 - Forks: 4

JanDC/css-from-html-extractor
PHP library that determines which CSS is used by HTML snippets.
Language: PHP - Size: 42 KB - Last synced at: 20 days ago - Pushed at: over 5 years ago - Stars: 9 - Forks: 2

Whomrx666/Xtract-html
Xtract-html is a tool for extracting HTML display code from a website, which you can also use for your website.
Language: Python - Size: 283 KB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 5 - Forks: 1

Whomrx666/Xtract-htmlV2
Xtract-htmlV2 is a tool for getting the HTML code from the website you want and is the successor to the previous version
Language: Python - Size: 459 KB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 4 - Forks: 0

importcjj/go-readability (fork of go-shiori/go-readability)
Go package that cleans an HTML page for better readability.
Language: HTML - Size: 95.7 KB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 2 - Forks: 0

davidmillerpak/Media-Graper
Media Graper is an open-source tool for Linux that extracts all the images, links, and videos from a web page.
Language: Shell - Size: 388 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

MorrisGlr/HEART
HTML‐to‐Anki Enhanced Human Explanation & Reasoning Tool (HEART). A Python CLI that leverages the OpenAI API to transform full UWorld vignettes into AI-enhanced Anki cards.
Language: Python - Size: 268 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0
