An open API service providing repository metadata for many open source software ecosystems.

Topic: "web-content-extractor"

cdimascio/essence

Automatically extract the main text content (and more) from an HTML document

Language: Kotlin - Size: 1.93 MB - Last synced at: 2 months ago - Pushed at: almost 3 years ago - Stars: 117 - Forks: 16

MohamedHmini/iww

AI-based web wrapper for web content extraction

Language: Python - Size: 59.2 MB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 100 - Forks: 14

mrjleo/boilernet

Boilerplate Removal using Deep Learning

Language: Python - Size: 36.1 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 47 - Forks: 8

SebangsaHQ/clip

URL content extractor written in Go.

Language: Go - Size: 20.5 KB - Last synced at: almost 2 years ago - Pushed at: over 7 years ago - Stars: 9 - Forks: 3

minarc/godensity

This repository is an implementation of 📄 DOM-based content extraction via text density. Tested on Korean web pages.

Language: Go - Size: 2.7 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 5 - Forks: 0

UncoveredTensor/websense

Content scraping from DOMs based on several NLP techniques

Size: 1.95 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

platonai/pulsar-auto-mining

Extract almost every field from a set of web pages using an unsupervised machine learning method.

Language: HTML - Size: 3.47 MB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 2