Topic: "web-content-extractor"
cdimascio/essence
Automatically extract the main text content (and more) from an HTML document
Language: Kotlin - Size: 1.93 MB - Last synced at: 2 months ago - Pushed at: almost 3 years ago - Stars: 117 - Forks: 16

MohamedHmini/iww
AI-based web wrapper for web content extraction
Language: Python - Size: 59.2 MB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 100 - Forks: 14

mrjleo/boilernet
Boilerplate Removal using Deep Learning
Language: Python - Size: 36.1 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 47 - Forks: 8

SebangsaHQ/clip
URL content extractor written in Go.
Language: Go - Size: 20.5 KB - Last synced at: almost 2 years ago - Pushed at: over 7 years ago - Stars: 9 - Forks: 3

minarc/godensity
This repository is an implementation of 📄 DOM-based content extraction via text density. Tested on Korean web pages.
Language: Go - Size: 2.7 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 5 - Forks: 0

UncoveredTensor/websense
Content scraping from DOMs based on several NLP techniques
Size: 1.95 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

platonai/pulsar-auto-mining
Extract almost every field from a set of web pages using unsupervised machine learning.
Language: HTML - Size: 3.47 MB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 2
