An open API service providing repository metadata for many open source software ecosystems.

GitHub / Morgscode / desktop-webpage-text-crawler

This desktop GUI indexes and formats the text content of the webpages you request and saves it as .txt files, provided the response is HTML or JSON. You can crawl a single page, all internal links on a page, or all links within the page's <nav> tag(s). You can also choose to extract only page titles, only the main text content, or all text content from the page. The crawler includes basic built-in error logging.
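The crawl and extraction modes described above can be illustrated with a minimal sketch that uses only the Python standard library. This is not the repository's actual code: the class `PageTextParser`, the helper `internal_links`, and the sample HTML are hypothetical names chosen for illustration, and fetching pages or writing .txt files is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PageTextParser(HTMLParser):
    """Hypothetical helper: collects the title, visible text, and links of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.text_parts = []   # visible text outside <title>, <script>, <style>
        self.links = []        # every href found on the page
        self.nav_links = []    # hrefs found inside <nav> elements only
        self._in_title = False
        self._nav_depth = 0
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "nav":
            self._nav_depth += 1
        elif tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
                if self._nav_depth > 0:
                    self.nav_links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "nav" and self._nav_depth > 0:
            self._nav_depth -= 1
        elif tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip_depth > 0:
            return
        if self._in_title:
            self.title += text
        else:
            self.text_parts.append(text)


def internal_links(base_url, hrefs):
    """Resolve hrefs against base_url and keep those on the same host."""
    host = urlparse(base_url).netloc
    resolved = [urljoin(base_url, href) for href in hrefs]
    return [url for url in resolved if urlparse(url).netloc == host]


# Made-up sample page used only to exercise the sketch.
SAMPLE = """<html><head><title>Demo</title><style>p{}</style></head>
<body><nav><a href="/about">About</a></nav>
<p>Hello <a href="https://other.example/x">world</a></p></body></html>"""

parser = PageTextParser()
parser.feed(SAMPLE)
same_site = internal_links("https://example.com/page", parser.links)
```

In this sketch, "titles only" corresponds to `parser.title`, "all text" to `parser.text_parts`, the <nav> crawl mode to `parser.nav_links`, and the internal-link crawl mode to the result of `internal_links`.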

JSON API: http://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Morgscode%2Fdesktop-webpage-text-crawler
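The URL above percent-encodes the slash in the repository's full name (`Morgscode%2Fdesktop-webpage-text-crawler`). A minimal sketch of building such a URL and requesting it follows. The function names `repository_url` and `fetch_repository` are hypothetical, the path pattern is inferred only from the single URL shown here, and no assumption is made about which fields the response contains.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Root inferred from the JSON API URL listed above (an assumption, not documented here).
API_ROOT = "http://repos.ecosyste.ms/api/v1"


def repository_url(host, full_name):
    """Build the repository endpoint URL, percent-encoding '/' in the full name."""
    return (
        f"{API_ROOT}/hosts/{quote(host, safe='')}"
        f"/repositories/{quote(full_name, safe='')}"
    )


def fetch_repository(host, full_name, timeout=10):
    """Request the endpoint and return the decoded JSON document (needs network access)."""
    with urlopen(repository_url(host, full_name), timeout=timeout) as response:
        return json.load(response)


url = repository_url("GitHub", "Morgscode/desktop-webpage-text-crawler")
```

Passing `safe=''` to `quote` is what turns the `/` into `%2F`; by default `quote` leaves slashes untouched.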

Stars: 4
Forks: 0
Open issues: 0

License: None
Language: Python
Size: 103 KB
Dependencies parsed at: Pending

Created at: over 4 years ago
Updated at: over 1 year ago
Pushed at: over 4 years ago
Last synced at: about 1 year ago

Topics: content-services, web-crawler, web-scraping-python
