An open API service providing repository metadata for many open source software ecosystems.

Topic: "storm-crawler"

commoncrawl/news-crawl

News crawling with StormCrawler - stores content as WARC

Language: Java - Size: 247 KB - Last synced at: 20 days ago - Pushed at: 3 months ago - Stars: 344 - Forks: 37

tokenmill/crawling-framework

Easily crawl news portals or blog sites using StormCrawler.

Language: Java - Size: 918 KB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 21 - Forks: 4

tokenmill/crawling-framework-example

Demonstration of how to use the Crawling Framework to set up a simple science news crawler and store results in Elasticsearch. Use this configuration to set up your own crawler.

Language: Java - Size: 179 KB - Last synced at: 3 months ago - Pushed at: over 5 years ago - Stars: 3 - Forks: 0