Ecosyste.ms: Repos

An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: stormcrawler

apache/incubator-stormcrawler

A scalable, mature and versatile web crawler based on Apache Storm

Language: HTML - Size: 6.41 MB - Last synced: 27 days ago - Pushed: about 1 month ago - Stars: 856 - Forks: 252

DigitalPebble/stormcrawler-docker

Resources for running StormCrawler with Docker services

Language: Dockerfile - Size: 16.6 KB - Last synced: 14 days ago - Pushed: 4 months ago - Stars: 7 - Forks: 2

DigitalPebble/benchmark

StormCrawler topology to evaluate the performance of different backends and configurations

Language: Shell - Size: 43 KB - Last synced: about 2 months ago - Pushed: 4 months ago - Stars: 0 - Forks: 0

DigitalPebble/ansible-storm

Ansible playbook for deploying a Storm cluster

Size: 27.3 KB - Last synced: about 2 months ago - Pushed: 5 months ago - Stars: 7 - Forks: 1

ngramp/stormcrawlnlp

Language: Java - Size: 35.6 MB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 0 - Forks: 0

sebastian-nagel/warc-crawler

Process web archives (WARC format) with StormCrawler and index content into Elasticsearch or Solr

Language: FLUX - Size: 44.9 KB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 6 - Forks: 1