Ecosyste.ms: Repos

An open API service providing repository metadata for many open source software ecosystems.

Package Usage: maven: com.digitalpebble.stormcrawler:storm-crawler-core

Storm-Crawler core Java API.
34 versions
Latest release: about 1 year ago
12 dependent packages

View more package details: https://packages.ecosyste.ms/registries/repo1.maven.org/packages/com.digitalpebble.stormcrawler:storm-crawler-core

View more repository details: https://repos.ecosyste.ms/hosts/GitHub/repositories/apache%2Fincubator-stormcrawler

Dependent Repos 14

lanl/trace-crawler
  • 1.13 portals-crawler/pom.xml

Size: 140 MB - Last synced: about 1 month ago - Pushed: about 1 month ago

tokenmill/crawling-framework
Easily crawl news portals or blog sites using Storm Crawler.
  • crawler/pom.xml
  • 1.5.1 pom.xml

Size: 918 KB - Last synced: 24 days ago - Pushed: over 1 year ago

clarin-eric/linkchecker
  • 2.4 pom.xml

Size: 573 KB - Last synced: about 2 months ago - Pushed: about 2 months ago

apache/incubator-stormcrawler
A scalable, mature and versatile web crawler based on Apache Storm
  • \\\\\${version} archetype/src/main/resources/archetype-resources/pom.xml
  • ${StormCrawlerVersion} external/elasticsearch/archetype/src/main/resources/archetype-resources/pom.xml
  • 2.5-SNAPSHOT external/elasticsearch/pom.xml
  • ${project.version} external/pom.xml
  • 2.5-SNAPSHOT external/tika/pom.xml
  • 2.5-SNAPSHOT external/urlfrontier/pom.xml
  • 2.5-SNAPSHOT external/warc/pom.xml
  • 2.5-SNAPSHOT external/warc/pom.xml
  • ${StormCrawlerVersion} external/opensearch/archetype/src/main/resources/archetype-resources/pom.xml
  • 2.8-SNAPSHOT external/opensearch/pom.xml

Size: 6.58 MB - Last synced: 11 days ago - Pushed: 11 days ago

kanwarkakkar/stormCrawler
Modified Storm-Crawler ES
  • \\\\\\${version} archetype/src/main/resources/archetype-resources/pom.xml
  • 1.4-SNAPSHOT external/aws/pom.xml
  • 1.4-SNAPSHOT external/elasticsearch/pom.xml
  • ${project.version} external/langid/pom.xml
  • 1.4-SNAPSHOT external/solr/pom.xml
  • 1.4-SNAPSHOT external/sql/pom.xml
  • 1.4-SNAPSHOT external/tika/pom.xml
  • 1.4-SNAPSHOT external/tika/pom.xml
  • 1.4-SNAPSHOT external/warc/pom.xml

Size: 2.48 MB - Last synced: over 1 year ago - Pushed: over 1 year ago

xakus22/apache_strom_ide_run
  • 2.1 pom.xml

Last synced: over 1 year ago

Tiago4k/QSE-Project
Stormcrawler connected to Elasticsearch
  • 1.15 es-stormcrawler/pom.xml

Size: 51.7 MB - Last synced: about 1 year ago - Pushed: over 1 year ago

sinhlt58/espresso_public
  • 1.12 crawling/website/data-espresso/pom.xml

Size: 341 MB - Last synced: over 1 year ago - Pushed: over 1 year ago

opensearch-dlr/opensearch-prototype
  • 2.1 prototype/crawler/pom.xml

Last synced: over 1 year ago

lukas75/onlinemediacrowdfundingprediction
  • 1.4 pom.xml

Size: 1.03 GB - Last synced: over 1 year ago

aeranginkaman/langid-storm-crawler
Detecting the language of web pages based on their textual content.
  • 1.16 pom.xml

Last synced: over 1 year ago

tokenmill/crawling-framework
Framework for crawling news articles
  • crawler/pom.xml
  • 1.5.1 pom.xml

Last synced: over 1 year ago

dandycheung/storm-crawler Fork of DigitalPebble/storm-crawler
A scalable, mature and versatile web crawler based on Apache Storm
  • \\\\\${version} archetype/src/main/resources/archetype-resources/pom.xml
  • ${StormCrawlerVersion} external/elasticsearch/archetype/src/main/resources/archetype-resources/pom.xml
  • 2.7-SNAPSHOT external/elasticsearch/pom.xml
  • ${project.version} external/pom.xml
  • 2.7-SNAPSHOT external/tika/pom.xml
  • 2.7-SNAPSHOT external/urlfrontier/pom.xml
  • 2.7-SNAPSHOT external/warc/pom.xml
  • 2.7-SNAPSHOT external/warc/pom.xml

Size: 7.9 MB - Last synced: about 1 year ago - Pushed: about 1 year ago

admariner/storm-crawler Fork of apache/incubator-stormcrawler
A scalable, mature and versatile web crawler based on Apache Storm
  • \\\\\${version} archetype/src/main/resources/archetype-resources/pom.xml
  • ${StormCrawlerVersion} external/elasticsearch/archetype/src/main/resources/archetype-resources/pom.xml
  • 2.6-SNAPSHOT external/elasticsearch/pom.xml
  • ${project.version} external/pom.xml
  • 2.6-SNAPSHOT external/tika/pom.xml
  • 2.6-SNAPSHOT external/urlfrontier/pom.xml
  • 2.6-SNAPSHOT external/warc/pom.xml
  • 2.6-SNAPSHOT external/warc/pom.xml

Size: 8.15 MB - Last synced: 3 days ago - Pushed: 2 months ago

acdh-oeaw/stormychecker 📦
Storm crawler adaptation for URL validation
  • 1.14 pom.xml

Size: 179 KB - Last synced: 11 months ago - Pushed: over 3 years ago

smallela1/eclkc-stormcrawl
  • 1.8-SNAPSHOT archetype/src/main/resources/archetype-resources/pom.xml
  • ${project.version} external/pom.xml
  • 1.8-SNAPSHOT external/tika/pom.xml
  • 1.8-SNAPSHOT external/warc/pom.xml

Size: 430 KB - Last synced: about 1 year ago - Pushed: over 5 years ago

HPI-BP2017N2/Crawler
Based on Stormcrawler to crawl a list of domains and hand the pages to a data store
  • 1.9 pom.xml

Size: 70.3 KB - Last synced: over 1 year ago - Pushed: almost 6 years ago

commoncrawl/news-crawl
News crawling with StormCrawler - stores content as WARC
  • 1.18 pom.xml
  • 1.18 pom.xml

Size: 231 KB - Last synced: 24 days ago - Pushed: 6 months ago

luda171/trace-archiver
The trace crawler is a tool for selective web crawling to archive web resources with well-defined boundaries. The specific web navigation steps (or trace) are formulated for families of webpages whose layout or HTML structure is similar but whose content differs, for example GitHub, Slideshare, and blogs.
  • 1.13 portals-crawler/pom.xml

Size: 142 MB - Last synced: about 1 year ago - Pushed: almost 2 years ago

ukwa/wren
Experiments in testable, scalable crawler architectures
  • 1.5.1 wren/pom.xml

Size: 523 KB - Last synced: about 1 year ago - Pushed: almost 7 years ago

52North/ecmwf-dataset-crawl
  • 1.10 crawler/pom.xml
  • 1.9 crawler/pom.xml

Size: 22.1 MB - Last synced: about 2 months ago - Pushed: over 5 years ago

sebastian-nagel/warc-crawler
Process web archives (WARC format) with StormCrawler and index content into Elasticsearch or Solr
  • 2.2-SNAPSHOT pom.xml

Size: 44.9 KB - Last synced: over 1 year ago - Pushed: over 1 year ago

liinnux/storm-crawler Fork of DigitalPebble/storm-crawler
Web crawler SDK based on Apache Storm
  • \\\\\\${version} archetype/src/main/resources/archetype-resources/pom.xml
  • 0.9-SNAPSHOT external/aws/pom.xml
  • 0.9-SNAPSHOT external/elasticsearch/pom.xml
  • 0.9-SNAPSHOT external/solr/pom.xml
  • 0.9-SNAPSHOT external/sql/pom.xml
  • 0.9-SNAPSHOT external/tika/pom.xml

Size: 3.46 MB - Last synced: over 1 year ago - Pushed: over 8 years ago

fysoft2006/storm-crawler Fork of DigitalPebble/storm-crawler
Web crawler SDK based on Apache Storm
  • \\\\\\${version} archetype/src/main/resources/archetype-resources/pom.xml
  • 0.9-SNAPSHOT external/aws/pom.xml
  • 0.9-SNAPSHOT external/elasticsearch/pom.xml
  • 0.9-SNAPSHOT external/solr/pom.xml
  • 0.9-SNAPSHOT external/sql/pom.xml
  • 0.9-SNAPSHOT external/tika/pom.xml

Size: 3.46 MB - Last synced: 10 months ago - Pushed: over 8 years ago

jordillachmrf/stormcrawler
  • 1.10 pom.xml

Size: 85 KB - Last synced: about 1 month ago - Pushed: almost 6 years ago

lviiii/storm-crawler Fork of DigitalPebble/storm-crawler
Web crawler SDK based on Apache Storm
  • \\\\\\${version} archetype/src/main/resources/archetype-resources/pom.xml
  • 0.9-SNAPSHOT external/aws/pom.xml
  • 0.9-SNAPSHOT external/elasticsearch/pom.xml
  • 0.9-SNAPSHOT external/solr/pom.xml
  • 0.9-SNAPSHOT external/sql/pom.xml
  • 0.9-SNAPSHOT external/tika/pom.xml

Size: 3.49 MB - Last synced: over 1 year ago - Pushed: over 8 years ago

desp0916/LearnStormCrawler
Learning StormCrawler
  • 1.1.1 pom.xml

Size: 6.84 KB - Last synced: about 1 year ago - Pushed: over 7 years ago

szmer/ActualScan
smart Web search engine with infinitely sortable results
  • 2.0 java/generalcrawl/pom.xml

Size: 10 MB - Last synced: 10 months ago - Pushed: over 2 years ago

wdxxl/storm-crawler Fork of DigitalPebble/storm-crawler
Web crawler SDK based on Apache Storm
  • \\\\\\${version} archetype/src/main/resources/archetype-resources/pom.xml
  • 0.9-SNAPSHOT external/aws/pom.xml
  • 0.9-SNAPSHOT external/elasticsearch/pom.xml
  • 0.9-SNAPSHOT external/solr/pom.xml
  • 0.9-SNAPSHOT external/sql/pom.xml
  • 0.9-SNAPSHOT external/tika/pom.xml

Size: 3.56 MB - Last synced: 10 months ago - Pushed: over 8 years ago

Kulanjith/User-Management
  • 1.6 core/pom.xml
  • 1.8 webapp/pom.xml

Size: 17.6 KB - Last synced: about 1 year ago - Pushed: about 3 years ago

wuzhongdehua/storm_crawler
  • 0.10 pom.xml

Size: 10.7 KB - Last synced: about 2 months ago - Pushed: almost 8 years ago

giuseppebonaccorso/storm-crawler Fork of apache/incubator-stormcrawler
Web crawler SDK based on Apache Storm
  • \\\\\\${version} archetype/src/main/resources/archetype-resources/pom.xml
  • 1.0-SNAPSHOT external/aws/pom.xml
  • 1.0-SNAPSHOT external/elasticsearch/pom.xml
  • 1.0-SNAPSHOT external/solr/pom.xml
  • 1.0-SNAPSHOT external/sql/pom.xml
  • 1.0-SNAPSHOT external/tika/pom.xml

Size: 4.33 MB - Last synced: about 2 months ago - Pushed: almost 8 years ago

diversepwmeasurement/storm-crawler Fork of DigitalPebble/storm-crawler
A scalable, mature and versatile web crawler based on Apache Storm
  • \\\\\${version} archetype/src/main/resources/archetype-resources/pom.xml
  • ${StormCrawlerVersion} external/elasticsearch/archetype/src/main/resources/archetype-resources/pom.xml
  • 2.9-SNAPSHOT external/elasticsearch/pom.xml
  • ${StormCrawlerVersion} external/opensearch/archetype/src/main/resources/archetype-resources/pom.xml

Size: 4.1 MB - Last synced: 10 months ago - Pushed: 10 months ago