Ecosyste.ms: Repos
An open API service providing repository metadata for many open source software ecosystems.
Package Usage: maven: com.digitalpebble.stormcrawler:storm-crawler-core
Storm-Crawler core Java API.
34 versions
Latest release: about 1 year ago
12 dependent packages
View more package details: https://packages.ecosyste.ms/registries/repo1.maven.org/packages/com.digitalpebble.stormcrawler:storm-crawler-core
View more repository details: https://repos.ecosyste.ms/hosts/GitHub/repositories/apache%2Fincubator-stormcrawler
Dependent Repos 14
lanl/trace-crawler
- 1.13 portals-crawler/pom.xml
Size: 140 MB - Last synced: about 1 month ago - Pushed: about 1 month ago
tokenmill/crawling-framework
Easily crawl news portals or blog sites using Storm Crawler.
- crawler/pom.xml
- 1.5.1 pom.xml
Size: 918 KB - Last synced: 24 days ago - Pushed: over 1 year ago
clarin-eric/linkchecker
- 2.4 pom.xml
Size: 573 KB - Last synced: about 2 months ago - Pushed: about 2 months ago
apache/incubator-stormcrawler
A scalable, mature and versatile web crawler based on Apache Storm
- ${version} archetype/src/main/resources/archetype-resources/pom.xml
- ${StormCrawlerVersion} external/elasticsearch/archetype/src/main/resources/archetype-resources/pom.xml
- 2.5-SNAPSHOT external/elasticsearch/pom.xml
- ${project.version} external/pom.xml
- 2.5-SNAPSHOT external/tika/pom.xml
- 2.5-SNAPSHOT external/urlfrontier/pom.xml
- 2.5-SNAPSHOT external/warc/pom.xml
- 2.5-SNAPSHOT external/warc/pom.xml
- ${StormCrawlerVersion} external/opensearch/archetype/src/main/resources/archetype-resources/pom.xml
- 2.8-SNAPSHOT external/opensearch/pom.xml
Size: 6.58 MB - Last synced: 11 days ago - Pushed: 11 days ago
kanwarkakkar/stormCrawler
Modified Storm-Crawler ES
- ${version} archetype/src/main/resources/archetype-resources/pom.xml
- 1.4-SNAPSHOT external/aws/pom.xml
- 1.4-SNAPSHOT external/elasticsearch/pom.xml
- ${project.version} external/langid/pom.xml
- 1.4-SNAPSHOT external/solr/pom.xml
- 1.4-SNAPSHOT external/sql/pom.xml
- 1.4-SNAPSHOT external/tika/pom.xml
- 1.4-SNAPSHOT external/tika/pom.xml
- 1.4-SNAPSHOT external/warc/pom.xml
Size: 2.48 MB - Last synced: over 1 year ago - Pushed: over 1 year ago
Tiago4k/QSE-Project
Stormcrawler connected to Elasticsearch
- 1.15 es-stormcrawler/pom.xml
Size: 51.7 MB - Last synced: about 1 year ago - Pushed: over 1 year ago
sinhlt58/espresso_public
- 1.12 crawling/website/data-espresso/pom.xml
Size: 341 MB - Last synced: over 1 year ago - Pushed: over 1 year ago
aeranginkaman/langid-storm-crawler
Detecting the language of web pages based on their textual contents.
- 1.16 pom.xml
Last synced: over 1 year ago
tokenmill/crawling-framework
Framework for crawling news articles
- crawler/pom.xml
- 1.5.1 pom.xml
Last synced: over 1 year ago
dandycheung/storm-crawler Fork of DigitalPebble/storm-crawler
A scalable, mature and versatile web crawler based on Apache Storm
- ${version} archetype/src/main/resources/archetype-resources/pom.xml
- ${StormCrawlerVersion} external/elasticsearch/archetype/src/main/resources/archetype-resources/pom.xml
- 2.7-SNAPSHOT external/elasticsearch/pom.xml
- ${project.version} external/pom.xml
- 2.7-SNAPSHOT external/tika/pom.xml
- 2.7-SNAPSHOT external/urlfrontier/pom.xml
- 2.7-SNAPSHOT external/warc/pom.xml
- 2.7-SNAPSHOT external/warc/pom.xml
Size: 7.9 MB - Last synced: about 1 year ago - Pushed: about 1 year ago
admariner/storm-crawler Fork of apache/incubator-stormcrawler
A scalable, mature and versatile web crawler based on Apache Storm
- ${version} archetype/src/main/resources/archetype-resources/pom.xml
- ${StormCrawlerVersion} external/elasticsearch/archetype/src/main/resources/archetype-resources/pom.xml
- 2.6-SNAPSHOT external/elasticsearch/pom.xml
- ${project.version} external/pom.xml
- 2.6-SNAPSHOT external/tika/pom.xml
- 2.6-SNAPSHOT external/urlfrontier/pom.xml
- 2.6-SNAPSHOT external/warc/pom.xml
- 2.6-SNAPSHOT external/warc/pom.xml
Size: 8.15 MB - Last synced: 3 days ago - Pushed: 2 months ago
acdh-oeaw/stormychecker 📦
Storm crawler adaptation for URL validation
- 1.14 pom.xml
Size: 179 KB - Last synced: 11 months ago - Pushed: over 3 years ago
smallela1/eclkc-stormcrawl
- 1.8-SNAPSHOT archetype/src/main/resources/archetype-resources/pom.xml
- ${project.version} external/pom.xml
- 1.8-SNAPSHOT external/tika/pom.xml
- 1.8-SNAPSHOT external/warc/pom.xml
Size: 430 KB - Last synced: about 1 year ago - Pushed: over 5 years ago
HPI-BP2017N2/Crawler
Based on Stormcrawler to crawl a list of domains and hand the pages to a data store
- 1.9 pom.xml
Size: 70.3 KB - Last synced: over 1 year ago - Pushed: almost 6 years ago
commoncrawl/news-crawl
News crawling with StormCrawler - stores content as WARC
- 1.18 pom.xml
- 1.18 pom.xml
Size: 231 KB - Last synced: 24 days ago - Pushed: 6 months ago
luda171/trace-archiver
The trace crawler is a tool for selective web crawling to archive web resources with well-defined boundaries. The specific web navigation steps (or trace) are formulated for the families of webpages, where layout or HTML structure can be similar but the content is different, for example, GitHub, Slideshare, blogs, etc.
- 1.13 portals-crawler/pom.xml
Size: 142 MB - Last synced: about 1 year ago - Pushed: almost 2 years ago
ukwa/wren
Experiments in testable, scalable crawler architectures
- 1.5.1 wren/pom.xml
Size: 523 KB - Last synced: about 1 year ago - Pushed: almost 7 years ago
52North/ecmwf-dataset-crawl
- 1.10 crawler/pom.xml
- 1.9 crawler/pom.xml
Size: 22.1 MB - Last synced: about 2 months ago - Pushed: over 5 years ago
sebastian-nagel/warc-crawler
Process web archives (WARC format) with StormCrawler and index content into Elasticsearch or Solr
- 2.2-SNAPSHOT pom.xml
Size: 44.9 KB - Last synced: over 1 year ago - Pushed: over 1 year ago
liinnux/storm-crawler Fork of DigitalPebble/storm-crawler
Web crawler SDK based on Apache Storm
- ${version} archetype/src/main/resources/archetype-resources/pom.xml
- 0.9-SNAPSHOT external/aws/pom.xml
- 0.9-SNAPSHOT external/elasticsearch/pom.xml
- 0.9-SNAPSHOT external/solr/pom.xml
- 0.9-SNAPSHOT external/sql/pom.xml
- 0.9-SNAPSHOT external/tika/pom.xml
Size: 3.46 MB - Last synced: over 1 year ago - Pushed: over 8 years ago
fysoft2006/storm-crawler Fork of DigitalPebble/storm-crawler
Web crawler SDK based on Apache Storm
- ${version} archetype/src/main/resources/archetype-resources/pom.xml
- 0.9-SNAPSHOT external/aws/pom.xml
- 0.9-SNAPSHOT external/elasticsearch/pom.xml
- 0.9-SNAPSHOT external/solr/pom.xml
- 0.9-SNAPSHOT external/sql/pom.xml
- 0.9-SNAPSHOT external/tika/pom.xml
Size: 3.46 MB - Last synced: 10 months ago - Pushed: over 8 years ago
jordillachmrf/stormcrawler
- 1.10 pom.xml
Size: 85 KB - Last synced: about 1 month ago - Pushed: almost 6 years ago
lviiii/storm-crawler Fork of DigitalPebble/storm-crawler
Web crawler SDK based on Apache Storm
- ${version} archetype/src/main/resources/archetype-resources/pom.xml
- 0.9-SNAPSHOT external/aws/pom.xml
- 0.9-SNAPSHOT external/elasticsearch/pom.xml
- 0.9-SNAPSHOT external/solr/pom.xml
- 0.9-SNAPSHOT external/sql/pom.xml
- 0.9-SNAPSHOT external/tika/pom.xml
Size: 3.49 MB - Last synced: over 1 year ago - Pushed: over 8 years ago
desp0916/LearnStormCrawler
Learning StormCrawler
- 1.1.1 pom.xml
Size: 6.84 KB - Last synced: about 1 year ago - Pushed: over 7 years ago
szmer/ActualScan
Smart web search engine with infinitely sortable results
- 2.0 java/generalcrawl/pom.xml
Size: 10 MB - Last synced: 10 months ago - Pushed: over 2 years ago
wdxxl/storm-crawler Fork of DigitalPebble/storm-crawler
Web crawler SDK based on Apache Storm
- ${version} archetype/src/main/resources/archetype-resources/pom.xml
- 0.9-SNAPSHOT external/aws/pom.xml
- 0.9-SNAPSHOT external/elasticsearch/pom.xml
- 0.9-SNAPSHOT external/solr/pom.xml
- 0.9-SNAPSHOT external/sql/pom.xml
- 0.9-SNAPSHOT external/tika/pom.xml
Size: 3.56 MB - Last synced: 10 months ago - Pushed: over 8 years ago
Kulanjith/User-Management
- 1.6 core/pom.xml
- 1.8 webapp/pom.xml
Size: 17.6 KB - Last synced: about 1 year ago - Pushed: about 3 years ago
wuzhongdehua/storm_crawler
- 0.10 pom.xml
Size: 10.7 KB - Last synced: about 2 months ago - Pushed: almost 8 years ago
giuseppebonaccorso/storm-crawler Fork of apache/incubator-stormcrawler
Web crawler SDK based on Apache Storm
- ${version} archetype/src/main/resources/archetype-resources/pom.xml
- 1.0-SNAPSHOT external/aws/pom.xml
- 1.0-SNAPSHOT external/elasticsearch/pom.xml
- 1.0-SNAPSHOT external/solr/pom.xml
- 1.0-SNAPSHOT external/sql/pom.xml
- 1.0-SNAPSHOT external/tika/pom.xml
Size: 4.33 MB - Last synced: about 2 months ago - Pushed: almost 8 years ago
diversepwmeasurement/storm-crawler Fork of DigitalPebble/storm-crawler
A scalable, mature and versatile web crawler based on Apache Storm
- ${version} archetype/src/main/resources/archetype-resources/pom.xml
- ${StormCrawlerVersion} external/elasticsearch/archetype/src/main/resources/archetype-resources/pom.xml
- 2.9-SNAPSHOT external/elasticsearch/pom.xml
- ${StormCrawlerVersion} external/opensearch/archetype/src/main/resources/archetype-resources/pom.xml
Size: 4.1 MB - Last synced: 10 months ago - Pushed: 10 months ago