Ecosyste.ms: Repos
An open API service providing repository metadata for many open source software ecosystems.
GitHub / apache / incubator-stormcrawler
A scalable, mature and versatile web crawler based on Apache Storm
JSON API: https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apache%2Fincubator-stormcrawler
Stars: 861
Forks: 256
Open Issues: 33
License: apache-2.0
Language: HTML
Repo Size: 6.58 MB
Dependencies: 80
Created: about 11 years ago
Updated: 7 days ago
Last pushed: 7 days ago
Last synced: 7 days ago
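The JSON API URL listed above percent-encodes the slash in the repository's full name (`apache%2Fincubator-stormcrawler`). A minimal sketch of building that URL follows. The base URL and path layout are taken from the link on this page; the helper name `repo_api_url` is hypothetical and not part of any Ecosyste.ms client library.

```python
from urllib.parse import quote

# Base URL as it appears in the JSON API link on this page.
API_BASE = "https://repos.ecosyste.ms/api/v1"


def repo_api_url(host: str, full_name: str) -> str:
    """Build the repository lookup URL (hypothetical helper).

    safe="" forces the "/" in "owner/name" to be encoded as %2F,
    so the full name travels as a single path segment.
    """
    return (
        f"{API_BASE}/hosts/{quote(host, safe='')}"
        f"/repositories/{quote(full_name, safe='')}"
    )


if __name__ == "__main__":
    print(repo_api_url("GitHub", "apache/incubator-stormcrawler"))
```

The resulting string can be passed to any HTTP client to retrieve the same metadata shown here as JSON.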
Commit Stats
Commits: 1579
Authors: 53
Mean commits per author: 29.79
Development Distribution Score: 0.163
More commit stats: https://commits.ecosyste.ms/hosts/GitHub/repositories/apache/incubator-stormcrawler
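The mean commits per author figure above is consistent with dividing the commit count by the author count. A small sketch of that arithmetic, assuming the value is a plain ratio rounded to two decimals (the function name is illustrative only):

```python
def mean_commits_per_author(commits: int, authors: int) -> float:
    """Return the average number of commits per author (illustrative helper)."""
    if authors <= 0:
        raise ValueError("authors must be positive")
    return commits / authors


if __name__ == "__main__":
    # Figures from the Commit Stats section: 1579 commits, 53 authors.
    print(round(mean_commits_per_author(1579, 53), 2))  # prints 29.79
```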
Topics: apache-storm, crawler, distributed, java, stormcrawler, web-crawler
Funding links: https://github.com/sponsors/DigitalPebble
Dependencies
- com.amazonaws:aws-java-sdk-cloudsearch 1.12.243
- com.amazonaws:aws-java-sdk-s3 1.12.243
- com.digitalpebble.stormcrawler:storm-crawler-core ${project.version}
- junit:junit
- org.apache.logging.log4j:log4j-api ${log4j2.version}
- org.apache.logging.log4j:log4j-core ${log4j2.version}
- org.apache.logging.log4j:log4j-slf4j-impl ${log4j2.version}
- org.apache.storm:storm-client
- org.mockito:mockito-core ${mockito.version} test
- org.apache.storm:storm-client 2.4.0 provided
- com.digitalpebble.stormcrawler:storm-crawler-core ${version}
- com.digitalpebble.stormcrawler:storm-crawler-tika ${version}
- org.apache.storm:flux-core 2.4.0
- com.fasterxml.jackson.core:jackson-annotations ${jackson.version}
- com.fasterxml.jackson.core:jackson-core ${jackson.version}
- com.fasterxml.jackson.core:jackson-databind
- com.github.ben-manes.caffeine:caffeine 3.1.1
- com.github.crawler-commons:crawler-commons 1.3
- com.ibm.icu:icu4j 71.1
- com.rometools:rome 1.18.0
- com.squareup.okhttp3:okhttp 4.10.0
- com.squareup.okhttp3:okhttp-brotli 4.10.0
- commons-cli:commons-cli 1.5.0
- commons-lang:commons-lang 2.6
- junit:junit
- org.apache.httpcomponents:httpclient 4.5.13
- org.apache.logging.log4j:log4j-api ${log4j2.version}
- org.apache.logging.log4j:log4j-core ${log4j2.version}
- org.apache.logging.log4j:log4j-slf4j-impl ${log4j2.version}
- org.apache.storm:storm-client
- org.apache.tika:tika-core ${tika.version}
- org.jsoup:jsoup 1.15.2
- org.seleniumhq.selenium:selenium-remote-driver 4.2.2
- org.seleniumhq.selenium:selenium-support 4.2.2
- org.yaml:snakeyaml 1.30
- us.codecraft:xsoup 0.3.4
- xerces:xercesImpl 2.12.2
- com.github.tomakehurst:wiremock 2.27.2 test
- org.mockito:mockito-core ${mockito.version} test
- org.apache.storm:storm-client 2.4.0 provided
- com.digitalpebble.stormcrawler:storm-crawler-core ${StormCrawlerVersion}
- com.digitalpebble.stormcrawler:storm-crawler-elasticsearch ${StormCrawlerVersion}
- com.digitalpebble.stormcrawler:storm-crawler-tika ${StormCrawlerVersion}
- org.apache.storm:flux-core 2.4.0
- org.elasticsearch.client:elasticsearch-rest-client-sniffer 7.17.4
- org.elasticsearch.client:elasticsearch-rest-high-level-client 7.17.4
- com.digitalpebble.stormcrawler:storm-crawler-core 2.5-SNAPSHOT test
- org.testcontainers:elasticsearch ${testcontainer.version} test
- org.apache.solr:solr-solrj 8.11.1
- org.apache.tika:tika-parsers-standard-package ${tika.version}
- com.digitalpebble.stormcrawler:storm-crawler-core 2.5-SNAPSHOT test
- com.github.crawler-commons:urlfrontier-API 2.2
- com.digitalpebble.stormcrawler:storm-crawler-core 2.5-SNAPSHOT test
- org.testcontainers:testcontainers ${testcontainer.version} test
- com.digitalpebble.stormcrawler:storm-crawler-core 2.5-SNAPSHOT
- org.apache.storm:storm-hdfs ${storm-client.version}
- org.netpreserve:jwarc 0.18.1
- com.digitalpebble.stormcrawler:storm-crawler-core 2.5-SNAPSHOT test
- com.github.tomakehurst:wiremock 2.27.2 test
- org.jetbrains:annotations 23.0.0 compile
- org.apache.storm:storm-client 2.4.0 provided
- com.fasterxml.jackson.core:jackson-annotations 2.13.3
- com.fasterxml.jackson.core:jackson-databind 2.13.3
- junit:junit 4.13.2 test
- actions/cache v2.1.7 composite
- actions/checkout v2 composite
- actions/setup-java v2 composite
- org.apache.storm:storm-client 2.4.0 provided
- com.digitalpebble.stormcrawler:storm-crawler-core ${StormCrawlerVersion}
- com.digitalpebble.stormcrawler:storm-crawler-opensearch ${StormCrawlerVersion}
- com.digitalpebble.stormcrawler:storm-crawler-tika ${StormCrawlerVersion}
- commons-io:commons-io 2.11.0
- org.apache.storm:flux-core 2.4.0
- org.opensearch.client:opensearch-rest-high-level-client 2.4.1
- com.digitalpebble.stormcrawler:storm-crawler-core 2.8-SNAPSHOT test
- org.testcontainers:testcontainers 1.17.6 test
- actions/checkout v3 composite
- actions/setup-java v3 composite
- coverallsapp/github-action v2 composite