Ecosyste.ms: Repos
An open API service providing repository metadata for many open source software ecosystems.
GitHub / tokenmill / crawling-framework
Easily crawl news portals or blog sites using Storm Crawler.
JSON API: https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tokenmill%2Fcrawling-framework
Stars: 21
Forks: 3
Open Issues: 21
License: other
Language: Java
Repo Size: 918 KB
Dependencies: 87
Created: over 6 years ago
Updated: 8 months ago
Last pushed: over 1 year ago
Last synced: 8 days ago
Topics: crawler, crawling, crawling-framework, elasticsearch, java, scraping, storm, storm-crawler, vaadin
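The JSON API URL listed above carries the repository's full name as a single path segment, with the slash percent-encoded as %2F. A minimal sketch of building such a URL follows. It assumes the same pattern holds for other hosts and repositories, which this page does not confirm, and the helper name `repository_api_url` is hypothetical, not part of any Ecosyste.ms client.

```python
from urllib.parse import quote

# Base path copied from the JSON API URL shown on this page.
API_BASE = "https://repos.ecosyste.ms/api/v1/hosts"


def repository_api_url(host: str, full_name: str) -> str:
    """Build a repository JSON API URL (hypothetical helper, assumed URL pattern)."""
    # safe="" makes quote() encode "/" as %2F, so "owner/name" stays one path segment.
    encoded_host = quote(host, safe="")
    encoded_name = quote(full_name, safe="")
    return f"{API_BASE}/{encoded_host}/repositories/{encoded_name}"


if __name__ == "__main__":
    print(repository_api_url("GitHub", "tokenmill/crawling-framework"))
```

For this repository the helper reproduces the URL listed above. The shape of the JSON response is not shown on this page, so the sketch stops at building the URL.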
Dependencies
- com.byteowls:vaadin-chartjs 1.0.0
- com.opencsv:opencsv 3.7
- com.vaadin:vaadin-client-compiled
- com.vaadin:vaadin-server
- com.vaadin:vaadin-themes
- javax.servlet:javax.servlet-api 3.1.0
- lt.tokenmill.crawling:page-analyzer
- lt.tokenmill.crawling:ui-commons
- org.apache.logging.log4j:log4j-core 2.13.2
- org.eclipse.jetty:jetty-continuation ${jetty.version}
- org.eclipse.jetty:jetty-webapp ${jetty.version}
- org.slf4j:slf4j-log4j12 ${slf4j.version}
- junit:junit 4.13.1 test
analysis-ui/pom.xml
maven
- com.vaadin:vaadin-client-compiled
- com.vaadin:vaadin-server
- com.vaadin:vaadin-themes
- javax.servlet:javax.servlet-api 3.1.0
- lt.tokenmill.crawling:ui-commons
- org.apache.logging.log4j:log4j-core 2.13.2
- org.eclipse.jetty:jetty-continuation ${jetty.version}
- org.eclipse.jetty:jetty-webapp ${jetty.version}
- org.slf4j:slf4j-log4j12 ${slf4j.version}
crawler/pom.xml
maven
- org.apache.logging.log4j:log4j-api 2.7 provided
- org.apache.logging.log4j:log4j-core 2.13.2 provided
- org.apache.storm:storm-core provided
- com.digitalpebble.stormcrawler:storm-crawler-core
- com.fasterxml.jackson.core:jackson-databind 2.10.0
- lt.tokenmill.crawling:data-model
- lt.tokenmill.crawling:elasticsearch
- lt.tokenmill.crawling:parser
- junit:junit 4.13.1 test
data-model/pom.xml
maven
- com.google.guava:guava
- joda-time:joda-time
- junit:junit 4.13.1 test
elasticsearch/pom.xml
maven
- org.apache.logging.log4j:log4j-api 2.7 provided
- org.apache.logging.log4j:log4j-core 2.13.2 provided
- org.slf4j:slf4j-log4j12 ${slf4j.version} provided
- com.google.guava:guava
- lt.tokenmill.crawling:data-model
- org.apache.httpcomponents:httpasyncclient 4.1.3
- org.apache.httpcomponents:httpclient 4.5.4
- org.apache.httpcomponents:httpcore 4.4.6
- org.apache.httpcomponents:httpcore-nio 4.4.6
- org.elasticsearch.client:elasticsearch-rest-client ${elasticsearch.version}
- org.elasticsearch.client:elasticsearch-rest-high-level-client ${elasticsearch.version}
- org.elasticsearch.client:transport
- org.elasticsearch:elasticsearch
- junit:junit 4.13.1 test
- org.elasticsearch.plugin:transport-netty4-client ${elasticsearch.version} test
page-analyzer/pom.xml
maven
- org.slf4j:slf4j-log4j12 ${slf4j.version} provided
- com.github.crawler-commons:crawler-commons 0.7
- com.google.guava:guava
- com.mashape.unirest:unirest-java 1.4.9
- lt.tokenmill.crawling:data-model ${project.version}
- org.jsoup:jsoup
- junit:junit 4.13.1 test
parser/pom.xml
maven
- org.slf4j:slf4j-log4j12 ${slf4j.version} provided
- com.github.jsonld-java:jsonld-java
- com.google.guava:guava
- lt.tokenmill.crawling:data-model ${project.version}
- lt.tokenmill:timewords ${timewords.version}
- org.apache.commons:commons-lang3 3.5
- org.clojure:clojure 1.7.0
- org.jsoup:jsoup
- junit:junit 4.13.1 test
pom.xml
maven
- com.vaadin:vaadin-bom 7.7.24 import
- com.digitalpebble.stormcrawler:storm-crawler-core 1.5.1
- com.github.jsonld-java:jsonld-java 0.11.0
- com.google.guava:guava 19.0
- joda-time:joda-time 2.9.4
- lt.tokenmill.crawling:data-model 0.3.4-SNAPSHOT
- lt.tokenmill.crawling:elasticsearch 0.3.4-SNAPSHOT
- lt.tokenmill.crawling:page-analyzer 0.3.4-SNAPSHOT
- lt.tokenmill.crawling:parser 0.3.4-SNAPSHOT
- lt.tokenmill.crawling:ui-commons 0.3.4-SNAPSHOT
- org.apache.storm:storm-core 1.1.3
- org.elasticsearch.client:transport 7.11.2
- org.elasticsearch:elasticsearch 7.11.2
- org.jsoup:jsoup 1.10.3
docker-compose.dev.yml
docker