GitHub / manishkolla / Multi-Threaded-Web-Crawler
A multi-threaded web crawler written in Java. It uses Jsoup for HTML parsing and an ExecutorService to process URLs concurrently, limits crawl depth, keeps track of URLs it has already crawled, and can resume from a previous run using a persistent state file.
JSON API: http://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/manishkolla%2FMulti-Threaded-Web-Crawler
Stars: 0
Forks: 0
Open issues: 0
License: None
Language: Java
Size: 9.9 MB
Dependencies parsed at: Pending
Created at: 5 months ago
Updated at: 2 months ago
Pushed at: 2 months ago
Last synced at: 2 months ago
Topics: concurrency, html-parsing, java, jsoup, multithreading, operating-system, url-management, webcrawler
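The design named in the description (a thread pool, a depth limit, a set of crawled URLs, and a state file) can be sketched roughly as follows. This is not the repository's code: the class name CrawlerSketch, the linkExtractor parameter, the loadState/saveState methods, and the one-URL-per-line state format are all assumptions made for illustration. Link extraction is passed in as a function so the sketch runs without network access or the Jsoup dependency.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Phaser;
import java.util.function.Function;

// Hypothetical sketch; names and state format are assumptions, not the repo's API.
class CrawlerSketch {
    private final int maxDepth;
    private final ExecutorService pool;
    // Maps a URL to the links found on that page. In a real crawler this could
    // wrap Jsoup, e.g. Jsoup.connect(url).get().select("a[href]") with absUrl("href").
    private final Function<String, List<String>> linkExtractor;
    // Thread-safe set: add() returns false if the URL was already recorded.
    private final Set<String> visited = ConcurrentHashMap.newKeySet();
    // Tracks in-flight tasks; the initial party is the thread calling crawl().
    private final Phaser pending = new Phaser(1);

    CrawlerSketch(int maxDepth, int threads, Function<String, List<String>> linkExtractor) {
        this.maxDepth = maxDepth;
        this.pool = Executors.newFixedThreadPool(threads);
        this.linkExtractor = linkExtractor;
    }

    // Assumed state format: one crawled URL per line.
    void loadState(Path file) throws IOException {
        if (Files.exists(file)) {
            visited.addAll(Files.readAllLines(file));
        }
    }

    void saveState(Path file) throws IOException {
        Files.write(file, visited);
    }

    // Crawls from the seed and blocks until every scheduled task has finished.
    // Single use: the pool is shut down when the crawl completes.
    Set<String> crawl(String seed) {
        submit(seed, 0);
        pending.arriveAndAwaitAdvance();
        pool.shutdown();
        return visited;
    }

    private void submit(String url, int depth) {
        // Skip URLs beyond the depth limit or already claimed by another task.
        if (depth > maxDepth || !visited.add(url)) {
            return;
        }
        pending.register(); // registered before the parent task arrives
        pool.execute(() -> {
            try {
                for (String link : linkExtractor.apply(url)) {
                    submit(link, depth + 1);
                }
            } finally {
                pending.arriveAndDeregister();
            }
        });
    }

    public static void main(String[] args) {
        // In-memory link graph standing in for real pages.
        Map<String, List<String>> graph = Map.of(
                "a", List.of("b", "c"),
                "b", List.of("d"),
                "c", List.of("a"),
                "d", List.of("e"));
        CrawlerSketch crawler =
                new CrawlerSketch(2, 4, url -> graph.getOrDefault(url, List.of()));
        System.out.println(new TreeSet<>(crawler.crawl("a")));
    }
}
```

Because this sketch persists only the set of crawled URLs, a re-run after loadState skips pages it has already seen; resuming an interrupted crawl mid-way would also require saving the pending frontier, which is left out here. A Phaser supports at most 65535 registered parties, so a crawler with a very large number of in-flight tasks would need a different completion mechanism.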