Multi-Threaded-Web-Crawler

This project is a multi-threaded web crawler written in Java. It uses Jsoup for HTML parsing and an ExecutorService for concurrent URL processing, supports a configurable crawl depth, tracks URLs that have already been crawled, and can resume from a previous run using a persistent state file.
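The description above can be illustrated with a minimal sketch. This is not the repository's code: the class `CrawlerSketch`, its methods, and the one-URL-per-line state file format are all assumptions made for illustration. To keep the example runnable with only the JDK, link extraction is passed in as a function; a Jsoup-based extractor would fetch each page and collect its `a[href]` links instead.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch: names and state format are assumptions, not the project's API.
class CrawlerSketch {
    private final Set<String> visited = ConcurrentHashMap.newKeySet(); // thread-safe visited set
    private final AtomicInteger pending = new AtomicInteger();         // tasks not yet finished
    private final CountDownLatch done = new CountDownLatch(1);         // opens when pending hits 0
    private final ExecutorService pool;
    private final int maxDepth;
    private final Function<String, List<String>> linkExtractor;

    CrawlerSketch(int threads, int maxDepth, Function<String, List<String>> linkExtractor) {
        this.pool = Executors.newFixedThreadPool(threads);
        this.maxDepth = maxDepth;
        this.linkExtractor = linkExtractor;
    }

    // Resume support: reload previously visited URLs, one per line (assumed format).
    void loadState(Path stateFile) throws IOException {
        if (Files.exists(stateFile)) {
            visited.addAll(Files.readAllLines(stateFile));
        }
    }

    // Persist the visited set so that a later run can skip these URLs.
    void saveState(Path stateFile) throws IOException {
        Files.write(stateFile, visited);
    }

    // Crawls from the seed and blocks until no tasks remain. Call once per instance.
    Set<String> crawl(String seed) {
        pending.incrementAndGet(); // guard held by the caller so the count cannot hit 0 early
        submit(seed, 0);
        finishOne();               // release the guard
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
        return visited;
    }

    private void submit(String url, int depth) {
        // Depth control and de-duplication: add() returns false for an already-seen URL.
        if (depth > maxDepth || !visited.add(url)) {
            return;
        }
        pending.incrementAndGet();
        pool.execute(() -> {
            try {
                for (String link : linkExtractor.apply(url)) {
                    submit(link, depth + 1);
                }
            } finally {
                finishOne();
            }
        });
    }

    private void finishOne() {
        if (pending.decrementAndGet() == 0) {
            done.countDown();
        }
    }

    public static void main(String[] args) {
        // In-memory link graph standing in for real pages.
        Map<String, List<String>> graph = Map.of(
                "a", List.of("b", "c"),
                "b", List.of("d"),
                "c", List.of("a"));
        CrawlerSketch crawler = new CrawlerSketch(4, 1, u -> graph.getOrDefault(u, List.of()));
        System.out.println(new TreeSet<>(crawler.crawl("a"))); // prints [a, b, c]
    }
}
```

With a depth limit of 1, `d` is never visited because it sits two links away from the seed, and the link from `c` back to `a` is ignored as a duplicate. Note that this sketch persists only the visited set; a crawler that resumes mid-crawl would likely also need to save the queue of URLs still waiting to be processed.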

JSON API: http://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/manishkolla%2FMulti-Threaded-Web-Crawler

Stars: 0
Forks: 0
Open issues: 0

License: None
Language: Java
Size: 9.9 MB
Dependencies parsed at: Pending

Created at: 5 months ago
Updated at: 2 months ago
Pushed at: 2 months ago
Last synced at: 2 months ago

Topics: concurrency, html-parsing, java, jsoup, multithreading, operating-system, url-management, webcrawler
