GitHub / aksh-patel1 / parallel-web-scraper-on-cloud
This project demonstrates an event-driven architecture for parallel web scraping and data processing on AWS. A scraper job running on AWS Batch collects data from multiple web pages in parallel and stores the results in Amazon S3. A processing job, triggered by AWS EventBridge, then processes the scraped data and updates a Google Sheet.
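The listing does not show the repository's code, so the following is only a minimal sketch of the two hand-offs the description implies. It assumes (not confirmed by this listing) that the scraper runs as an AWS Batch array job, so that each child container reads the `AWS_BATCH_JOB_ARRAY_INDEX` variable AWS Batch sets for array children. It also assumes that the processing job receives S3 "Object Created" events routed through EventBridge. The helpers `urls_for_this_child` and `s3_object_from_event` are hypothetical names used for illustration only.

```python
import os
from typing import Optional, Sequence, Tuple


def urls_for_this_child(
    urls: Sequence[str], num_children: int, index: Optional[int] = None
) -> list:
    """Hypothetical helper: pick the URLs one AWS Batch array child scrapes.

    Assumes an array job; AWS Batch sets AWS_BATCH_JOB_ARRAY_INDEX
    (0-based) in each child container. Falls back to 0 when run locally.
    """
    if index is None:
        index = int(os.environ.get("AWS_BATCH_JOB_ARRAY_INDEX", "0"))
    if num_children <= 0 or not 0 <= index < num_children:
        raise ValueError("index must be in [0, num_children)")
    # Round-robin sharding: child i takes urls i, i+n, i+2n, ...
    return list(urls[index::num_children])


def s3_object_from_event(event: dict) -> Tuple[str, str]:
    """Hypothetical helper: extract (bucket, key) from an EventBridge S3 event.

    Assumes the event is an S3 "Object Created" notification delivered
    via EventBridge, whose detail carries bucket.name and object.key.
    """
    if event.get("source") != "aws.s3" or event.get("detail-type") != "Object Created":
        raise ValueError("not an S3 Object Created event")
    detail = event["detail"]
    return detail["bucket"]["name"], detail["object"]["key"]
```

Round-robin sharding is one design choice among several; contiguous chunks would work too, but interleaving tends to spread slow pages from the same site across children. In the processing job, the `(bucket, key)` pair returned by `s3_object_from_event` would identify the scraped file to load before updating the sheet.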
JSON API: http://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aksh-patel1%2Fparallel-web-scraper-on-cloud
PURL: pkg:github/aksh-patel1/parallel-web-scraper-on-cloud
Stars: 2
Forks: 0
Open issues: 0
License: None
Language: Python
Size: 9.77 KB
Dependencies parsed at: Pending
Created at: about 1 year ago
Updated at: 4 months ago
Pushed at: about 1 year ago
Last synced at: 6 days ago
Topics: aws, aws-batch, aws-ecr, aws-eventbridge, aws-s3, data-preprocessing, docker, event-driven-architecture, eventdrivenarchitecture, python, web-scraping