Ecosyste.ms: Repos
An open API service providing repository metadata for many open source software ecosystems.
GitHub / omar-elmaria / python_scrapy_airflow_pipeline
This repo contains a full-fledged Python-based script that scrapes a JavaScript-rendered website, cleans the data, and pushes the results to a cloud-based database. The workflow is orchestrated on Airflow to run automatically.
Stars: 3
Forks: 0
Open Issues: 0
License: None
Language: Python
Repo Size: 179 KB
Dependencies: pending
Created: over 1 year ago
Updated: about 1 year ago
Last pushed: over 1 year ago
Last synced: about 1 year ago
Topics: airflow, anti-bot, data-mining, dynamic-websites, javascript-rendered-websites, proxy-api, proxy-scraper, python, scrapy, spiders, web-crawling, web-scraping