Ecosyste.ms: Repos

An open API service providing repository metadata for many open source software ecosystems.

GitHub / omar-elmaria / python_scrapy_airflow_pipeline

This repo contains a full-fledged Python-based script that scrapes a JavaScript-rendered website, cleans the data, and pushes the results to a cloud-based database. The workflow is orchestrated with Airflow so that it runs automatically.

JSON API: https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/omar-elmaria%2Fpython_scrapy_airflow_pipeline
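The JSON API link above follows a host/repositories pattern in which the "/" between owner and repository name is percent-encoded as %2F. The sketch below rebuilds that URL and fetches the document using only the Python standard library. It assumes the pattern generalizes beyond this one URL. The helper names (repo_api_url, fetch_repo_metadata) are hypothetical, and because this page does not describe the response fields, the code returns the decoded JSON without reading any specific keys.

```python
import json
import urllib.parse
import urllib.request

# Base path taken from the JSON API link on this page.
API_BASE = "https://repos.ecosyste.ms/api/v1/hosts"


def repo_api_url(host: str, full_name: str) -> str:
    """Build the metadata URL for a repository (hypothetical helper).

    safe="" makes quote() also encode "/", so "owner/name" becomes
    "owner%2Fname" as in the URL shown on this page.
    """
    host_part = urllib.parse.quote(host, safe="")
    name_part = urllib.parse.quote(full_name, safe="")
    return f"{API_BASE}/{host_part}/repositories/{name_part}"


def fetch_repo_metadata(host: str, full_name: str, timeout: float = 10.0) -> dict:
    """Fetch and decode the JSON document (requires network access).

    The response schema is not described here, so no fields are assumed.
    """
    with urllib.request.urlopen(repo_api_url(host, full_name), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Prints the same URL as the JSON API link above.
    print(repo_api_url("GitHub", "omar-elmaria/python_scrapy_airflow_pipeline"))
```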

Stars: 3
Forks: 0
Open Issues: 0

License: None
Language: Python
Repo Size: 179 KB
Dependencies: pending

Created: over 1 year ago
Updated: about 1 year ago
Last pushed: over 1 year ago
Last synced: about 1 year ago

Topics: airflow, anti-bot, data-mining, dynamic-websites, javascript-rendered-websites, proxy-api, proxy-scraper, python, scrapy, spiders, web-crawling, web-scraping
