An open API service providing repository metadata for many open source software ecosystems.

GitHub / gustschaefer / Twitter-Batch-ETL

An ETL pipeline that extracts Twitter's trending topics daily for several countries, transforms them with PySpark, and sends the data to Amazon S3, using Apache Airflow as the orchestrator.

JSON API: http://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gustschaefer%2FTwitter-Batch-ETL
PURL: pkg:github/gustschaefer/Twitter-Batch-ETL
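
The two identifiers above follow a visible pattern: the JSON API URL percent-encodes the slash in the repository's full name, and the PURL keeps it as-is. A minimal sketch of how they could be rebuilt from the host and full name, assuming that pattern holds for other repositories (the function names `repository_api_url` and `github_purl` are hypothetical and are not part of any ecosyste.ms client):

```python
from urllib.parse import quote

# Base path taken from the JSON API URL shown on this page.
API_BASE = "http://repos.ecosyste.ms/api/v1"


def repository_api_url(host: str, full_name: str) -> str:
    """Build the JSON API URL for a repository (hypothetical helper).

    safe="" makes quote() encode "/" as %2F, matching the URL on this page.
    """
    return (
        f"{API_BASE}/hosts/{quote(host, safe='')}"
        f"/repositories/{quote(full_name, safe='')}"
    )


def github_purl(full_name: str) -> str:
    """Mirror the PURL string shown on this page (hypothetical helper).

    Assumption: this only reproduces the listed format; it does not apply
    the normalization rules of the full Package URL specification.
    """
    return f"pkg:github/{full_name}"


url = repository_api_url("GitHub", "gustschaefer/Twitter-Batch-ETL")
purl = github_purl("gustschaefer/Twitter-Batch-ETL")
print(url)
print(purl)
```

Passing `safe=""` matters here: by default `quote` leaves `/` unescaped, which would split the full name into two path segments.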

Stars: 4
Forks: 0
Open issues: 0

License: apache-2.0
Language: Python
Size: 1.03 MB
Dependencies parsed at: Pending

Created at: over 4 years ago
Updated at: 9 months ago
Pushed at: over 4 years ago
Last synced at: 9 months ago

Topics: airflow-docker, amazon-s3, etl-pipeline, parquet, twitter-api
