An open API service providing repository metadata for many open source software ecosystems.

GitHub / itsSwapnil / pyspark-incremental-airflow

This repository contains an Airflow DAG that orchestrates an incremental data pipeline built from PySpark scripts. The pipeline automates daily data processing, syncs the results to S3, performs housekeeping, and repeats until a target date threshold is reached.
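The control flow described above can be sketched in plain Python, independent of Airflow. This is a minimal illustration, not the repository's code. The script name `process_daily.py`, the `--run-date` argument, the local output directory and the S3 bucket are all hypothetical. It also assumes that the target date is inclusive and that the pipeline advances one day per iteration.

```python
from datetime import date, timedelta

# Hypothetical names: the repository's real script paths, arguments,
# bucket and directory layout are not documented on this page.
SPARK_SCRIPT = "process_daily.py"
LOCAL_OUTPUT = "/tmp/pipeline_output"
S3_TARGET = "s3://example-bucket/pipeline_output"


def build_steps(run_date: date) -> list[list[str]]:
    """Return the shell commands for one daily increment (assumed layout)."""
    day = run_date.isoformat()
    return [
        # 1. Process one day of data with PySpark (args after the script go to the app).
        ["spark-submit", SPARK_SCRIPT, "--run-date", day],
        # 2. Sync that day's output to S3 with the AWS CLI.
        ["aws", "s3", "sync", f"{LOCAL_OUTPUT}/{day}", f"{S3_TARGET}/{day}"],
        # 3. Housekeeping: delete the local copy once it has been synced.
        ["rm", "-rf", f"{LOCAL_OUTPUT}/{day}"],
    ]


def plan_incremental_runs(start: date, target: date) -> list[tuple[date, list[list[str]]]]:
    """Advance one day at a time until the (inclusive, assumed) target date is reached."""
    plan = []
    current = start
    while current <= target:
        plan.append((current, build_steps(current)))
        current += timedelta(days=1)
    return plan


if __name__ == "__main__":
    for run_date, steps in plan_incremental_runs(date(2024, 1, 1), date(2024, 1, 3)):
        for cmd in steps:
            print(" ".join(cmd))
```

In an actual DAG, each command would typically map to a task such as Airflow's `BashOperator`. The loop would then be expressed through scheduling or re-triggering rather than a Python `while` loop. How this repository implements the loop is not stated here.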

JSON API: http://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/itsSwapnil%2Fpyspark-incremental-airflow
PURL: pkg:github/itsSwapnil/pyspark-incremental-airflow
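The JSON API URL above follows a host/repository pattern in which the `/` in `owner/repo` is percent-encoded as `%2F`. The sketch below rebuilds that URL using only the standard library and includes a helper that would fetch the record. The helper names are illustrative, and the sketch makes no assumptions about which fields the returned JSON contains.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base URL taken from the JSON API link shown above.
API_BASE = "http://repos.ecosyste.ms/api/v1"


def repository_api_url(host: str, full_name: str) -> str:
    """Build the per-repository URL; safe='' makes quote() encode '/' as %2F."""
    return f"{API_BASE}/hosts/{quote(host, safe='')}/repositories/{quote(full_name, safe='')}"


def fetch_repository(host: str, full_name: str) -> dict:
    """Fetch and decode the repository's metadata record (needs network access)."""
    with urlopen(repository_api_url(host, full_name)) as resp:
        return json.load(resp)
```

For example, `repository_api_url("GitHub", "itsSwapnil/pyspark-incremental-airflow")` reproduces the JSON API link listed above.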

Stars: 0
Forks: 0
Open issues: 0

License: None
Language: Python
Size: 13.7 KB
Dependencies parsed at: Pending

Created at: about 2 months ago
Updated at: 20 days ago
Pushed at: 20 days ago
Last synced at: 20 days ago

Topics: airflow, data-engineer, elasticsearch, etl, hadoop, pyspark, spark
