GitHub / jashshah-dev / AWS-Big-Data-Pipeline-orchestrated-with-Airflow
A robust data pipeline leveraging Amazon EMR and PySpark, orchestrated seamlessly with Apache Airflow for efficient batch processing
JSON API: http://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jashshah-dev%2FAWS-Big-Data-Pipeline-orchestrated-with-Airflow
PURL: pkg:github/jashshah-dev/AWS-Big-Data-Pipeline-orchestrated-with-Airflow
Stars: 0
Forks: 0
Open issues: 0
License: None
Language: Python
Size: 16.6 KB
Dependencies parsed at: Pending
Created at: over 1 year ago
Updated at: over 1 year ago
Pushed at: over 1 year ago
Last synced at: over 1 year ago
Topics: airflow-dags, amazon-s3, distributed-computing, emr-cluster, pyspark, snowflake, transient-cluster
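The description and the "transient-cluster" topic point to a common pattern: Airflow starts an EMR cluster, runs PySpark steps on it, and the cluster terminates itself when the steps finish. Below is a minimal sketch of that pattern. It assumes boto3's EMR `run_job_flow` request shape and builds only the request dictionary, without calling AWS. It is not code from this repository. The function names, S3 paths, release label, instance types, and IAM role names are hypothetical placeholders.

```python
# Hypothetical sketch (not from the repository): request arguments for a
# transient EMR cluster that runs PySpark steps and then shuts itself down.
from typing import Any, Dict, List, Optional


def build_spark_step(name: str, script_s3_uri: str,
                     args: Optional[List[str]] = None) -> Dict[str, Any]:
    """Return one EMR step that runs a PySpark script via spark-submit."""
    return {
        "Name": name,
        # If the Spark job fails, tear the cluster down instead of idling.
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            # command-runner.jar lets an EMR step run commands such as spark-submit.
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster",
                     script_s3_uri, *(args or [])],
        },
    }


def build_transient_job_flow(cluster_name: str, log_uri: str,
                             steps: List[Dict[str, Any]],
                             release_label: str = "emr-6.15.0",  # assumed placeholder
                             instance_type: str = "m5.xlarge",   # assumed placeholder
                             core_count: int = 2) -> Dict[str, Any]:
    """Return keyword arguments in the shape of boto3's EMR run_job_flow call."""
    return {
        "Name": cluster_name,
        "ReleaseLabel": release_label,
        "LogUri": log_uri,
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "Primary", "InstanceRole": "MASTER",
                 "InstanceType": instance_type, "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": instance_type, "InstanceCount": core_count},
            ],
            # Transient cluster: terminate once there are no steps left to run.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": steps,
        # Default EMR role names; real deployments may use custom roles.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


if __name__ == "__main__":
    step = build_spark_step("transform", "s3://example-bucket/jobs/transform.py",
                            ["--date", "2024-01-01"])
    request = build_transient_job_flow("demo-pipeline", "s3://example-bucket/logs/", [step])
    print(request["Instances"]["KeepJobFlowAliveWhenNoSteps"])
```

With AWS credentials configured, the dictionary could be passed as `boto3.client("emr").run_job_flow(**request)`. In an Airflow DAG, the same shape is what the Amazon provider's `EmrCreateJobFlowOperator` accepts through `job_flow_overrides`. Setting `KeepJobFlowAliveWhenNoSteps` to `False` is what makes the cluster transient, so compute is only billed while the batch job runs.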