An open API service providing repository metadata for many open source software ecosystems.

GitHub / NitinSPatil15 / Project-4-Data-Lake-with-AWS-EMR

An ETL pipeline that extracts data from S3, processes it with Spark, and loads it back into S3 as a set of dimensional tables.

JSON API: http://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NitinSPatil15%2FProject-4-Data-Lake-with-AWS-EMR

Stars: 2
Forks: 4
Open issues: 0

License: None
Language: Python
Size: 601 KB
Dependencies parsed at: Pending

Created at: almost 5 years ago
Updated at: about 1 year ago
Pushed at: almost 5 years ago
Last synced at: about 1 year ago

Topics: aws-emr, data-lak, etl-pipeline, pyspark, s3-bucket
