An open API service providing repository metadata for many open source software ecosystems.

GitHub / airscholar / EMR-for-data-engineers

This project demonstrates how to use Amazon EMR (Elastic MapReduce) to process large datasets with Apache Spark. It includes a Spark script for ETL (extract, transform, load) operations, AWS CLI commands for setting up and managing the EMR cluster, and a dataset for testing and demonstration.

JSON API: http://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/airscholar%2FEMR-for-data-engineers
PURL: pkg:github/airscholar/EMR-for-data-engineers
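The JSON API endpoint follows a predictable pattern: the host name, then the owner/repository name with its slash percent-encoded as `%2F`. The following is a minimal sketch, assuming that URL layout holds for other repositories. It rebuilds the endpoint with the Python standard library. `repo_api_url` and `fetch_repo_metadata` are hypothetical helper names. The sketch also makes no assumptions about which fields the JSON response contains.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base path taken from the JSON API URL listed on this page.
API_BASE = "http://repos.ecosyste.ms/api/v1/hosts"


def repo_api_url(host: str, full_name: str) -> str:
    """Build the repository endpoint (hypothetical helper).

    safe="" makes quote() encode "/" too, so "owner/repo" becomes
    "owner%2Frepo" and is treated as a single path segment.
    """
    return f"{API_BASE}/{quote(host, safe='')}/repositories/{quote(full_name, safe='')}"


def fetch_repo_metadata(host: str, full_name: str, timeout: float = 10.0) -> dict:
    """Fetch and decode the JSON document (hypothetical helper; needs network access)."""
    with urlopen(repo_api_url(host, full_name), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Only builds the URL; call fetch_repo_metadata() to actually query the API.
    print(repo_api_url("GitHub", "airscholar/EMR-for-data-engineers"))
```

Encoding the slash explicitly matters because an unencoded `airscholar/EMR-for-data-engineers` would be read as two separate path segments.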

Stars: 7
Forks: 8
Open issues: 0

License: None
Language: Python
Size: 512 KB
Dependencies parsed at: Pending

Created at: over 1 year ago
Updated at: 6 months ago
Pushed at: over 1 year ago
Last synced at: 4 months ago

Topics: apache-spark, aws, aws-s3, emr-cluster
