GitHub topics: amazon-emr
aws-samples/amazon-emr-with-delta-lake
Amazon EMR Notebook to show how to read from and write to Delta tables with Amazon EMR
Language: Jupyter Notebook - Size: 434 KB - Last synced at: about 17 hours ago - Pushed at: about 18 hours ago - Stars: 18 - Forks: 14

awslabs/amazon-emr-vscode-toolkit
A VS Code Extension to make it easier to manage and develop Spark jobs on EMR
Language: TypeScript - Size: 907 KB - Last synced at: 9 days ago - Pushed at: 2 months ago - Stars: 35 - Forks: 5

Ditectrev/Amazon-Web-Services-Certified-AWS-Certified-Machine-Learning-MLS-C01-Practice-Tests-Exams-Question
⛳️ PASS: Amazon Web Services Certified (AWS Certified) Machine Learning Specialty (MLS-C01) by learning from our Questions & Answers (Q&A) practice tests and exams.
Size: 6.88 MB - Last synced at: 19 days ago - Pushed at: 7 months ago - Stars: 59 - Forks: 38

dacort/demo-code
Bits of code I use during live demos
Language: Jupyter Notebook - Size: 774 KB - Last synced at: 21 days ago - Pushed at: 4 months ago - Stars: 31 - Forks: 24

Ditectrev/Amazon-Web-Services-Certified-AWS-Certified-Data-Analytics-DAS-C01-Practice-Tests-Exams-Question
⛳️ PASS: Amazon Web Services Certified (AWS Certified) Data Analytics Specialty (DAS-C01) by learning from our Questions & Answers (Q&A) practice tests and exams.
Size: 9.44 MB - Last synced at: 2 days ago - Pushed at: 5 months ago - Stars: 7 - Forks: 7

awslabs/amazon-emr-cli
A command-line interface for packaging, deploying, and running your EMR Serverless Spark jobs
Language: Python - Size: 150 KB - Last synced at: 9 days ago - Pushed at: 12 months ago - Stars: 41 - Forks: 14

dacort/modern-data-lake-storage-layers
Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work
Language: Jupyter Notebook - Size: 262 KB - Last synced at: 21 days ago - Pushed at: almost 3 years ago - Stars: 48 - Forks: 29

aws-samples/aws-dbs-refarch-datalake
Reference Architectures for Data Lakes on AWS
Language: HTML - Size: 4.52 MB - Last synced at: 6 months ago - Pushed at: almost 5 years ago - Stars: 79 - Forks: 31

Rituraj0480/Quad-Squad
CMPT 732 Project: Acting as analysts for a bike-sharing firm, we use several big data tools to offer insights into various use cases, predict the firm's future profits, and help it expand its business.
Language: Python - Size: 11.9 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

build-on-aws/ci-cd-serverless-spark
Sample CI/CD pipeline for using GitHub Actions with Amazon EMR Serverless Spark.
Language: Python - Size: 13.7 KB - Last synced at: 28 days ago - Pushed at: about 2 years ago - Stars: 5 - Forks: 1

Lostefra/SparkTemplate
A simple Java-Scala mixed project template for Apache Spark
Language: Scala - Size: 115 KB - Last synced at: about 1 year ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

robertgv/Data_Lake_in_AWS
Udacity Data Engineering Nanodegree Program
Language: Python - Size: 1.44 MB - Last synced at: about 1 year ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

polakowo/yelp-3nf
3NF-normalize Yelp data on S3 with Spark and load it into Redshift - automate the whole thing with Apache Airflow
Language: Jupyter Notebook - Size: 1.82 MB - Last synced at: 12 months ago - Pushed at: over 5 years ago - Stars: 12 - Forks: 3

esakik/data-engineering-essentials
Samples related to data engineering, e.g. Spark, Embulk, and Airflow.
Language: Python - Size: 413 KB - Last synced at: 21 days ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 1

Sampsonyu/Data_Lake_with_Spark
Data Lake with Spark
Language: Jupyter Notebook - Size: 6.63 MB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 0

Mohammed-siddiq/Page-Rank-In-Spark
PageRank implementation in Spark to rank authors and venues based on their publications in the DBLP dataset.
Language: Scala - Size: 1.3 MB - Last synced at: over 1 year ago - Pushed at: about 6 years ago - Stars: 1 - Forks: 0

cmeb45/fuzzyjoin
Language: Java - Size: 2.81 MB - Last synced at: over 1 year ago - Pushed at: about 9 years ago - Stars: 0 - Forks: 0

jaceyca/Rankmaniac
Used Amazon's Elastic MapReduce to rank the top 20 nodes, based on PageRank, of graphs with over 100,000 nodes. See http://courses.cms.caltech.edu/cs144/homeworks/rankmaniac.pdf
Language: Python - Size: 16.9 MB - Last synced at: over 1 year ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 1

snowplow/dataflow-runner
Run templatable playbooks of Hadoop, Spark, and other jobs on Amazon EMR
Language: Go - Size: 29 MB - Last synced at: 7 days ago - Pushed at: about 1 year ago - Stars: 19 - Forks: 8

garystafford/emr-demo
Project files for the post: Running PySpark Applications on Amazon EMR: Methods for Interacting with PySpark on Amazon Elastic MapReduce.
Language: Python - Size: 691 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 37 - Forks: 17

garystafford/aws-airflow-demo
Project files for the post: Running PySpark Applications on Amazon EMR using Apache Airflow: Using the new Amazon Managed Workflows for Apache Airflow (MWAA) on AWS.
Language: Python - Size: 753 KB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 39 - Forks: 14

WorksApplications/ansible_aws_emr 📦
Unofficial Ansible module for Amazon EMR
Language: Python - Size: 24.4 KB - Last synced at: about 1 year ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 2

aws-samples/amazon-emr-yarn-capacity-scheduler
Configure Hadoop YARN CapacityScheduler on Amazon EMR on Amazon EC2 for multi-tenant heterogeneous workloads
Language: Shell - Size: 1.02 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

Faisal-AlDhuwayhi/Data-Lake
Building Data Lake and ETL pipelines using Amazon EMR, S3, and Apache Spark
Language: Python - Size: 2.93 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

DeepHiveMind/Amazon-EMR-on-Amazon-EKS-Spark-job-with-AWS-Step-Functions
Orchestrate an Amazon EMR on Amazon EKS Spark job with AWS Step Functions
Size: 654 KB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 2 - Forks: 0

garystafford/emr-superset-demo
Project files for the post: Installing Apache Superset on Amazon EMR: Add data exploration and visualization to your analytics cluster.
Language: Python - Size: 31.3 KB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 1

tmusabbir/emr-with-custom-metrics
Amazon EMR Automatic Scaling using Custom Metrics
Language: Shell - Size: 1.73 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

aws-samples/amazon-s3-access-points-for-cross-account-integration-samples
This repo provides cross-account integration code samples using Amazon S3 Access points
Language: Java - Size: 172 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 1

cameres/emr-spark-jupyter
Repository/Tutorial for initializing Jupyter Notebook and a Spark cluster on Amazon EMR
Language: Python - Size: 17.6 KB - Last synced at: almost 2 years ago - Pushed at: over 8 years ago - Stars: 4 - Forks: 1

DarthVi/knn-ncc-spark
An implementation in Scala of kNN and NCC based on Spark
Language: Scala - Size: 3.18 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0
