GitHub topics: amazon-emr

Repositories

aws-samples/amazon-emr-with-delta-lake

Amazon EMR Notebook to show how to read from and write to Delta tables with Amazon EMR

Language: Jupyter Notebook - Size: 434 KB - Last synced at: about 17 hours ago - Pushed at: about 18 hours ago - Stars: 18 - Forks: 14

awslabs/amazon-emr-vscode-toolkit

A VS Code Extension to make it easier to manage and develop Spark jobs on EMR

Language: TypeScript - Size: 907 KB - Last synced at: 9 days ago - Pushed at: 2 months ago - Stars: 35 - Forks: 5

Ditectrev/Amazon-Web-Services-Certified-AWS-Certified-Machine-Learning-MLS-C01-Practice-Tests-Exams-Question

⛳️ PASS: Amazon Web Services Certified (AWS Certified) Machine Learning Specialty (MLS-C01) by learning based on our Questions & Answers (Q&A) Practice Tests Exams.

Size: 6.88 MB - Last synced at: 19 days ago - Pushed at: 7 months ago - Stars: 59 - Forks: 38

dacort/demo-code

Bits of code I use during live demos

Language: Jupyter Notebook - Size: 774 KB - Last synced at: 21 days ago - Pushed at: 4 months ago - Stars: 31 - Forks: 24

Ditectrev/Amazon-Web-Services-Certified-AWS-Certified-Data-Analytics-DAS-C01-Practice-Tests-Exams-Question

⛳️ PASS: Amazon Web Services Certified (AWS Certified) Data Analytics Specialty (DAS-C01) by learning based on our Questions & Answers (Q&A) Practice Tests Exams.

Size: 9.44 MB - Last synced at: 2 days ago - Pushed at: 5 months ago - Stars: 7 - Forks: 7

awslabs/amazon-emr-cli

A command-line interface for packaging, deploying, and running your EMR Serverless Spark jobs

Language: Python - Size: 150 KB - Last synced at: 9 days ago - Pushed at: 12 months ago - Stars: 41 - Forks: 14

dacort/modern-data-lake-storage-layers

Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work

Language: Jupyter Notebook - Size: 262 KB - Last synced at: 21 days ago - Pushed at: almost 3 years ago - Stars: 48 - Forks: 29

aws-samples/aws-dbs-refarch-datalake

Reference Architectures for Datalakes on AWS

Language: HTML - Size: 4.52 MB - Last synced at: 6 months ago - Pushed at: almost 5 years ago - Stars: 79 - Forks: 31

CMPT 732 Project: Our project revolves around a bike-sharing firm, and as analysts for that business, we will be using several big data tools to offer insights into various use cases, predicting their future profits and assisting them in expanding their business.

Language: Python - Size: 11.9 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

build-on-aws/ci-cd-serverless-spark

Sample CI/CD pipeline for using GitHub Actions with Amazon EMR Serverless Spark.

Language: Python - Size: 13.7 KB - Last synced at: 28 days ago - Pushed at: about 2 years ago - Stars: 5 - Forks: 1

Lostefra/SparkTemplate

A simple Java-Scala mixed project template for Apache Spark

Language: Scala - Size: 115 KB - Last synced at: about 1 year ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

robertgv/Data_Lake_in_AWS

Udacity Data Engineering Nanodegree Program

Language: Python - Size: 1.44 MB - Last synced at: about 1 year ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

polakowo/yelp-3nf

3NF-normalize Yelp data on S3 with Spark and load it into Redshift - automate the whole thing with Apache Airflow

Language: Jupyter Notebook - Size: 1.82 MB - Last synced at: 12 months ago - Pushed at: over 5 years ago - Stars: 12 - Forks: 3

esakik/data-engineering-essentials

Samples related to data engineering, e.g. spark, embulk, airflow, etc.

Language: Python - Size: 413 KB - Last synced at: 21 days ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 1

Sampsonyu/Data_Lake_with_Spark

Data Lake with Spark

Language: Jupyter Notebook - Size: 6.63 MB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 0

Mohammed-siddiq/Page-Rank-In-Spark

PageRank implementation in Spark to rank authors and venues based on their publications in the DBLP dataset.

Language: Scala - Size: 1.3 MB - Last synced at: over 1 year ago - Pushed at: about 6 years ago - Stars: 1 - Forks: 0

cmeb45/fuzzyjoin

Language: Java - Size: 2.81 MB - Last synced at: over 1 year ago - Pushed at: about 9 years ago - Stars: 0 - Forks: 0

jaceyca/Rankmaniac

Used Amazon's Elastic MapReduce to rank the top 20 nodes based on PageRank of graphs with over 100,000 nodes. See http://courses.cms.caltech.edu/cs144/homeworks/rankmaniac.pdf

Language: Python - Size: 16.9 MB - Last synced at: over 1 year ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 1

snowplow/dataflow-runner

Run templatable playbooks of Hadoop/Spark/et al jobs on Amazon EMR

Language: Go - Size: 29 MB - Last synced at: 7 days ago - Pushed at: about 1 year ago - Stars: 19 - Forks: 8

garystafford/emr-demo

Project files for the post: Running PySpark Applications on Amazon EMR: Methods for Interacting with PySpark on Amazon Elastic MapReduce.

Language: Python - Size: 691 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 37 - Forks: 17

garystafford/aws-airflow-demo

Project files for the post: Running PySpark Applications on Amazon EMR using Apache Airflow: Using the new Amazon Managed Workflows for Apache Airflow (MWAA) on AWS.

Language: Python - Size: 753 KB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 39 - Forks: 14

WorksApplications/ansible_aws_emr 📦

Unofficial Ansible module for Amazon EMR

Language: Python - Size: 24.4 KB - Last synced at: about 1 year ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 2

aws-samples/amazon-emr-yarn-capacity-scheduler

Configure Hadoop YARN CapacityScheduler on Amazon EMR on Amazon EC2 for multi-tenant heterogeneous workloads

Language: Shell - Size: 1.02 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

Faisal-AlDhuwayhi/Data-Lake

Building Data Lake and ETL pipelines using Amazon EMR, S3, and Apache Spark

Language: Python - Size: 2.93 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

DeepHiveMind/Amazon-EMR-on-Amazon-EKS-Spark-job-with-AWS-Step-Functions

Orchestrate an Amazon EMR on Amazon EKS Spark job with AWS Step Functions

Size: 654 KB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 2 - Forks: 0

garystafford/emr-superset-demo

Project files for the post: Installing Apache Superset on Amazon EMR: Add data exploration and visualization to your analytics cluster.

Language: Python - Size: 31.3 KB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 1

tmusabbir/emr-with-custom-metrics

Amazon EMR Automatic Scaling using Custom Metrics

Language: Shell - Size: 1.73 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

aws-samples/amazon-s3-access-points-for-cross-account-integration-samples

This repo provides cross-account integration code samples using Amazon S3 Access points

Language: Java - Size: 172 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 1

cameres/emr-spark-jupyter

:notebook: Repository/Tutorial for initializing Jupyter Notebook and Spark cluster on Amazon EMR

Language: Python - Size: 17.6 KB - Last synced at: almost 2 years ago - Pushed at: over 8 years ago - Stars: 4 - Forks: 1

DarthVi/knn-ncc-spark

An implementation in Scala of kNN and NCC based on Spark

Language: Scala - Size: 3.18 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

Related Keywords

amazon-emr (30), spark (12), aws (11), amazon-s3 (8), apache-spark (7), pyspark (4), data-lake (3), sql (3), scala (3), emr-cluster (3), amazon-athena (3), sbt (2), python (2), amazon-cloudwatch (2), aws-emr (2), data-engineering (2), aws-cloudformation (2), airflow (2), jupyter-notebook (2), emr (2), machine-learning (2), aws-lambda (2), aws-glue (2), mapreduce (2), aws-certified (2), apache-airflow (2), amazon-ec2 (2), map-reduce (1), string-matching (1), string-similarity (1), flink (1), pagerank-algorithm (1), dblp-dataset (1), nosql (1), spark-sql (1), python3 (1), elt (1), protocol-buffers (1), mrjob (1), fluentd (1), embulk (1), digdag (1), cloud-dataproc (1), cloud-dataflow (1), apache-hadoop (1), apache-beam (1), s3-bucket (1), apache-avro (1), yelp-dataset (1), ncc (1), knn (1), tutorial (1), spark-clusters (1), jupyter (1), cluster (1), aws-cross-account-s3-integration (1), amazon-s3-access-points (1), cloudwatch (1), bigdata (1), amazon-web-services (1), superset (1), apache-superset (1), aws-step-functions (1), amazon-eks (1), etl-pipeline (1), cloud-computing (1), big-data-processing (1), big-data (1), fifo-scheduler (1), fair-scheduler (1), capacity-scheduler (1), apache-hadoop-yarn (1), emr-management (1), ansible-modules (1), pyspark-applications (1), amazon-mwaa (1), emr-demo (1), elastic-map-reduce (1), hadoop (1), golang-application (1), delta-lake (1), apache-iceberg (1), apache-hudi (1), emr-serverless (1), practice-test (1), practice-exams (1), practice-exam (1), hdfs (1), das-c01 (1), aws-data-analytics (1), apache-kafka (1), amazon-rds (1), amazon-quicksight (1), amazon-aurora (1), live-demos (1), emr-notebooks (1), aws-cloudformation-templates (1), aws-athena (1), neural-network (1), mls-c01 (1)
