GitHub topics: emr-cluster | Ecosyste.ms: Repos

dacort/demo-code

Bits of code I use during live demos

Language: Jupyter Notebook - Size: 774 KB - Last synced at: 3 days ago - Pushed at: 9 months ago - Stars: 30 - Forks: 24

Data-Bishop/Team5-BuildItAll-Data-Platform

This repository contains the codebase for the BuildItAll Big Data Processing Platform, a case study project designed to manage large daily data for a hypothetical Belgian client.

Language: HCL - Size: 180 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 5 - Forks: 0

cloudposse/terraform-aws-emr-cluster

Terraform module to provision an Elastic MapReduce (EMR) cluster on AWS

Language: HCL - Size: 4.06 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 74 - Forks: 82

san089/goodreads_etl_pipeline

An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.

Language: Python - Size: 1.31 MB - Last synced at: 4 months ago - Pushed at: over 5 years ago - Stars: 1,378 - Forks: 227

HIRE FIT is a Big Data and Machine Learning-powered platform that automates resume screening and predicts candidate-job fit using Hadoop, Hive, Amazon S3, AWS SageMaker, and an XGBoost model trained on skill-based binary vectors. Built for efficient hiring at scale.

Language: Jupyter Notebook - Size: 93.9 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

RubensZimbres/Repo-2019

BERT, AWS RDS, AWS Forecast, EMR Spark Cluster, Hive, Serverless, Google Assistant + Raspberry Pi, Infrared, Google Cloud Platform Natural Language, Anomaly detection, Tensorflow, Mathematics

Language: Jupyter Notebook - Size: 57.8 MB - Last synced at: about 2 months ago - Pushed at: about 4 years ago - Stars: 139 - Forks: 73

Wittline/pyspark-on-aws-emr

The goal of this project is to offer an AWS EMR template using Spot Fleet and On-Demand Instances that you can deploy quickly, so you can focus on writing PySpark code.

Language: Python - Size: 3.61 MB - Last synced at: 5 months ago - Pushed at: about 3 years ago - Stars: 27 - Forks: 13

berksudan/Loan-Data-Report-with-AWS

Built a distributed system that completes several objectives on the given data to generate loan reports, using Amazon Web Services, Apache Spark, Java and Python.

Language: Java - Size: 3.67 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 1 - Forks: 1

airscholar/EMR-for-data-engineers

This project demonstrates the use of Amazon Elastic Map Reduce (EMR) for processing large datasets using Apache Spark. It includes a Spark script for ETL (Extract, Transform, Load) operations, AWS command line instructions for setting up and managing the EMR cluster, and a dataset for testing and demonstration purposes.

Language: Python - Size: 512 KB - Last synced at: 5 months ago - Pushed at: almost 2 years ago - Stars: 7 - Forks: 8

jfir/DataInsights

My Consulting Services

Language: HTML - Size: 2.1 MB - Last synced at: 5 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

desininja/Food-Delivery-RealTime-Data-Analysis

ETL Pipeline in AWS for Real Time Data Analysis

Language: Python - Size: 1.56 MB - Last synced at: 7 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

aws-samples/aws-dbs-refarch-datalake

Reference Architectures for Datalakes on AWS

Language: HTML - Size: 4.52 MB - Last synced at: 3 months ago - Pushed at: over 5 years ago - Stars: 79 - Forks: 31

kevinndungu-source/Amazon_EMR_Project_Resources

Explore and replicate Amazon EMR (Elastic MapReduce) setup and utilization for big data processing and analytics tasks, featuring comprehensive demonstrations from VPC creation to Spark job execution.

Language: Jupyter Notebook - Size: 561 KB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

tawounfouet/data-scientist-ocr-x-centralsupelec

Experience with time-series analysis and forecasting models, large data sets, model development and visualisation, statistics.

Language: Jupyter Notebook - Size: 156 MB - Last synced at: 6 months ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

longNguyen010203/Spark-Processing-AWS

👷🌇 Set up and build a big data processing pipeline with Apache Spark and 📦 AWS services (S3, EMR, EC2, IAM, VPC, Redshift), using Terraform to set up the infrastructure and an Airflow integration to automate workflows 🥊

Language: Python - Size: 1010 KB - Last synced at: 6 months ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 0

camposvinicius/aws-etl

This is an ETL application on AWS using general open sales and customer data, available here: https://github.com/camposvinicius/data/blob/main/AdventureWorks.zip. It is a zipped file containing several .csv files, to which we apply transformations.

Language: Smarty - Size: 168 KB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 17 - Forks: 3

rupeshtiwari/learning-apache-spark

Apache Spark

Language: Jupyter Notebook - Size: 41 KB - Last synced at: 9 days ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 1

matbragan/emr-airflow

Developing a Flow with EMR and Airflow

Language: Python - Size: 33.2 KB - Last synced at: 6 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

Siddhesh19991/Automate_EMR_ETL_pipeline_using_Airflow

This project provides a detailed overview of creating an automated data engineering pipeline using Airflow, AWS services, Spark, Snowflake and Tableau

Language: Python - Size: 14.6 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

sowrabh-m/Data_Processing_using_Spark_Flink

This project demonstrates data cleaning and processing with Apache Spark and Apache Flink, both locally and on AWS EMR.

Language: Python - Size: 1.46 MB - Last synced at: 6 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

xianwill/spark-boilerplate

A boilerplate for Spark projects, with Docker support for local development and scripts for EMR support.

Language: Scala - Size: 30.3 KB - Last synced at: 5 days ago - Pushed at: almost 8 years ago - Stars: 9 - Forks: 4

choang94/yelp-reviews

Loading Yelp Reviews Data from Kaggle to a Spark Cluster provisioned on AWS EMR and doing analyses

Language: Jupyter Notebook - Size: 1.85 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

cloudposse-archives/terraform-aws-spotinst-mrscaler

Terraform module to provision an Elastic MapReduce (EMR) cluster on AWS using a Spotinst AWS MrScaler resource

Size: 54.7 KB - Last synced at: 5 days ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

m1theus/aws-emr-terraform

Example for provisioning AWS EMR service with Terraform

Language: HCL - Size: 4.88 KB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 1

fermat01/ETL-Data-Pipeline-using-AWS-EMR-Spark-Glue-Athena

ETL data pipeline using AWS services

Language: Python - Size: 4.07 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 0

bbstilson/emr-cluster-manager

Half-baked implementation of a cluster manager for EMR.

Language: Scala - Size: 22.5 KB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

yennanliu/spark_emr_dev

Collection of code for submitting Spark/Hadoop/Hive/Pig tasks to EMR (AWS Elastic MapReduce) | #DE

Language: Scala - Size: 3.72 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 3 - Forks: 1

BiGHeaDMaX/Traitement-Big-Data-avec-Spark

This project aims to process large volumes of data with Spark in the cloud.

Language: Jupyter Notebook - Size: 2.64 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

adnanrahin/spark-rdd-df-comparison-emr

Language: Scala - Size: 22.5 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

nileshsingal/PUBG-DATA-ANALYSIS

Player Unknown's Battlegrounds (PUBG) is a first-person shooter game where the goal is to be the last player standing. You are placed on a giant circular map that shrinks as the game goes on, and you must find weapons, armor, and other supplies in order to kill other players or teams and survive.

Language: Python - Size: 128 KB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 1

maelfabien/Cassandra-GDELT-Queries

A Cassandra Architecture for GDELT Database 🌍

Language: Shell - Size: 52.5 MB - Last synced at: 6 months ago - Pushed at: over 6 years ago - Stars: 11 - Forks: 4

dhiraa/spark-tpcds

Apache Spark TPC-DS benchmark setup with EMR launch setup

Language: Smarty - Size: 1.3 MB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 10 - Forks: 4

Signiant/dynamodb-emr-exporter

Uses EMR clusters to export DynamoDB tables to S3 and generates import steps

Language: Shell - Size: 9.07 MB - Last synced at: 5 months ago - Pushed at: almost 3 years ago - Stars: 11 - Forks: 4

anthonywong611/Batch-ETL-with-AWS-EMR-and-MWAA

Create a data pipeline on AWS to execute batch processing in a Spark cluster provisioned by Amazon EMR. ETL using managed Airflow: extract data from S3, transform it using Spark, and load the transformed data back to S3.

Language: Python - Size: 30.6 MB - Last synced at: over 1 year ago - Pushed at: about 4 years ago - Stars: 8 - Forks: 4

JennaFar/elastic-data-factory

Elastic Data Factory

Language: Python - Size: 185 KB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

jashshah-dev/Automating-EMR-Cluster-using-AWS-Lambda

Automate Amazon EMR clusters using Lambda for streamlined and scalable data processing workflows. Unlock the full potential of your data pipeline with LambdaEMR Automator.

Language: Python - Size: 8.79 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

jashshah-dev/AWS-Big-Data-Pipeline-orchestrated-with-Airflow

A robust data pipeline leveraging Amazon EMR and PySpark, orchestrated seamlessly with Apache Airflow for efficient batch processing

Language: Python - Size: 16.6 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Tanay0510/Data-Lake-with-Spark

Load data from S3, process the data into analytics tables using Spark and load them back into S3. Deployed this Spark process on a cluster using AWS EMR

Language: Python - Size: 418 KB - Last synced at: over 1 year ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 0

amine-akrout/Udacity-DEND-Capstone-Project

Capstone Project for Udacity's Data Engineering Nanodegree: an end-to-end data pipeline to analyze the effect of COVID-19 on Airbnb

Language: Jupyter Notebook - Size: 639 KB - Last synced at: 6 months ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

alikemalocalan/alibaba-cloud-emr-create-examples

Alibaba Cloud EMR Create Example for Python

Language: Python - Size: 4.88 KB - Last synced at: over 1 year ago - Pushed at: about 6 years ago - Stars: 1 - Forks: 1

sjmiller8182/Warehousing-Stock-Tweet-Data

A large-scale data framework that will enable us to store and analyze financial market data and drive future predictions for investment.

Language: TSQL - Size: 8.43 MB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 7 - Forks: 3

EddieAmaitum/NYC-Yellow-Taxi-DataOps-with-AWS-Analyzing-TLC-Datasets

Performed business operations using big data technologies: AWS EMR, AWS RDS (MySQL), Hadoop, Apache Sqoop, Apache HBase, MapReduce

Language: Python - Size: 5.63 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

ucaiado/etl-intraday-bidask

Hosting data lake with bid-ask data in S3 using Spark and Airflow

Language: Python - Size: 692 KB - Last synced at: 6 months ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 2

kulwinderkk/Big_data_Wrangling_GoogleNgram_data_analysis

Loaded, filtered and visualized Google Ngrams dataset, which was created by Google's research team by analyzing all of the content in Google Books from the 1800s into the 2000s, in a cloud-based distributed computing environment using Hadoop, Spark, and the AWS S3 file system.

Language: Jupyter Notebook - Size: 480 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

immu0001/Udacity-Data-Engineer-nanodegree

Classwork projects and homework completed for the Udacity Data Engineering Nanodegree

Language: Jupyter Notebook - Size: 101 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 64 - Forks: 71

RonnJacob/PageRank-MapReduce-Spark

Implemented the PageRank algorithm in Hadoop MapReduce framework and Spark.

Language: Java - Size: 442 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 1

anjijava16/Cloud_AWS_ARRS

Cloud-AccountReceivableReportSystem

Size: 732 KB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 1 - Forks: 0

ramtekeabhas7/Hive_Case_Study_using_AWS_Hadoop

The goal is to extract the data and gather insights from a real-life data set of an e-commerce company, using big data tools like Hive, Hadoop, AWS, etc.

Size: 6.29 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

shantamgarg24/Recipe_Recommender_Asssignment_EDA_Using_PySpark

Used Amazon AWS and PySpark to solve this EDA assignment

Language: Jupyter Notebook - Size: 268 KB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

tejaskenjale/Wine-quality-prediction-aws

Implementation of the Random Forest algorithm using PySpark on AWS to classify wines, with deployment in a Docker container.

Language: Python - Size: 172 KB - Last synced at: 6 days ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

jpb111/AWS-EMR-APACHE-SPARK

Executing a python script on AWS EMR for big data analysis.

Language: Python - Size: 2.5 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

JohnnyLVP/Project-Standar-Documentation

This repository defines a standard structure for Machine Learning and Data Pipeline projects

Language: Python - Size: 57.6 KB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 2 - Forks: 0

sayaliwalke30/BigDataAnalysis-RecommenderForAmazon

Built a recommender system using the Apache Mahout machine learning library and carried out data analysis using Hadoop, Apache Hive and Pig on the Amazon Customer Reviews dataset (130M+ reviews)

Size: 5.44 MB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 0

amrelauoty/Sparkify-Datalake-AWS

Data Engineering Expert Nanodegree - Data Lake on AWS using Spark and S3

Language: Jupyter Notebook - Size: 309 KB - Last synced at: 6 months ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

saurabhsoni5893/Udacity-Data-Engineering-Projects

Projects related to Udacity Data Engineering Nanodegree including Data Modeling, Infrastructure setup on AWS cloud, Data Warehousing and Data Lake development on Amazon EMR and Redshift, developing Data Pipelines using Apache Airflow.

Language: Jupyter Notebook - Size: 3.74 MB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 2

bdoepf/aws-emr-prometheus

Language: HCL - Size: 38.1 KB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 3 - Forks: 1

jaquelinecella/jaquelinecella-Bootcamp_modulo1_Eng_Dados_Cloud

Creation of deployment pipelines with Git Actions to provision infrastructure on AWS with Terraform under version control. Technologies used: writing in Delta format, Lambda Function, Kinesis Streaming, S3, Athena, Glue and EMR.

Language: Jupyter Notebook - Size: 266 KB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

sanjaytom15/Hive-Case-Study

Extracts data from a real-life data set of an e-commerce company to gather insights into customer behaviour.

Size: 2.77 MB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

HarshadRanganathan/aws-emr-launcher

Generic Python library for provisioning EMR clusters from YAML config files (Configuration as Code)

Language: Python - Size: 128 KB - Last synced at: 9 days ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 0

tmusabbir/emr-with-custom-metrics

Amazon EMR Automatic Scaling using Custom Metrics

Language: Shell - Size: 1.73 MB - Last synced at: over 2 years ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 0

nogueira-ric/emr-6.4-spark-3.1.2

AWS EMR 6.4 - Spark 3.1.2 - Python 3.7.5

Language: Python - Size: 15.6 KB - Last synced at: 7 months ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

ajinChen/amazon-product-analysis

The goal of this repo is to analyze Amazon's digital products from different perspectives using AWS EMR.

Language: Jupyter Notebook - Size: 3.58 MB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 1

ZhipengHong0123/Amazon-Product-Analysis

Language: Jupyter Notebook - Size: 3.59 MB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

sepulworld/serverless-aws-emr-boilerplate

Event driven EMR via Serverless

Language: Python - Size: 25.4 KB - Last synced at: 5 months ago - Pushed at: almost 8 years ago - Stars: 2 - Forks: 2

rupeshtr78/aws-emr

Spark Job on Amazon EMR cluster

Size: 1.3 MB - Last synced at: 3 months ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 0

mikeacosta/florasense

Orchestrating Cloud ETL Workloads

Language: Python - Size: 7.31 MB - Last synced at: 7 months ago - Pushed at: almost 4 years ago - Stars: 2 - Forks: 1

mathias-mike/Crypto-vs-Economy

Data pipeline for analyzing the effects of economic indicators on cryptocurrencies

Language: Python - Size: 407 KB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 2

LFattorini/capstone-project-churn-prediction-udacity

In this project, we attempt to predict customer churn for a popular (fictional) music service. We perform data analysis and build machine learning models on a large amount of data using Spark.

Language: Jupyter Notebook - Size: 127 MB - Last synced at: over 2 years ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

manaswikamila05/Public-Clickstream-Data-Analysis

Used a public clickstream dataset of a cosmetics store to extract data and gather insights. Launched an EMR 5.29.0 cluster running Hive and used optimized Hive queries to identify customer behavior that can help improve the store's sales.

Size: 3.06 MB - Last synced at: over 2 years ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

sunnykan/sparkify-lake

Creates a data lake by moving data held in an AWS S3 bucket to another S3 bucket after transforming it into tables based on a star schema.

Language: Jupyter Notebook - Size: 416 KB - Last synced at: 8 months ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

BrightEmah123/emr-on-airflow-toolkit

A template for creating Amazon EMR clusters using either Amazon MWAA or a Dockerized Airflow Container as a workflow environment

Language: Python - Size: 1.69 MB - Last synced at: over 2 years ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

Morgan-Sell/usa-tourism-etl

Coalesced and transformed various data sources to create a comprehensive data lake for the USA tourism sector.

Language: Jupyter Notebook - Size: 4.41 MB - Last synced at: 7 days ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

UdeshikaDissa/BigData-MapReduce

This big data study aims to identify the most revenue-generating taxi zones in New York City for the year 2019. Three MapReduce algorithms were developed, and their performance was analyzed on input datasets of different sizes and on EMR clusters of different sizes.

Language: Java - Size: 1.32 MB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 2 - Forks: 0

donjude/data-lakes-with-spark

This project is about building a data lake and creating an ETL pipeline in Spark that loads data from Amazon S3, processes the data into analytics tables, and loads them back into S3

Language: Python - Size: 412 KB - Last synced at: over 2 years ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

rupeshtr78/blog

Big Data Spark Hadoop Kafka Flink Spark Streaming

Language: SCSS - Size: 10.1 MB - Last synced at: 3 months ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

jpsalado92/Udacity-DEND_DataLake-AWSEMR

Full code for Udacity's Data Engineering Nanodegree project: implementing a data lake in Amazon's cloud with AWS S3, AWS EMR and Spark.

Language: Python - Size: 5.2 MB - Last synced at: over 2 years ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 0

a-Imantha/Mahout-Tutorial

Building a Recommender with Apache Mahout on Amazon Elastic MapReduce (EMR) Tutorial

Language: Python - Size: 104 MB - Last synced at: 3 months ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

skyler-myers-db/Common-Crawl-Analysis

Parsing the Common Crawl database using Scala and Spark

Language: Scala - Size: 1.06 MB - Last synced at: 3 months ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

omarfessi/UDACITY-CapstoneProject

It's just my first repo, feel free to give feedback 😁

Language: Jupyter Notebook - Size: 47.9 KB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

humbletrader/spark-best-practices

List of best practices and fixes for issues encountered while developing spark applications and their

Size: 4.1 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

alex-ber/docker-hive Fork of ops-guru/docker-hive

EMR 5.25.0 cluster single node Hadoop docker image. With Amazon Linux, Hadoop 2.8.5 and Hive 2.3.5

Language: Shell - Size: 45.9 KB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 2 - Forks: 1

AndoKalrisian/ETL-AWS-EMR-Spark-sample-project

Language: Jupyter Notebook - Size: 430 KB - Last synced at: over 2 years ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 3

JevyanJ/emr-helper

The EMR Helper library tries to help when setting up and managing an EMR cluster.

Language: Python - Size: 22.5 KB - Last synced at: 22 days ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 0

sujeethshetty/aws-data-science

AWS Data Scientist Course Lab work

Size: 740 KB - Last synced at: 8 months ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 0

ucaiado/etl-spark-aws

Data Modeling with Spark for a data lake hosted on S3

Language: Python - Size: 23.4 KB - Last synced at: 6 months ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 1

rkr2017/emr-slack-notify

AWS Lambda function to send EMR events to Slack via SNS

Language: JavaScript - Size: 5.86 KB - Last synced at: 4 months ago - Pushed at: almost 8 years ago - Stars: 1 - Forks: 0

AmandaJunqueira/BigData

Sentiment Analysis using Common Crawl data

Language: Python - Size: 104 KB - Last synced at: about 1 month ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

nahidalam/Spark

Spark, Python, AWS EMR, MLlib, Spark Streaming, Spark SQL

Language: Jupyter Notebook - Size: 2.93 KB - Last synced at: over 2 years ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 1

carlossanchezvega/twitter Fork of Javier162380/twitter

This repository aims to capture and clean data from the Twitter API in order to perform sentiment analysis on an EMR cluster.

Language: Python - Size: 1.03 MB - Last synced at: 8 months ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0