GitHub topics: emr-cluster
san089/goodreads_etl_pipeline
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Language: Python - Size: 1.31 MB - Last synced at: 2 days ago - Pushed at: about 5 years ago - Stars: 1,365 - Forks: 224

cloudposse/terraform-aws-emr-cluster
Terraform module to provision an Elastic MapReduce (EMR) cluster on AWS
Language: HCL - Size: 4.06 MB - Last synced at: 17 days ago - Pushed at: 3 months ago - Stars: 74 - Forks: 82

Wittline/pyspark-on-aws-emr
The goal of this project is to offer an AWS EMR template, using Spot Fleet and On-Demand Instances, that you can deploy quickly so you can focus on writing PySpark code.
Language: Python - Size: 3.61 MB - Last synced at: 8 days ago - Pushed at: almost 3 years ago - Stars: 27 - Forks: 13

RubensZimbres/Repo-2019
BERT, AWS RDS, AWS Forecast, EMR Spark Cluster, Hive, Serverless, Google Assistant + Raspberry Pi, Infrared, Google Cloud Platform Natural Language, Anomaly detection, Tensorflow, Mathematics
Language: Jupyter Notebook - Size: 57.8 MB - Last synced at: 14 days ago - Pushed at: over 3 years ago - Stars: 138 - Forks: 73

dacort/demo-code
Bits of code I use during live demos
Language: Jupyter Notebook - Size: 774 KB - Last synced at: 14 days ago - Pushed at: 4 months ago - Stars: 31 - Forks: 24

berksudan/Loan-Data-Report-with-AWS
Built a distributed system that processes the given data to generate loan reports using Amazon Web Services, Apache Spark, Java and Python.
Language: Java - Size: 3.67 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 1

airscholar/EMR-for-data-engineers
This project demonstrates the use of Amazon Elastic MapReduce (EMR) for processing large datasets using Apache Spark. It includes a Spark script for ETL (Extract, Transform, Load) operations, AWS CLI instructions for setting up and managing the EMR cluster, and a dataset for testing and demonstration purposes.
Language: Python - Size: 512 KB - Last synced at: 11 days ago - Pushed at: over 1 year ago - Stars: 7 - Forks: 8

jfir/DataInsights
My Consulting Services
Language: HTML - Size: 2.1 MB - Last synced at: 24 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

desininja/Food-Delivery-RealTime-Data-Analysis
ETL Pipeline in AWS for Real Time Data Analysis
Language: Python - Size: 1.56 MB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

aws-samples/aws-dbs-refarch-datalake
Reference Architectures for Datalakes on AWS
Language: HTML - Size: 4.52 MB - Last synced at: 5 months ago - Pushed at: almost 5 years ago - Stars: 79 - Forks: 31

kevinndungu-source/Amazon_EMR_Project_Resources
Explore and replicate Amazon EMR (Elastic MapReduce) setup and utilization for big data processing and analytics tasks, featuring comprehensive demonstrations from VPC creation to Spark job execution.
Language: Jupyter Notebook - Size: 561 KB - Last synced at: about 2 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

tawounfouet/data-scientist-ocr-x-centralsupelec
Experience with time-series analysis and forecasting models, large data sets, model development and visualisation, statistics.
Language: Jupyter Notebook - Size: 156 MB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

longNguyen010203/Spark-Processing-AWS
👷🌇 Set up and build a big data processing pipeline with Apache Spark and 📦 AWS services (S3, EMR, EC2, IAM, VPC, Redshift), using Terraform to set up the infrastructure and Airflow integration to automate workflows 🥊
Language: Python - Size: 1010 KB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 2 - Forks: 0

camposvinicius/aws-etl
This is an ETL application on AWS built on publicly available sales and customer data, which you can find here: https://github.com/camposvinicius/data/blob/main/AdventureWorks.zip. It is a zipped file containing several CSV files to which we apply transformations.
Language: Smarty - Size: 168 KB - Last synced at: 5 months ago - Pushed at: about 3 years ago - Stars: 17 - Forks: 3

rupeshtiwari/learning-apache-spark
Apache Spark
Language: Jupyter Notebook - Size: 41 KB - Last synced at: 6 days ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 1

matbragan/emr-airflow
Developing a workflow with EMR and Airflow
Language: Python - Size: 33.2 KB - Last synced at: about 2 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

Siddhesh19991/Automate_EMR_ETL_pipeline_using_Airflow
This project provides a detailed overview of creating an automated data engineering pipeline using Airflow, AWS services, Spark, Snowflake and Tableau
Language: Python - Size: 14.6 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

sowrabh-m/Data_Processing_using_Spark_Flink
This project demonstrates data cleaning and processing with Apache Spark and Apache Flink, both locally and on AWS EMR.
Language: Python - Size: 1.46 MB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

xianwill/spark-boilerplate
A boilerplate for Spark projects, with Docker support for local development and scripts for running on EMR.
Language: Scala - Size: 30.3 KB - Last synced at: 9 days ago - Pushed at: over 7 years ago - Stars: 9 - Forks: 4

choang94/yelp-reviews
Loading Yelp review data from Kaggle into a Spark cluster provisioned on AWS EMR and analyzing it
Language: Jupyter Notebook - Size: 1.85 MB - Last synced at: 11 months ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

cloudposse-archives/terraform-aws-spotinst-mrscaler
Terraform module to provision an Elastic MapReduce (EMR) cluster on AWS using a Spotinst AWS MrScaler resource
Size: 54.7 KB - Last synced at: 3 days ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

m1theus/aws-emr-terraform
Example for provisioning AWS EMR service with Terraform
Language: HCL - Size: 4.88 KB - Last synced at: 12 months ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 1

fermat01/ETL-Data-Pipeline-using-AWS-EMR-Spark-Glue-Athena
ETL data pipeline using AWS services
Language: Python - Size: 4.07 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 3 - Forks: 0

bbstilson/emr-cluster-manager
Half-baked implementation of a cluster manager for EMR.
Language: Scala - Size: 22.5 KB - Last synced at: about 1 year ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

yennanliu/spark_emr_dev
Collection of code for submitting Spark/Hadoop/Hive/Pig tasks to EMR (AWS Elastic MapReduce) | #DE
Language: Scala - Size: 3.72 MB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 3 - Forks: 1

BiGHeaDMaX/Traitement-Big-Data-avec-Spark
The goal of this project is to process large-scale data with Spark in the cloud.
Language: Jupyter Notebook - Size: 2.64 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

adnanrahin/spark-rdd-df-comparison-emr
Language: Scala - Size: 22.5 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

nileshsingal/PUBG-DATA-ANALYSIS
PlayerUnknown's Battlegrounds (PUBG) is a first-person shooter game where the goal is to be the last player standing. You are placed on a giant circular map that shrinks as the game goes on, and you must find weapons, armor, and other supplies to kill other players or teams and survive.
Language: Python - Size: 128 KB - Last synced at: about 1 year ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 1

maelfabien/Cassandra-GDELT-Queries
A Cassandra Architecture for GDELT Database 🌍
Language: Shell - Size: 52.5 MB - Last synced at: 27 days ago - Pushed at: about 6 years ago - Stars: 11 - Forks: 4

dhiraa/spark-tpcds
Apache Spark TPC-DS benchmark setup with EMR launch setup
Language: Smarty - Size: 1.3 MB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 10 - Forks: 4

Signiant/dynamodb-emr-exporter
Uses EMR clusters to export DynamoDB tables to S3 and generates import steps
Language: Shell - Size: 9.07 MB - Last synced at: about 24 hours ago - Pushed at: over 2 years ago - Stars: 11 - Forks: 4

anthonywong611/Batch-ETL-with-AWS-EMR-and-MWAA
Create a data pipeline on AWS to execute batch processing in a Spark cluster provisioned by Amazon EMR. ETL using managed Airflow (MWAA): extracts data from S3, transforms it using Spark, and loads the transformed data back into S3.
Language: Python - Size: 30.6 MB - Last synced at: about 1 year ago - Pushed at: almost 4 years ago - Stars: 8 - Forks: 4

JennaFar/elastic-data-factory
Elastic Data Factory
Language: Python - Size: 185 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

jashshah-dev/Automating-EMR-Cluster-using-AWS-Lambda
Automate Amazon EMR clusters using Lambda for streamlined and scalable data processing workflows. Unlock the full potential of your data pipeline with LambdaEMR Automator.
Language: Python - Size: 8.79 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

jashshah-dev/AWS-Big-Data-Pipeline-orchestrated-with-Airflow
A robust data pipeline leveraging Amazon EMR and PySpark, orchestrated seamlessly with Apache Airflow for efficient batch processing
Language: Python - Size: 16.6 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Tanay0510/Data-Lake-with-Spark
Load data from S3, process it into analytics tables using Spark, and load them back into S3. This Spark process is deployed on an AWS EMR cluster.
Language: Python - Size: 418 KB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

amine-akrout/Udacity-DEND-Capstone-Project
Capstone Project for Udacity's Data Engineering Nanodegree: an end-to-end data pipeline to analyze the effect of COVID-19 on Airbnb
Language: Jupyter Notebook - Size: 639 KB - Last synced at: 28 days ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

alikemalocalan/alibaba-cloud-emr-create-examples
Alibaba Cloud EMR Create Example for Python
Language: Python - Size: 4.88 KB - Last synced at: 12 months ago - Pushed at: almost 6 years ago - Stars: 1 - Forks: 1

sjmiller8182/Warehousing-Stock-Tweet-Data
A large-scale data framework that will enable us to store and analyze financial market data and drive future predictions for investment.
Language: TSQL - Size: 8.43 MB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 7 - Forks: 3

EddieAmaitum/NYC-Yellow-Taxi-DataOps-with-AWS-Analyzing-TLC-Datasets
Performed business operations using big data technologies: AWS EMR, AWS RDS (MySQL), Hadoop, Apache Sqoop, Apache HBase, MapReduce
Language: Python - Size: 5.63 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

ucaiado/etl-intraday-bidask
Hosting data lake with bid-ask data in S3 using Spark and Airflow
Language: Python - Size: 692 KB - Last synced at: about 2 months ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 2

kulwinderkk/Big_data_Wrangling_GoogleNgram_data_analysis
Loaded, filtered and visualized Google Ngrams dataset, which was created by Google's research team by analyzing all of the content in Google Books from the 1800s into the 2000s, in a cloud-based distributed computing environment using Hadoop, Spark, and the AWS S3 file system.
Language: Jupyter Notebook - Size: 480 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

immu0001/Udacity-Data-Engineer-nanodegree
Classwork projects and homework completed in the Udacity Data Engineering Nanodegree
Language: Jupyter Notebook - Size: 101 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 64 - Forks: 71

RonnJacob/PageRank-MapReduce-Spark
Implemented the PageRank algorithm in Hadoop MapReduce framework and Spark.
Language: Java - Size: 442 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 1

anjijava16/Cloud_AWS_ARRS
Cloud-AccountReceivableReportSystem
Size: 732 KB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 0

ramtekeabhas7/Hive_Case_Study_using_AWS_Hadoop
The goal is to extract data and gather insights from a real-life dataset of an e-commerce company, using big data tools such as Hive, Hadoop, and AWS.
Size: 6.29 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

shantamgarg24/Recipe_Recommender_Asssignment_EDA_Using_PySpark
Used Amazon AWS and PySpark to solve this EDA assignment
Language: Jupyter Notebook - Size: 268 KB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

tejaskenjale/Wine-quality-prediction-aws
Implementation of the Random Forest algorithm using PySpark on AWS to classify wines, deployed in a Docker container.
Language: Python - Size: 172 KB - Last synced at: 3 days ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

jpb111/AWS-EMR-APACHE-SPARK
Executing a python script on AWS EMR for big data analysis.
Language: Python - Size: 2.5 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

JohnnyLVP/Project-Standar-Documentation
This repository defines a standard structure for Machine Learning and Data Pipeline projects
Language: Python - Size: 57.6 KB - Last synced at: almost 2 years ago - Pushed at: almost 5 years ago - Stars: 2 - Forks: 0

sayaliwalke30/BigDataAnalysis-RecommenderForAmazon
Built a recommender system using the Apache Mahout machine learning library and carried out data analysis using Hadoop, Apache Hive and Pig on the Amazon Customer Reviews dataset (130M+ reviews)
Size: 5.44 MB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 0

amrelauoty/Sparkify-Datalake-AWS
Data Engineering Expert Nanodegree - Data Lake on AWS using Spark and S3
Language: Jupyter Notebook - Size: 309 KB - Last synced at: 28 days ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

saurabhsoni5893/Udacity-Data-Engineering-Projects
Projects related to Udacity Data Engineering Nanodegree including Data Modeling, Infrastructure setup on AWS cloud, Data Warehousing and Data Lake development on Amazon EMR and Redshift, developing Data Pipelines using Apache Airflow.
Language: Jupyter Notebook - Size: 3.74 MB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 2

bdoepf/aws-emr-prometheus
Language: HCL - Size: 38.1 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 3 - Forks: 1

jaquelinecella/jaquelinecella-Bootcamp_modulo1_Eng_Dados_Cloud
Building deployment pipelines with GitHub Actions to provision AWS infrastructure with Terraform under version control. Technologies used: writing in Delta format, Lambda Functions, Kinesis Streaming, S3, Athena, Glue and EMR.
Language: Jupyter Notebook - Size: 266 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

sanjaytom15/Hive-Case-Study
To extract data from a real-life dataset of an e-commerce company and gain insights into customer behaviour.
Size: 2.77 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

HarshadRanganathan/aws-emr-launcher
Generic Python library for provisioning EMR clusters from YAML config files (Configuration as Code)
Language: Python - Size: 128 KB - Last synced at: 1 day ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 0

tmusabbir/emr-with-custom-metrics
Amazon EMR Automatic Scaling using Custom Metrics
Language: Shell - Size: 1.73 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

nogueira-ric/emr-6.4-spark-3.1.2
AWS EMR 6.4 - Spark 3.1.2 - Python3.7.5
Language: Python - Size: 15.6 KB - Last synced at: 3 months ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

ajinChen/amazon-product-analysis
The goal of this repo is to analyze Amazon's digital products from different perspectives using AWS EMR.
Language: Jupyter Notebook - Size: 3.58 MB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 1

ZhipengHong0123/Amazon-Product-Analysis
Language: Jupyter Notebook - Size: 3.59 MB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

sepulworld/serverless-aws-emr-boilerplate
Event driven EMR via Serverless
Language: Python - Size: 25.4 KB - Last synced at: 25 days ago - Pushed at: over 7 years ago - Stars: 2 - Forks: 2

rupeshtr78/aws-emr
Spark Job on Amazon EMR cluster
Size: 1.3 MB - Last synced at: about 2 months ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 0

mikeacosta/florasense
Orchestrating Cloud ETL Workloads
Language: Python - Size: 7.31 MB - Last synced at: about 2 months ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 1

mathias-mike/Crypto-vs-Economy
Data pipeline for analyzing the effects of economic indicators on cryptocurrencies
Language: Python - Size: 407 KB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 2

LFattorini/capstone-project-churn-prediction-udacity
In this project, we attempt to predict customer churn of a popular (not real) music service. We perform data analysis and machine learning model building on a large amount of data using Spark.
Language: Jupyter Notebook - Size: 127 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

manaswikamila05/Public-Clickstream-Data-Analysis
Used a public clickstream dataset of a cosmetics store to extract data and gather insights. Launched an EMR 5.29.0 cluster running Hive and used optimized Hive queries to identify customer behavior that could help the store improve its sales.
Size: 3.06 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

sunnykan/sparkify-lake
Creates a data lake by moving data held in an AWS S3 bucket to another S3 bucket after transforming it into tables based on a star schema.
Language: Jupyter Notebook - Size: 416 KB - Last synced at: 4 months ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

BrightEmah123/emr-on-airflow-toolkit
A template for creating Amazon EMR clusters using either Amazon MWAA or a Dockerized Airflow Container as a workflow environment
Language: Python - Size: 1.69 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

Morgan-Sell/usa-tourism-etl
Coalesced and transformed various data sources to create a comprehensive data lake for the USA tourism sector.
Language: Jupyter Notebook - Size: 4.41 MB - Last synced at: 3 days ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 0

UdeshikaDissa/BigData-MapReduce
This big data study aims to identify the highest-revenue taxi zones in New York City for the year 2019. Three MapReduce algorithms were developed, and their performance was analyzed on input datasets of different sizes and on EMR clusters of different sizes.
Language: Java - Size: 1.32 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

donjude/data-lakes-with-spark
This project is about building a data lake and creating an ETL pipeline in Spark that loads data from Amazon S3, processes the data into analytics tables, and loads them back into S3
Language: Python - Size: 412 KB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

rupeshtr78/blog
Big Data Spark Hadoop Kafka Flink Spark Streaming
Language: SCSS - Size: 10.1 MB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

jpsalado92/Udacity-DEND_DataLake-AWSEMR
Full code for Udacity's Data Engineer Nanodegree project. Implementing a data lake in Amazon's cloud with AWS S3, AWS EMR and Spark.
Language: Python - Size: 5.2 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

samye760/Common-Crawl-Analysis
Parsing the Common Crawl database using Scala and Spark
Language: Scala - Size: 1.06 MB - Last synced at: almost 2 years ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

omarfessi/UDACITY-CapstoneProject
It's just my first repo, feel free to give feedback 😁
Language: Jupyter Notebook - Size: 47.9 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

humbletrader/spark-best-practices
List of best practices and fixes for issues encountered while developing spark applications and their
Size: 4.1 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

alex-ber/docker-hive Fork of ops-guru/docker-hive
Single-node Hadoop Docker image replicating an EMR 5.25.0 cluster, with Amazon Linux, Hadoop 2.8.5 and Hive 2.3.5
Language: Shell - Size: 45.9 KB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 2 - Forks: 1

AndoKalrisian/ETL-AWS-EMR-Spark-sample-project
Language: Jupyter Notebook - Size: 430 KB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 3

JevyanJ/emr-helper
The EMR Helper library tries to help when setting up and managing an EMR cluster.
Language: Python - Size: 22.5 KB - Last synced at: 9 days ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

sujeethshetty/aws-data-science
AWS Data Scientist Course Lab work
Size: 740 KB - Last synced at: 4 months ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

ucaiado/etl-spark-aws
Data Modeling with Spark for a data lake hosted on S3
Language: Python - Size: 23.4 KB - Last synced at: about 2 months ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 1

rkr2017/emr-slack-notify
AWS Lambda function to send EMR events to Slack via SNS
Language: JavaScript - Size: 5.86 KB - Last synced at: 5 months ago - Pushed at: over 7 years ago - Stars: 1 - Forks: 0

nahidalam/Spark
Spark, Python, AWS EMR, MLLib, Spark Streaming, Spark - SQL
Language: Jupyter Notebook - Size: 2.93 KB - Last synced at: almost 2 years ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 1

carlossanchezvega/twitter Fork of Javier162380/twitter
This repository aims to capture and clean data from the twitter API in order to perform a sentiment analysis on an EMR cluster.
Language: Python - Size: 1.03 MB - Last synced at: 3 months ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

darkhipo/emr-example
Running Zeppelin on EMR and launching tasks on it with Task Runner.
Language: Python - Size: 8.79 KB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

mwilchek/Hadoop-Testing
Repo for experimenting with an AWS Elastic MapReduce (EMR) cluster
Language: PigLatin - Size: 14.6 KB - Last synced at: about 2 years ago - Pushed at: about 6 years ago - Stars: 1 - Forks: 0

bhavaniprasad73/PigLatinScript
Language: PigLatin - Size: 434 KB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 1

danielhaviv/emr_storage_autoscaler
Language: Shell - Size: 5.86 KB - Last synced at: 9 days ago - Pushed at: about 7 years ago - Stars: 1 - Forks: 0

siwest/usaspend
Annual Revenue Vs. Executive Pay for Recipients of U.S. Federal Funds; uses Scala Spark in Zeppelin notebook.
Size: 493 MB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0
