An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: aws-emr-clusters

OfficialYapper/Project-Credit-Risk-Analysis

German Credit Data - 1994

Language: Jupyter Notebook - Size: 2.42 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

terraform-aws-modules/terraform-aws-emr

Terraform module to create AWS EMR resources 🇺🇦

Language: HCL - Size: 98.6 KB - Last synced at: about 15 hours ago - Pushed at: 26 days ago - Stars: 26 - Forks: 24

JaewonSon37/Mining_Big_Data2

Topic: Exploring the Relationship Between Weather and Taxi Demand in Chicago

Language: Jupyter Notebook - Size: 181 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

RubensZimbres/Repo-2019

BERT, AWS RDS, AWS Forecast, EMR Spark Cluster, Hive, Serverless, Google Assistant + Raspberry Pi, Infrared, Google Cloud Platform Natural Language, Anomaly detection, Tensorflow, Mathematics

Language: Jupyter Notebook - Size: 57.8 MB - Last synced at: 3 months ago - Pushed at: almost 4 years ago - Stars: 138 - Forks: 73

matbragan/emr-airflow

Developing a Flow with EMR and Airflow

Language: Python - Size: 33.2 KB - Last synced at: 4 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

felipeazucares/Airflow-EMR-Redshift

EMR + Hadoop to Redshift ELT workflow using the Spark steps API and orchestrated by Apache Airflow. It ingests disparate datasets, centered on 7 GB of I94 arrivals information, to produce a simple star schema in Redshift.

Language: Python - Size: 5.33 MB - Last synced at: 27 days ago - Pushed at: over 4 years ago - Stars: 5 - Forks: 2

UCloudM/Steam_Analysis_For_Gamers

Analysis performed on data from the Steam platform using Apache Spark and Cloud services such as Amazon Web Services.

Language: Python - Size: 10.7 MB - Last synced at: 10 months ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 3

m1theus/aws-emr-terraform

Example for provisioning AWS EMR service with Terraform

Language: HCL - Size: 4.88 KB - Last synced at: about 1 year ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 1

polarbeargo/Udacity-nd027-Data-Lake

Language: Python - Size: 411 KB - Last synced at: about 1 year ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 1

Chan2k20/Wine-Prediction-Prediction-Model-On-AWS-EMR

Implements a random forest machine learning algorithm using PySpark on AWS EMR to classify wines. The model is then deployed in a Docker container.

Language: Python - Size: 120 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 1

AhmedDouaya/Deploiement_modele_cloud

Language: Jupyter Notebook - Size: 98.9 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

nihil21/DocxAnonymizer-spark Fork of Lostefra/DocxAnonymizer-spark

Stand-alone Scala & Java tool to anonymize OOXML Documents (DOCX)

Language: Java - Size: 3.42 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

SRVivek1/pyspark-rdd-dataframe-examples

PySpark RDD and DataFrame Examples

Language: Python - Size: 113 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

Adith-Rai/Reddit-Stock-Sentiment-Analyzer

A cloud-based Reddit stock sentiment analyzer that computes the overall sentiment for each stock from a configurable selection of stock subreddits. The architecture uses AWS MSK (Kafka), AWS EMR (PySpark), and AWS Lambda (Python 3) for scalability, and the OpenAI API for sentiment analysis through prompt engineering.

Language: Python - Size: 1.46 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Prajna-Bahuguna/EventBridge-SNS-Terraform

Language: HCL - Size: 5.86 KB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

sagardua297/udacity-data-engineering-nd

Data Pipeline Analytics Platform is an end-to-end generic Big Data pipeline. It uses the following tech stack: AWS S3, AWS Redshift, AWS EMR Cluster, Apache Spark, Apache Airflow.

Language: Python - Size: 1.81 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

im612/P8_big_data

A scalable prototype of an image recognition engine deployed on AWS.

Language: Jupyter Notebook - Size: 2.49 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

tugberkcapraz/capstone_sparkify

Predicting customer churn for the music app, Sparkify, using PySpark on AWS EMR clusters

Language: Jupyter Notebook - Size: 3.07 MB - Last synced at: 10 months ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

arjunsawhney1/scalable-ML Fork of rajeevdixit19/Scaleable-Ml

In this repo, I build a LogisticRegression prediction model with Dask and PySpark and initialize an AWS EMR cluster to run the entire pipeline.

Size: 131 KB - Last synced at: almost 2 years ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

SagarFall2022/BigData

Realtime data pipeline

Language: Jupyter Notebook - Size: 1.35 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

mochan42/Deploy-a-CNN-in-AWS-image-features-extraction-and-ACP

A CNN is deployed in AWS to extract image features in the context of distributed computing.

Language: Jupyter Notebook - Size: 3.34 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

khushal2405/Daily-Incremental-load-ETL-pipeline-for-Ecommerce-company-using-AWS-Lambda-and-Apache-airflow

Daily incremental-load ETL pipeline for an e-commerce company using AWS Lambda and an AWS EMR cluster, deployed using Apache Airflow in a Docker container.

Language: Python - Size: 19.5 KB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

dvu4/udacity-data-engineering

Data Engineering Projects including Data Modeling, Data Warehouse, Data Lake Development

Language: Jupyter Notebook - Size: 2.09 MB - Last synced at: over 2 years ago - Pushed at: about 5 years ago - Stars: 2 - Forks: 2

AWS-Big-Data-Projects/Run-a-Spark-job-within-Amazon-EMR

Run a Spark job within Amazon EMR

Language: Java - Size: 8.79 KB - Last synced at: 6 days ago - Pushed at: almost 5 years ago - Stars: 12 - Forks: 1

johnnyiller/cluster_funk

An opinionated framework for running big data jobs

Language: Python - Size: 83 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

silviomori/covid19-datalake

Language: Python - Size: 13.7 KB - Last synced at: over 2 years ago - Pushed at: about 5 years ago - Stars: 2 - Forks: 0

abhibalani/emr_lambda

Lambda to start EMR and run a map reduce job

Language: Python - Size: 2.93 KB - Last synced at: over 2 years ago - Pushed at: almost 6 years ago - Stars: 3 - Forks: 1

nikhilsu/Product-review-analysis-Spark-MongoDB

Performing various product review analysis on Amazon dataset using Apache Spark and MongoDB

Language: Java - Size: 56.6 KB - Last synced at: over 2 years ago - Pushed at: over 6 years ago - Stars: 2 - Forks: 1

kacperstyslo/most-wanted-programming-skills-finder

With this app, you can see what programming skills are most in-demand in the current job market.

Language: Python - Size: 97.7 KB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

anuragkr29/TightCommunityDetection

Detect tight communities in a social network

Language: Scala - Size: 11.7 KB - Last synced at: almost 2 years ago - Pushed at: almost 6 years ago - Stars: 2 - Forks: 2

AleGuarnieri/Data-Lake-ETL

Udacity project: implementing an ETL to process data with Apache Spark and store them in AWS S3 storage

Language: Python - Size: 4.88 KB - Last synced at: 4 months ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

xianchen2/Analyzing_10GB_of_Yelp_Reviews_Data

AWS EMR backed Spark cluster for analyzing Yelp Data

Language: Jupyter Notebook - Size: 956 KB - Last synced at: over 2 years ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 1

suvayu/emr-scripts

Shell scripts for AWS EMR clusters

Language: Shell - Size: 27.3 KB - Last synced at: 5 days ago - Pushed at: over 7 years ago - Stars: 7 - Forks: 2

geewynn/techcrunch_warehouse

Built a data model, data warehouse, and pipeline for extracting, transforming, and loading data into a star-schema-based data model in a Redshift database

Language: Python - Size: 181 KB - Last synced at: over 2 years ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

jjanczur/CloudComputing-assignment4

TU Berlin Cloud Computing - correctly implemented assignment4

Language: Java - Size: 5.73 MB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

deyb/airline-ontime-analytics

Analysis of Airline On Time Performance Dataset

Language: Java - Size: 7.63 MB - Last synced at: over 2 years ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 1