GitHub topics: aws-emr-clusters
OfficialYapper/Project-Credit-Risk-Analysis
German Credit Data - 1994
Language: Jupyter Notebook - Size: 2.42 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

terraform-aws-modules/terraform-aws-emr
Terraform module to create AWS EMR resources 🇺🇦
Language: HCL - Size: 98.6 KB - Last synced at: about 15 hours ago - Pushed at: 26 days ago - Stars: 26 - Forks: 24

JaewonSon37/Mining_Big_Data2
Topic: Exploring the Relationship Between Weather and Taxi Demand in Chicago
Language: Jupyter Notebook - Size: 181 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

RubensZimbres/Repo-2019
BERT, AWS RDS, AWS Forecast, EMR Spark Cluster, Hive, Serverless, Google Assistant + Raspberry Pi, Infrared, Google Cloud Platform Natural Language, Anomaly detection, Tensorflow, Mathematics
Language: Jupyter Notebook - Size: 57.8 MB - Last synced at: 3 months ago - Pushed at: almost 4 years ago - Stars: 138 - Forks: 73

matbragan/emr-airflow
Developing a Flow with EMR and Airflow
Language: Python - Size: 33.2 KB - Last synced at: 4 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

felipeazucares/Airflow-EMR-Redshift
EMR + Hadoop to Redshift ELT workflow using the Spark steps API, orchestrated by Apache Airflow. It ingests disparate datasets centered on 7 GB of I94 arrivals information to produce a simple star schema in Redshift.
Language: Python - Size: 5.33 MB - Last synced at: 27 days ago - Pushed at: over 4 years ago - Stars: 5 - Forks: 2

UCloudM/Steam_Analysis_For_Gamers
Analysis performed on data from the Steam platform using Apache Spark and Cloud services such as Amazon Web Services.
Language: Python - Size: 10.7 MB - Last synced at: 10 months ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 3

m1theus/aws-emr-terraform
Example for provisioning AWS EMR service with Terraform
Language: HCL - Size: 4.88 KB - Last synced at: about 1 year ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 1

polarbeargo/Udacity-nd027-Data-Lake
Language: Python - Size: 411 KB - Last synced at: about 1 year ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 1

Chan2k20/Wine-Prediction-Prediction-Model-On-AWS-EMR
Implemented a random forest machine learning algorithm using PySpark on AWS EMR to classify wines. The model is then deployed in a Docker container.
Language: Python - Size: 120 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 1

AhmedDouaya/Deploiement_modele_cloud
Language: Jupyter Notebook - Size: 98.9 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

nihil21/DocxAnonymizer-spark Fork of Lostefra/DocxAnonymizer-spark
Stand-alone Scala & Java tool to anonymize OOXML Documents (DOCX)
Language: Java - Size: 3.42 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

SRVivek1/pyspark-rdd-dataframe-examples
PySpark RDD and DataFrame Examples
Language: Python - Size: 113 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

Adith-Rai/Reddit-Stock-Sentiment-Analyzer
A cloud-based Reddit stock sentiment analyzer that computes the overall sentiment for each stock from a configurable selection of stock subreddits. The architecture uses AWS MSK (Kafka), AWS EMR (PySpark), and AWS Lambda (Python 3) for scalability, and the OpenAI API for sentiment analysis through prompt engineering.
Language: Python - Size: 1.46 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Prajna-Bahuguna/EventBridge-SNS-Terraform
Language: HCL - Size: 5.86 KB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

sagardua297/udacity-data-engineering-nd
Data Pipeline Analytics Platform is an end-to-end generic Big Data pipeline. It involves the following tech stack: AWS S3, AWS Redshift, AWS EMR Cluster, Apache Spark, Apache Airflow.
Language: Python - Size: 1.81 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

im612/P8_big_data
A scalable prototype of an image recognition engine deployed on AWS.
Language: Jupyter Notebook - Size: 2.49 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

tugberkcapraz/capstone_sparkify
Predicting customer churn for the music app, Sparkify, using PySpark on AWS EMR clusters
Language: Jupyter Notebook - Size: 3.07 MB - Last synced at: 10 months ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

arjunsawhney1/scalable-ML Fork of rajeevdixit19/Scaleable-Ml
In this repo, I build a LogisticRegression prediction model with Dask and PySpark and initialize an AWS EMR cluster to run the entire pipeline.
Size: 131 KB - Last synced at: almost 2 years ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

SagarFall2022/BigData
Realtime data pipeline
Language: Jupyter Notebook - Size: 1.35 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

mochan42/Deploy-a-CNN-in-AWS-image-features-extraction-and-ACP
A CNN is deployed in AWS to extract image features in the context of distributed computing.
Language: Jupyter Notebook - Size: 3.34 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

khushal2405/Daily-Incremental-load-ETL-pipeline-for-Ecommerce-company-using-AWS-Lambda-and-Apache-airflow
Daily incremental-load ETL pipeline for an e-commerce company using AWS Lambda and an AWS EMR cluster, deployed using Apache Airflow in a Docker container.
Language: Python - Size: 19.5 KB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

dvu4/udacity-data-engineering
Data Engineering Projects including Data Modeling, Data Warehouse, Data Lake Development
Language: Jupyter Notebook - Size: 2.09 MB - Last synced at: over 2 years ago - Pushed at: about 5 years ago - Stars: 2 - Forks: 2

AWS-Big-Data-Projects/Run-a-Spark-job-within-Amazon-EMR
Run a Spark job within Amazon EMR
Language: Java - Size: 8.79 KB - Last synced at: 6 days ago - Pushed at: almost 5 years ago - Stars: 12 - Forks: 1

johnnyiller/cluster_funk
An opinionated framework for running big data jobs
Language: Python - Size: 83 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

silviomori/covid19-datalake
Language: Python - Size: 13.7 KB - Last synced at: over 2 years ago - Pushed at: about 5 years ago - Stars: 2 - Forks: 0

abhibalani/emr_lambda
Lambda to start EMR and run a map reduce job
Language: Python - Size: 2.93 KB - Last synced at: over 2 years ago - Pushed at: almost 6 years ago - Stars: 3 - Forks: 1

nikhilsu/Product-review-analysis-Spark-MongoDB
Performing various product review analysis on Amazon dataset using Apache Spark and MongoDB
Language: Java - Size: 56.6 KB - Last synced at: over 2 years ago - Pushed at: over 6 years ago - Stars: 2 - Forks: 1

kacperstyslo/most-wanted-programming-skills-finder
With this app, you can see what programming skills are most in-demand in the current job market.
Language: Python - Size: 97.7 KB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

anuragkr29/TightCommunityDetection
Detect Tight Communities in a social Network
Language: Scala - Size: 11.7 KB - Last synced at: almost 2 years ago - Pushed at: almost 6 years ago - Stars: 2 - Forks: 2

AleGuarnieri/Data-Lake-ETL
Udacity project: implementing an ETL to process data with Apache Spark and store them in AWS S3 storage
Language: Python - Size: 4.88 KB - Last synced at: 4 months ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

xianchen2/Analyzing_10GB_of_Yelp_Reviews_Data
AWS EMR-backed Spark cluster for analyzing Yelp data
Language: Jupyter Notebook - Size: 956 KB - Last synced at: over 2 years ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 1

suvayu/emr-scripts
Shell scripts for AWS EMR clusters
Language: Shell - Size: 27.3 KB - Last synced at: 5 days ago - Pushed at: over 7 years ago - Stars: 7 - Forks: 2

geewynn/techcrunch_warehouse
Built a data model, data warehouse, and pipeline for extracting, transforming, and loading data into a star-schema-based data model in a Redshift database
Language: Python - Size: 181 KB - Last synced at: over 2 years ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

jjanczur/CloudComputing-assignment4
TU Berlin Cloud Computing - correctly implemented assignment4
Language: Java - Size: 5.73 MB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

deyb/airline-ontime-analytics
Analysis of Airline On Time Performance Dataset
Language: Java - Size: 7.63 MB - Last synced at: over 2 years ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 1
