Topic: "aws-emr"
adornes/spark_python_ml_examples
Spark 2.0 Python Machine Learning examples
Language: Python - Size: 13.7 KB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 93 - Forks: 42

adornes/spark_scala_ml_examples
Spark 2.0 Scala Machine Learning examples
Language: Scala - Size: 1.32 MB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 77 - Forks: 52

jwplayer/sparksteps
:star: CLI tool to launch Spark jobs on AWS EMR
Language: Python - Size: 216 KB - Last synced at: 4 days ago - Pushed at: over 1 year ago - Stars: 67 - Forks: 12

dacort/demo-code
Bits of code I use during live demos
Language: Jupyter Notebook - Size: 774 KB - Last synced at: 20 days ago - Pushed at: 4 months ago - Stars: 31 - Forks: 24

Wittline/pyspark-on-aws-emr
The goal of this project is to offer an AWS EMR template using Spot Fleet and On-Demand Instances that you can use quickly. Just focus on writing pyspark code.
Language: Python - Size: 3.61 MB - Last synced at: 14 days ago - Pushed at: almost 3 years ago - Stars: 27 - Forks: 13

abdullahkhawer/aws-auto-terminate-idle-emr
An AWS based solution using AWS CloudWatch and AWS Lambda based on Python to automatically terminate AWS EMR clusters that have been idle for a specified period of time.
Language: Python - Size: 22.5 KB - Last synced at: 25 days ago - Pushed at: 11 months ago - Stars: 26 - Forks: 16

terraform-aws-modules/terraform-aws-emr
Terraform module to create AWS EMR resources 🇺🇦
Language: HCL - Size: 94.7 KB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 24 - Forks: 23

amzn/rheoceros
Cloud-based AI / ML workflow and data application development framework
Language: Python - Size: 2.49 MB - Last synced at: 21 days ago - Pushed at: 8 months ago - Stars: 17 - Forks: 9

ismaildawoodjee/aws-data-pipeline
A batch processing data pipeline, using AWS resources (S3, EMR, Redshift, EC2, IAM), provisioned via Terraform, and orchestrated from locally hosted Airflow containers. The end product is a Superset dashboard and a Postgres database, hosted on an EC2 instance at this address (powered down):
Language: Python - Size: 4.77 MB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 17 - Forks: 6

memosstilvi/emr-cost-calculator
EMR Cost Calculator
Language: Python - Size: 13.7 KB - Last synced at: 3 months ago - Pushed at: about 7 years ago - Stars: 17 - Forks: 27

xonai-computing/xonai-dashboard
A Grafana-based application to assist Big Data infrastructure optimization initiatives where Spark applications are a dominant cost driver
Language: Python - Size: 6.28 MB - Last synced at: 17 days ago - Pushed at: 11 months ago - Stars: 14 - Forks: 1

AWS-Big-Data-Projects/Analysing-Census-Data-using-aws
Uses AWS EMR and AWS Redshift to analyse the adult census dataset of the USA
Size: 638 KB - Last synced at: 1 day ago - Pushed at: over 4 years ago - Stars: 13 - Forks: 0

AWS-Big-Data-Projects/AWS-EMR
Analyzing Big Data with Amazon EMR
Size: 9.77 KB - Last synced at: 1 day ago - Pushed at: over 4 years ago - Stars: 12 - Forks: 0

AWS-Big-Data-Projects/Run-a-Spark-job-within-Amazon-EMR
Run a Spark job within Amazon EMR
Language: Java - Size: 8.79 KB - Last synced at: 1 day ago - Pushed at: over 4 years ago - Stars: 12 - Forks: 1

mauropelucchi/aws-emr-docker-integration
AWS EMR Docker integration
Language: Dockerfile - Size: 13.7 KB - Last synced at: 11 days ago - Pushed at: over 4 years ago - Stars: 11 - Forks: 2

linghaol/CommunityDetection-Spark-AWS
A Spark application, written in Python, that finds strongly connected components with a bi-directional label propagation algorithm. The project was run on a 1.3 GB Twitter network dataset on an AWS EMR cluster.
Language: Python - Size: 298 KB - Last synced at: about 1 year ago - Pushed at: almost 6 years ago - Stars: 9 - Forks: 4

daniel-cortez-stevenson/cookiecutter-pyspark-cloud
A cookiecutter template for working with PySpark on AWS EMR
Language: Python - Size: 305 KB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 8 - Forks: 2

sjmiller8182/Warehousing-Stock-Tweet-Data
A large-scale data framework that will enable us to store and analyze financial market data and drive future predictions for investment.
Language: TSQL - Size: 8.43 MB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 7 - Forks: 3

wingkwong/aws-playground
My AWS Playground
Language: Python - Size: 348 KB - Last synced at: 19 days ago - Pushed at: 10 months ago - Stars: 6 - Forks: 0

Mathews-Tom/MSc-in-Machine-Learning-and-Artificial-Intelligence
Master of Science in Machine Learning & Artificial Intelligence - Indian Institute Technology Madras & Liverpool John Moores University
Language: Jupyter Notebook - Size: 2.12 GB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 6 - Forks: 7

adornes/spark_r_ml_examples
Spark 2.0 R/SparkR Machine Learning examples
Language: R - Size: 6.84 KB - Last synced at: almost 2 years ago - Pushed at: over 8 years ago - Stars: 6 - Forks: 6

Nerdward/batch_gh_archive
Data Engineering Project with Terraform, Spark, AWS, Docker, Airflow and other tools
Language: Python - Size: 250 KB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 4 - Forks: 0

pratikbarjatya/spark-walmart-data-analysis-exercise
Data Analysis Exercise over Walmart Stock
Language: Jupyter Notebook - Size: 42 KB - Last synced at: about 1 year ago - Pushed at: almost 6 years ago - Stars: 4 - Forks: 13

HarshadRanganathan/aws-emr-launcher
Generic Python library for provisioning EMR clusters with YAML config files (Configuration as Code)
Language: Python - Size: 128 KB - Last synced at: 7 days ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 0

abhibalani/emr_lambda
Lambda function to start an EMR cluster and run a MapReduce job
Language: Python - Size: 2.93 KB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 3 - Forks: 1

bajaj-varun/aws-test
Use-Case: Airline on-time performance
Language: Java - Size: 76.2 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 2 - Forks: 5

giulic3/data-engineering-nanodegree
Projects realized for the Data Engineering Nanodegree offered by Udacity https://www.udacity.com/course/data-engineer-nanodegree--nd027
Language: Jupyter Notebook - Size: 6.43 MB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 2 - Forks: 1

dhruv007patel/Impact-of-Covid-19-on-Aviation-Industry
This project analyzes the correlation between COVID-19 and the US aviation industry. By studying data on passenger/freight traffic and delays alongside COVID-19 trends, it provides insights into airline and passenger responses. The findings help airlines adapt to the pandemic's impact.
Language: Python - Size: 504 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 0

jonathanAmancioSales/BigData_AWS_EMR_MRJob_DIO
Distributed data processing project using Python, MRJob and AWS EMR
Language: Python - Size: 305 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 0

AuFeld/AWS_MWAA_With_Step_Functions
Build modern workflows with AWS MWAA, AWS Step Functions, AWS Glue, and AWS EMR
Language: Python - Size: 437 KB - Last synced at: about 1 year ago - Pushed at: almost 4 years ago - Stars: 2 - Forks: 1

ricardo-farias/CovidDataProduct
This repository will be used to understand data science and data engineering concepts
Language: Scala - Size: 641 KB - Last synced at: 11 months ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 1

silviomori/covid19-datalake
Language: Python - Size: 13.7 KB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 2 - Forks: 0

dvu4/udacity-data-engineering
Data Engineering Projects including Data Modeling, Data Warehouse, Data Lake Development
Language: Jupyter Notebook - Size: 2.09 MB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 2 - Forks: 2

NitinSPatil15/Project-4-Data-Lake-with-AWS-EMR
An ETL pipeline that extracts data from S3, processes them using Spark, and loads the data back into S3 as a set of dimensional tables
Language: Python - Size: 601 KB - Last synced at: about 1 year ago - Pushed at: almost 5 years ago - Stars: 2 - Forks: 4

seahrh/bad-renter
Working examples of Spark ML Pipeline and SMOTE algorithm for synthetic data augmentation
Language: Scala - Size: 59.6 KB - Last synced at: 29 days ago - Pushed at: about 5 years ago - Stars: 2 - Forks: 0

mayankrastogi/faculty-page-rank
A Spark application to process the DBLP dataset to find out the Page Rank of faculty at the UIC CS department based on their co-authorships on publications.
Language: Scala - Size: 214 KB - Last synced at: over 1 year ago - Pushed at: about 6 years ago - Stars: 2 - Forks: 0

mayankrastogi/faculty-collaboration
A Hadoop Map-Reduce job to process the DBLP dataset to produce a graph depicting which professors at the CS department of UIC have co-authored publications.
Language: Scala - Size: 62.5 KB - Last synced at: over 1 year ago - Pushed at: about 6 years ago - Stars: 2 - Forks: 0

krishnan-mani/emr-access-bucket-cross-account
Illustrates access to an S3 bucket owned by a different account from instances in an EMR cluster
Size: 4.88 KB - Last synced at: about 2 years ago - Pushed at: about 7 years ago - Stars: 2 - Forks: 1

trantuanngoc/us_immigration_data_engineering
US immigration data engineering : ETL pipeline, data modeling and warehousing of US immigration data
Language: HCL - Size: 3.58 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

samchenghowing/COMP4442
Analysis and monitoring system using AWS... Also the comp4442 project
Language: Python - Size: 38.5 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

markoshlima/crimes-map
This Big Data project collects data on vehicle theft in the city of São Paulo and consolidates it into a count and a heat map, in order to show the areas with the highest incidence of this type of crime. Everything runs on AWS resources.
Language: Scala - Size: 13.7 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

khushal2405/ETL-pipeline-using-Airflow-and-AWS-EMR
We build an ETL pipeline using Airflow that does the following: downloads data from an AWS S3 bucket, runs a Spark/Spark SQL job on the downloaded data to produce a cleaned-up dataset of orders that missed their delivery deadline, and then uploads the cleaned-up dataset back to the same S3 bucket in a folder prepared for higher-level analytics
Language: Python - Size: 15.4 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

MDS-BD/aws-emr-local-dev-env-with-docker
Companion repository related to an AWS tech blog article.
Language: Dockerfile - Size: 9.77 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 1

johnnyiller/cluster_funk
An opinionated framework for running big data jobs
Language: Python - Size: 83 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

PannagaS/ETL-Logic-orchestration-using-Spark-and-AWS
ETL logic is written in Spark to transform the given dataset stored in S3, and queries on the transformed data are run using AWS Redshift. The datasets are in JSON format. All the raw JSON data must first be uploaded to an S3 source bucket. A Spark job executed on EMR fetches the source data from the S3 source bucket and performs the transformations required by the problem statement. Finally, the transformed data is partitioned and stored in Parquet format in an S3 destination bucket. These files are then accessed from AWS Redshift by running SQL queries on the transformed data.
Language: Python - Size: 1.6 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

GiladAbudi/CollocationExtraction
MapReduce program that produces a list of the top 100 collocations from the Google 2-grams, built with Java and Hadoop on AWS (Amazon Elastic MapReduce)
Language: Java - Size: 38.8 MB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 0

cevoaustralia/data-lake-demo
Data lake demo using change data capture (CDC) on AWS
Language: PLpgSQL - Size: 215 KB - Last synced at: about 1 year ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 3

carlostomeh/Predict_Marketing_Campaign_Success
Goal: develop a machine learning application in a distributed environment using AWS services with Spark.
Language: Jupyter Notebook - Size: 469 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

kunjmehta/ny-taxi-prediction-pyspark-emr
Repo containing the notebook for my PySpark big data EDA and ML project of New York taxi fare prices built using AWS EMR clusters
Language: Jupyter Notebook - Size: 502 KB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

najuzilu/DL-Spark
Building a Data Lake with Spark
Language: Python - Size: 894 KB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

motua16/Sparkify-Churn-Prediction---Pyspark
Machine Learning on a Large 12 GB dataset with Pyspark on AWS EMR
Language: Jupyter Notebook - Size: 2.35 MB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 0

SmellyArmure/OC_DS_Project8
Design and deploy a Big Data architecture on AWS (OpenClassrooms | Data Scientist | Project 8)
Language: Jupyter Notebook - Size: 21.7 MB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 0

jomavera/dataPipelineEMR
ETL pipeline with PySpark on EMR orchestrated with Airflow
Language: Python - Size: 87.9 KB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 0

Mark-McAdam/Data-Engineering-Batch
Takes product reviews and performs natural language processing to provide sentiment analysis. The new insight gets combined with matching product information in the central database to provide a clearer picture of user behavior.
Language: Python - Size: 963 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

andre-marcos-perez/ifood-arch-readme
The application is the documentation of my solution for the iFood data architect test.
Size: 454 KB - Last synced at: 23 days ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

rupeshtr78/awsiot
AWS IoT Integration Using EMR, Spark and Kinesis
Language: Jupyter Notebook - Size: 117 KB - Last synced at: about 2 months ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

harshkavdikar1/Udacity-DataEngineering-NanoDegree
Language: Jupyter Notebook - Size: 3.58 MB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 0

m1theus/aws-emr-terraform
Example for provisioning AWS EMR service with Terraform
Language: HCL - Size: 4.88 KB - Last synced at: 12 months ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 1

JiajunSong629/AWS_EMR_Spark_Workflow
Spark jobs workflow on AWS EMR
Language: Python - Size: 5.86 KB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 0

tansudasli/spark-sandbox
Apache Spark sandbox on GCP and Amazon EMR.
Language: Jupyter Notebook - Size: 3.89 MB - Last synced at: about 2 months ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 0

pradeepbhadani/tf-examples
Terraform Examples
Language: HCL - Size: 46.9 KB - Last synced at: about 2 months ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 7

ahuber1/Hubble-Simulator-AWS-Version
A version of the "Hubble Simulator" project that uses as many AWS services as possible. (Original project at https://github.com/ahuber1/Project5)
Language: Java - Size: 6.39 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 0

Huizerd/ET4310_SBD
Assignments belonging to the course Supercomputing for Big Data (ET4310) at TU Delft
Language: Scala - Size: 40.8 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 0

HAOYU-LI/SparkML-Churn-Prediction
Language: Jupyter Notebook - Size: 353 KB - Last synced at: over 1 year ago - Pushed at: about 6 years ago - Stars: 1 - Forks: 0

AditModi/realtime-bushfire-alert-with-apache-flink-cep Fork of aws-samples/realtime-bushfire-alert-with-apache-flink-cep
Code and documentation for the demonstration example of the real-time bushfire alerting with the Complex Event Processing (CEP) in Apache Flink on Amazon EMR and a simulated IoT sensor network as described on the AWS Big Data Blog: Real-time bushfire alerting with Complex Event Processing in Apache Flink on Amazon EMR and IoT sensor network.
Size: 19.2 MB - Last synced at: 12 months ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 0

matchilling/kata-mapreduce
Language: Jupyter Notebook - Size: 9.69 MB - Last synced at: 25 days ago - Pushed at: over 7 years ago - Stars: 1 - Forks: 0

smart-storm/storm-emr
Spins up an EMR (AWS) cluster for merge-conversion purposes. Uses AWS CloudFormation.
Size: 0 Bytes - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 1 - Forks: 0

shanmuga-sudan/Big-Data-Systems
This repo contains all the assignments, project work on Engineering Big Data Systems coursework
Language: C# - Size: 299 MB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 1 - Forks: 0

jamespaultg/AWS_EMR
Language: Python - Size: 11.7 KB - Last synced at: over 1 year ago - Pushed at: about 8 years ago - Stars: 1 - Forks: 2

branesh2k/AWS-emr-project
AWS EMR-based ETL pipeline using PySpark and S3. Executed with spark-submit over SSH.
Language: Python - Size: 1.29 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

kgelli/NYC-Taxi-Analytics---Spark-ETL-Pipeline-on-AWS-EMR
NYC Taxi Analytics: Spark ETL Pipeline on AWS EMR for processing and analyzing NYC taxi trip data using Apache Spark and Amazon Elastic MapReduce.
Language: Python - Size: 637 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

w-k-s/Celebrity-Soundbites-MapReduce-Experiment
Using MapReduce to build a dictionary of YouTube celebrity video clips
Language: Python - Size: 60.5 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

dmrhimali/terraform
Tutorial on how to create and run Terraform scripts for the AWS and New Relic providers
Language: HCL - Size: 20.6 MB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

sowrabh-m/Data_Processing_using_Spark_Flink
This project demonstrates data cleaning, processing with Apache Spark and Apache Flink, both locally and on AWS EMR.
Language: Python - Size: 1.46 MB - Last synced at: about 2 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

datawaver/emre-airflow
Use Airflow to create and run Spark Jobs with an EMRE Spark cluster
Language: Python - Size: 17.6 KB - Last synced at: about 2 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

gakas14/Batch-Data-Pipeline-using-Airflow-Spark-EMR-Snowflake
The project uses Airflow to orchestrate and manage the data pipeline, creating and terminating a transient EMR cluster to save on cost. Apache Spark transforms the data, and the final dataset is loaded into Snowflake.
Language: Python - Size: 13.7 KB - Last synced at: about 2 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

epap011/Spark-EMR-HiBench-Performance-Testing
Analyzing Spark Cluster Performance in Amazon EMR
Language: Python - Size: 1.06 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

raghadkibrahim/google-ngrams-big-data
Language: Jupyter Notebook - Size: 0 Bytes - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

BiGHeaDMaX/Traitement-Big-Data-avec-Spark
This project aims to process large volumes of data with Spark in the cloud.
Language: Jupyter Notebook - Size: 2.64 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

TimKong21/AWS-Batch-Processing
Big data analysis with AWS services, filtering the Wikiticker dataset with Apache Spark on Amazon EMR, storing data in S3, cataloging with AWS Glue, and querying with Amazon Athena. This end-to-end pipeline exemplifies handling and analyzing big data in the cloud.
Language: Python - Size: 8.01 MB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

seyfal/SparkMitMAttackSim
Scalable simulation of MitM attacks using parallel random walks and graph analytics on Spark.
Language: Scala - Size: 76.2 KB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

eljandoubi/Sparkify
Utilize Apache Spark for ETL processes to prepare data, followed by the construction of a Machine Learning model for Natural Language Processing (NLP) classification. Subsequently, deploy the model within a Gradio web application for seamless interaction.
Language: Jupyter Notebook - Size: 805 KB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

sibyabin/blogs
Technology blogging website from Siby Abin. Covers data engineering, AWS, Spark, Python, Airflow and more
Language: SCSS - Size: 6.33 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

seyfal/MapReduceGraphComparison
Distributed computational problem-solving project, which aims to perform large-scale graph matching using cloud computing technologies. The project allows users to import two directed graphs and analyze the differences between them.
Language: Scala - Size: 1.76 MB - Last synced at: 5 days ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

shinde-chandrakant/BigData-Ops-on-TLC-Yellow-Taxi
Analysed New York City's Yellow taxi data set with Big Data tools such as Hadoop, HBase, Sqoop, MapReduce and AWS Cloud Infrastructure.
Language: Python - Size: 7.19 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

datamindedacademy/getting_started_iac
A repository to practice Terraform and GitHub Actions with AWS
Language: HCL - Size: 18.6 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

Yassaadi/Scaling_cnn_emr
Language: Jupyter Notebook - Size: 5.99 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

snowplow/emr-etl-runner
Run Snowplow's enrichments on Amazon Elastic MapReduce with minimum fuss
Language: Ruby - Size: 774 KB - Last synced at: 6 days ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 5

jacky1c/A-Case-Study-Applied-Regression-Analysis-Using-Distributed-Machine-Learning
Language: Jupyter Notebook - Size: 1.59 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

khushal2405/Daily-Incremental-load-ETL-pipeline-for-Ecommerce-company-using-AWS-Lambda-and-Apache-airflow
Daily incremental-load ETL pipeline for an e-commerce company using AWS Lambda and an AWS EMR cluster, deployed with Apache Airflow in a Docker container.
Language: Python - Size: 19.5 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

escobarana/sbt_spark_batch
Batch Spark job of CO2_emission data using SBT tool
Language: Scala - Size: 20.3 MB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

Moenupa/COMP4442_Proj
Real-time Web App for Driving Statistics, Cloud Computing final project
Language: HTML - Size: 10.1 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

javi-domi/aws-datalake
Data lake on AWS
Language: Python - Size: 9.77 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

pgplarosa/COVID-Detection-via-CT-Scan-Image-Analysis
Big Data and Cloud Computing Mini Project 2 - March 07, 2022
Language: HTML - Size: 35.5 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

Dindera/Sparkify_data_lake
DataLake on AWS
Language: Jupyter Notebook - Size: 400 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

vvr-rao/Star-Chart
Pet project to create a Starhopping website for Astronomy. Exploring Concepts from Graph Databases, Apache Spark and Static Website hosting.
Language: Python - Size: 24.7 MB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

kb789/Data-Lake-CarAccidents
In this project, PySpark on AWS EMR was used to clean, model, and pipeline data from two large datasets.
Language: Jupyter Notebook - Size: 64.5 KB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

senthuran16/word-count-streaming-python-hadoop-mapreduce
A word count streaming MapReduce implementation with Python
Language: Python - Size: 586 KB - Last synced at: 30 days ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

helioRocha/dio-ccde-p03
Creating Your Big Data Ecosystem in the Cloud
Language: Python - Size: 560 KB - Last synced at: 12 months ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

lpillmann/udacity-data-lake-s3-spark
Code for Udacity Data Engineering Nanodegree Project named Data Lake with AWS S3 and Spark
Language: Python - Size: 417 KB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0
