GitHub topics: gcp-dataproc
SnehaDharne/BigDataAnalytics-MVCollisions
Leveraging NYC Open Data, this repository contains Databricks notebooks for analyzing motor vehicle collisions. We perform EDA, spatial clustering, and predictive modeling on collision, vehicle, and person datasets to understand accident trends and predict potential risks.
Language: Jupyter Notebook - Size: 7.64 MB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

BhushanSagar/Movie-Rating-Analysis
Movie Rating Analysis using Apache Spark (pyspark)
Language: Jupyter Notebook - Size: 1.03 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

BhushanSagar/Car-Insurance-Cold-Calls-Data-Analysis
Car Insurance Cold Calls Data Analysis using Apache Hive
Language: HiveQL - Size: 1.17 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

BhushanSagar/Marketing-Campaign-Data-Analysis
Marketing Campaign Data Analysis Using Apache Spark (PySpark)
Language: Jupyter Notebook - Size: 13.7 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

prakashdontaraju/google-cloud-ecommerce
ecommerce GCP Streaming pipeline ― Cloud Storage, Compute Engine, Pub/Sub, Dataflow, Apache Beam, BigQuery and Tableau; GCP Batch pipeline ― Cloud Storage, Dataproc, PySpark, Cloud Spanner and Tableau
Language: Python - Size: 4.38 MB - Last synced at: 9 months ago - Pushed at: about 3 years ago - Stars: 10 - Forks: 3

bug-data/Big_Data_First_Project
First project for Big Data course held at Roma Tre University
Language: Jupyter Notebook - Size: 2.16 MB - Last synced at: over 1 year ago - Pushed at: almost 6 years ago - Stars: 0 - Forks: 0

dwaiba/dataproc-terraform
Dataproc Customisable HA cluster debian-9 with zookeeper, kafka, BigQuery and other tools/jobs with Terraform
Language: HCL - Size: 28.3 KB - Last synced at: about 1 year ago - Pushed at: about 5 years ago - Stars: 3 - Forks: 7

ElhNour/large-scale-data-management-spark Fork of Huydatnguyen/LSDMLab
Process large amounts of data and implement complex data analyses using Spark. The dataset, made available by Google, covers a cluster of 12500 machines and the activity on this cluster over 29 days.
Language: Python - Size: 362 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

tansudasli/spark-sandbox
Apache Spark sandbox on GCP and Amazon EMR.
Language: Jupyter Notebook - Size: 3.89 MB - Last synced at: about 2 months ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 0

askmrsinh/spark-stocksim
Monte Carlo stock simulation using Apache Spark.
Language: Scala - Size: 1.81 MB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 2 - Forks: 2

RickLeite/Hadoop-Google-DataProc-DIOstudy
Hadoop Google DataProc DIO study
Language: Python - Size: 221 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

DenisOgr/sentiment-batch-stream-pipeline
Language: Jupyter Notebook - Size: 1.23 MB - Last synced at: about 2 months ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

prodriguezdefino/dataproc-workflowtemplate-cloudfunction
Implements a work queue for Dataproc Workflow Template executions
Language: HCL - Size: 221 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 1

emanuelegiona/CC2019
Project for Cloud Computing course (A.Y. 2018/2019)
Language: Python - Size: 572 KB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 0

visalvo/projectScalable Fork of lateganto/projectScalable
Project for Scalable and Cloud Programming Course - 2018/19 UNIBO.
Size: 141 MB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0
