An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: gcp-dataproc

SnehaDharne/BigDataAnalytics-MVCollisions

This repository contains Databricks notebooks for analyzing motor vehicle collisions using NYC Open Data. We perform EDA, spatial clustering, and predictive modeling on collision, vehicle, and person datasets to understand accident trends and predict potential risks.

Language: Jupyter Notebook - Size: 7.64 MB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

BhushanSagar/Movie-Rating-Analysis

Movie Rating Analysis using Apache Spark (PySpark)

Language: Jupyter Notebook - Size: 1.03 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

BhushanSagar/Car-Insurance-Cold-Calls-Data-Analysis

Car Insurance Cold Calls Data Analysis using Apache Hive

Language: HiveQL - Size: 1.17 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

BhushanSagar/Marketing-Campaign-Data-Analysis

Marketing Campaign Data Analysis Using Apache Spark (PySpark)

Language: Jupyter Notebook - Size: 13.7 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

prakashdontaraju/google-cloud-ecommerce

E-commerce GCP streaming pipeline: Cloud Storage, Compute Engine, Pub/Sub, Dataflow, Apache Beam, BigQuery and Tableau; GCP batch pipeline: Cloud Storage, Dataproc, PySpark, Cloud Spanner and Tableau

Language: Python - Size: 4.38 MB - Last synced at: 9 months ago - Pushed at: about 3 years ago - Stars: 10 - Forks: 3

bug-data/Big_Data_First_Project

First project for Big Data course held at Roma Tre University

Language: Jupyter Notebook - Size: 2.16 MB - Last synced at: over 1 year ago - Pushed at: almost 6 years ago - Stars: 0 - Forks: 0

dwaiba/dataproc-terraform

Customisable Dataproc HA cluster (Debian 9) with ZooKeeper, Kafka, BigQuery and other tools/jobs, with Terraform

Language: HCL - Size: 28.3 KB - Last synced at: about 1 year ago - Pushed at: about 5 years ago - Stars: 3 - Forks: 7

ElhNour/large-scale-data-management-spark (fork of Huydatnguyen/LSDMLab)

Process large amounts of data and implement complex data analyses using Spark. The dataset has been made available by Google. It includes data about a cluster of 12,500 machines and the activity on this cluster over 29 days.

Language: Python - Size: 362 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

tansudasli/spark-sandbox

Apache Spark sandbox on GCP and Amazon EMR.

Language: Jupyter Notebook - Size: 3.89 MB - Last synced at: about 2 months ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 0

askmrsinh/spark-stocksim

Monte Carlo stock simulation using Apache Spark.

Language: Scala - Size: 1.81 MB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 2 - Forks: 2

RickLeite/Hadoop-Google-DataProc-DIOstudy

Hadoop Google DataProc DIO study

Language: Python - Size: 221 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

DenisOgr/sentiment-batch-stream-pipeline

Language: Jupyter Notebook - Size: 1.23 MB - Last synced at: about 2 months ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

prodriguezdefino/dataproc-workflowtemplate-cloudfunction

Implements a work queue for Dataproc Workflow Template executions

Language: HCL - Size: 221 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 1

emanuelegiona/CC2019

Project for Cloud Computing course (A.Y. 2018/2019)

Language: Python - Size: 572 KB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 0

visalvo/projectScalable (fork of lateganto/projectScalable)

Project for Scalable and Cloud Programming Course - 2018/19 UNIBO.

Size: 141 MB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0