An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: spark-cluster

mgarralda/hadoop-spark-cluster

Repository containing Docker images for creating a Spark cluster on Hadoop YARN.

Language: Dockerfile - Size: 286 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 7 - Forks: 3

Anas399/SPARK_CLUSTER_DOCKER

Set up a local Spark cluster, Hadoop (HDFS), Airflow, and PostgreSQL on Docker with ease, without any local installations.

Size: 1000 Bytes - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

minsusun/deploy-spark-cluster

Configs for deploying Spark clusters on Docker and k8s.

Language: Shell - Size: 78.1 KB - Last synced at: about 1 month ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

AIxHunter/Spark-k8s-pod-template

Steps to deploy a Spark app to a Kubernetes cluster using spark-submit or a pod template.

Language: Shell - Size: 12.7 KB - Last synced at: 23 days ago - Pushed at: 3 months ago - Stars: 4 - Forks: 1

aimanamri/raspberry-pi4-hadoop-spark-cluster

This is self-documentation of learning distributed data storage, parallel processing, and Linux OS using Apache Hadoop, Apache Spark and Raspbian OS. In this project, a 3-node cluster is set up using Raspberry Pi 4 boards, with HDFS installed and Spark processing jobs run via YARN.

Language: Shell - Size: 5.21 MB - Last synced at: 15 days ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0

euiyounghwang/spark_job_interface_service

spark_job_interface_service

Language: Python - Size: 140 KB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

longNguyen010203/Spark-Processing-AWS

👷🌇 Set up and build a big data processing pipeline with Apache Spark and 📦 AWS services (S3, EMR, EC2, IAM, VPC, Redshift), using Terraform to set up the infrastructure and Airflow integration to automate workflows 🥊

Language: Python - Size: 1010 KB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 2 - Forks: 0

Turnipdo/Spark-Standalone-Cluster-Setup

To facilitate the initial setup of Apache Spark, this repository provides a beginner-friendly, step-by-step guide on setting up a master node and two worker nodes.

Language: Python - Size: 1.05 MB - Last synced at: 12 days ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

matthieuvion/spark-cluster

Steps to deploy a local Spark cluster with Docker. Bonus: a ready-to-use notebook for model prediction on PySpark using spark.ml Pipeline() on a well-known dataset.

Language: Jupyter Notebook - Size: 628 KB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

ayseirmak/DistributedFraudDetection

In this study, we propose using a distributed storage and computation system to track money transfers instantly. In particular, we keep our transaction history in a distributed file system as a graph data structure. We try to detect illegal activities by using Graph Neural Networks (GNN) in a distributed manner.

Language: Python - Size: 516 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

silencebingo/hadoop-spark-cluster

A Hadoop and Spark Cluster on Docker

Language: Shell - Size: 14.6 KB - Last synced at: over 1 year ago - Pushed at: about 7 years ago - Stars: 0 - Forks: 0

DanMolenhouse/Distributed-Systems-Project5-Hadoop-and-Spark

In this project, we used both Hadoop / MapReduce and Spark to do distributed computing. The first task was to perform a series of operations using a Mapper and Reduce Java file implemented on a Hadoop server. The second task was to perform similar operations, but on Spark instead.

Language: Java - Size: 70.3 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

SinghHarshita/Clustering-Algorithms-Spark

KMeans, CURE and Canopy algorithms are demonstrated using PySpark.

Language: Jupyter Notebook - Size: 150 KB - Last synced at: about 1 year ago - Pushed at: almost 4 years ago - Stars: 5 - Forks: 0

shuaicj/spark-cluster-zk

A spark cluster containing multiple spark masters based on docker-compose.

Language: Shell - Size: 7.81 KB - Last synced at: about 2 years ago - Pushed at: about 7 years ago - Stars: 1 - Forks: 1

karamolegkos/Diastema

This is my contribution in the project Diastema

Language: Python - Size: 88.9 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 0

kumarvna/terraform-azurerm-hdinsight

Terraform module to create Azure HDInsight, a managed, full-spectrum, open-source analytics service. This module creates Apache Hadoop, Apache Spark, Apache HBase, Interactive Query (Apache Hive LLAP) and Apache Kafka clusters.

Language: HCL - Size: 365 KB - Last synced at: 9 days ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 5

pientaa/opening-black-box

Deep dive into Spark UDFs' characteristics.

Language: Jupyter Notebook - Size: 48.6 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

itsayushthada/SVD-on-Spark

Language: Jupyter Notebook - Size: 1.72 MB - Last synced at: almost 2 years ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

vaibhavmagon/Spark-Python-MovieReviews

Script to find similarities between movies from a MovieLens dataset using Python and Spark clustering.

Language: Python - Size: 2.75 MB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 3 - Forks: 1

harshkavdikar1/GeoSpatial-DataAnalysis-With-Spark

A distributed application to identify the top 50 taxi pickup locations in New York by analyzing over 1 billion records using Apache Spark, the Hadoop file system and Scala.

Language: Scala - Size: 4.08 MB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

flaviostutz/spark-submit-scala

Spark submit extension from bde2020/spark-submit for Scala with SBT

Language: Scala - Size: 3.91 KB - Last synced at: 24 days ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 0

shuaicj/spark-cluster

A spark cluster based on docker-compose.

Language: Shell - Size: 6.84 KB - Last synced at: about 2 years ago - Pushed at: about 7 years ago - Stars: 3 - Forks: 4

ansjin/docker-spark

docker spark standalone

Language: Dockerfile - Size: 5.86 KB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 0 - Forks: 1

Related Keywords
spark-cluster (23), spark (14), docker-compose (6), docker (4), hadoop-cluster (4), apache-spark (4), pyspark (4), hadoop-hdfs (3), scala (3), pyspark-notebook (3), big-data (3), spark-submit (3), bigdata (2), kubernetes (2), spark-master (2), terraform (2), python (2), python3 (2), mapreduce (2), spark-hadoop (2), hdfs (2), spark-yarn-docker (2), k8s (2), hdinsight-interactive-query-cluster (1), hdinsight-hbase-cluster (1), hdinsight-hadoop-cluster (1), hdinsight-cluster (1), hbase-cluster (1), hadoop-filesystem (1), machine-learning (1), azure-hdinsight (1), azure (1), mapreduce-python (1), zookeeper (1), apache-hive-cluster (1), spark-on-kubernetes (1), openstack-heat-api (1), openstack-heat (1), microstack (1), kubernetes-api (1), diastema (1), api (1), spark-standalone (1), jupyter-notebook (1), sbt (1), spark-sql (1), hot-zone-analysis (1), hot-cell-analysis (1), ec2-instance (1), recommendation-system (1), movielens-dataset (1), movie (1), easy-to-use (1), dataset (1), kubernetes-cluster (1), udf (1), cluster (1), black-box (1), terraform-module (1), spark-clusters (1), kafka-cluster (1), hdinsight-spark-cluster (1), hdinsight-kafka-cluster (1), iam (1), emr-cluster (1), data-pipeline (1), cloud-computing (1), aws-services (1), aws-s3 (1), aws-ec2 (1), aws (1), apache-airflow (1), spark-jobs (1), fastapi (1), yarn (1), spark-shell (1), raspberry-pi-4 (1), hive (1), parallel-processing (1), sparkr (1), distributed-storage (1), pod (1), zeppelin-notebook (1), kmeans (1), cure (1), clustering-algorithm (1), clustering (1), canopy (1), big-data-analytics (1), mapreduce-java (1), spark-hadoop-docker (1), hadoop-mapreduce (1), hadoop (1), keras-tensorflow (1), graph-convolutional-networks (1), bigdl-orca (1), randomforestregressor (1), airflow (1), jupyter-docker-stacks (1), dataengineer (1)