An open API service providing repository metadata for many open source software ecosystems.

Topic: "apache-spark"

mlflow/mlflow

Open source platform for the machine learning lifecycle

Language: Python - Size: 796 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 20,908 - Forks: 4,609

microsoft/SynapseML

Simple and Distributed Machine Learning

Language: Scala - Size: 157 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 5,139 - Forks: 845

treeverse/lakeFS

lakeFS - Data version control for your data lake | Git for data

Language: Go - Size: 150 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 4,723 - Forks: 380

lw-lin/CoolplaySpark

Coolplay Spark: Spark source code analysis, Spark libraries, and more

Language: Scala - Size: 9.54 MB - Last synced at: 25 days ago - Pushed at: about 3 years ago - Stars: 3,488 - Forks: 1,410

spark-notebook/spark-notebook

Interactive and Reactive Data Science using Scala and Spark.

Language: JavaScript - Size: 15.8 MB - Last synced at: 25 days ago - Pushed at: about 2 years ago - Stars: 3,152 - Forks: 653

kubeflow/spark-operator

Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.

Language: Go - Size: 25.5 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 2,952 - Forks: 1,411

intel/BigDL

BigDL: Distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray

Language: Jupyter Notebook - Size: 356 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 2,674 - Forks: 731

dotnet/spark

.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.

Language: C# - Size: 4.87 MB - Last synced at: 21 days ago - Pushed at: about 1 month ago - Stars: 2,058 - Forks: 327

big-data-europe/docker-spark

Apache Spark docker image

Language: Shell - Size: 7.78 MB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 2,055 - Forks: 702

feathr-ai/feathr

Feathr – A scalable, unified data and AI engineering platform for enterprise

Language: Scala - Size: 29.4 MB - Last synced at: 8 days ago - Pushed at: about 1 year ago - Stars: 1,898 - Forks: 232

awesome-spark/awesome-spark

A curated list of awesome Apache Spark packages and resources.

Language: Shell - Size: 231 KB - Last synced at: 2 days ago - Pushed at: 8 months ago - Stars: 1,802 - Forks: 340

OryxProject/oryx 📦

Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large scale machine learning

Language: Java - Size: 7.12 MB - Last synced at: 4 months ago - Pushed at: almost 4 years ago - Stars: 1,781 - Forks: 404

japila-books/apache-spark-internals

The Internals of Apache Spark

Size: 145 MB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 1,500 - Forks: 459

ptyadana/SQL-Data-Analysis-and-Visualization-Projects

SQL data analysis & visualization projects using MySQL, PostgreSQL, SQLite, Tableau, Apache Spark and pySpark.

Language: Jupyter Notebook - Size: 35.1 MB - Last synced at: about 1 month ago - Pushed at: almost 3 years ago - Stars: 1,436 - Forks: 553

san089/goodreads_etl_pipeline

An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.

Language: Python - Size: 1.31 MB - Last synced at: about 1 month ago - Pushed at: over 5 years ago - Stars: 1,378 - Forks: 227

databricks/LearningSparkV2

This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]

Language: Scala - Size: 75.2 MB - Last synced at: about 19 hours ago - Pushed at: 5 months ago - Stars: 1,306 - Forks: 775

lensacom/sparkit-learn

PySpark + Scikit-learn = Sparkit-learn

Language: Python - Size: 444 KB - Last synced at: 25 days ago - Pushed at: over 4 years ago - Stars: 1,154 - Forks: 256

databricks/spark-sklearn 📦

(Deprecated) Scikit-learn integration package for Apache Spark

Language: Python - Size: 782 KB - Last synced at: about 19 hours ago - Pushed at: over 5 years ago - Stars: 1,079 - Forks: 228

mahmoudparsian/data-algorithms-book

MapReduce, Spark, Java, and Scala for Data Algorithms Book

Language: Java - Size: 397 MB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 1,075 - Forks: 661

graphframes/graphframes

GraphFrames is a package for Apache Spark which provides DataFrame-based Graphs

Language: Scala - Size: 3.85 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1,059 - Forks: 250

sparklyr/sparklyr

R interface for Apache Spark

Language: R - Size: 97 MB - Last synced at: 8 days ago - Pushed at: 3 months ago - Stars: 962 - Forks: 308

microsoft/Mobius

C# and F# language binding and extensions to Apache Spark

Language: C# - Size: 6.44 MB - Last synced at: 3 days ago - Pushed at: over 1 year ago - Stars: 940 - Forks: 211

LucaCanali/sparkMeasure

This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spark jobs. It focuses on easing the collection and examination of Spark metrics, making it a practical choice for both developers and data engineers.

Language: Scala - Size: 1.96 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 758 - Forks: 151

lw-lin/streaming-readings

Papers and reading material on streaming systems

Size: 6.84 KB - Last synced at: 30 days ago - Pushed at: over 3 years ago - Stars: 733 - Forks: 154

aloneguid/parquet-dotnet

Fully managed Apache Parquet implementation

Language: C# - Size: 121 MB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 732 - Forks: 160

miguno/kafka-storm-starter 📦

[PROJECT IS NO LONGER MAINTAINED] Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.

Language: Scala - Size: 393 KB - Last synced at: 3 days ago - Pushed at: over 3 years ago - Stars: 724 - Forks: 329

mrpowers-io/quinn

pyspark methods to enhance developer productivity 📣 👯 🎉

Language: Python - Size: 1.98 MB - Last synced at: 5 days ago - Pushed at: 4 months ago - Stars: 672 - Forks: 98

nchammas/flintrock

A command-line tool for launching Apache Spark clusters.

Language: Python - Size: 785 KB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 642 - Forks: 117

cerndb/dist-keras 📦

Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.

Language: Python - Size: 54.6 MB - Last synced at: 2 days ago - Pushed at: almost 7 years ago - Stars: 623 - Forks: 167

apache-spark-on-k8s/spark Fork of apache/spark 📦

Apache Spark enhanced with native Kubernetes scheduler back-end: NOTE this repository is being ARCHIVED as all new development for the kubernetes scheduler back-end is now on https://github.com/apache/spark/

Language: Scala - Size: 260 MB - Last synced at: 5 months ago - Pushed at: over 5 years ago - Stars: 612 - Forks: 118

openscoring/openscoring

REST web service for the true real-time scoring (<1 ms) of Scikit-Learn, R and Apache Spark models

Language: Java - Size: 869 KB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 583 - Forks: 171

infoslack/awesome-kafka

A list about Apache Kafka

Size: 96.7 KB - Last synced at: 2 days ago - Pushed at: 3 months ago - Stars: 579 - Forks: 164

japila-books/spark-sql-internals

The Internals of Spark SQL

Size: 1.46 GB - Last synced at: 24 days ago - Pushed at: 5 months ago - Stars: 466 - Forks: 132

rjurney/Agile_Data_Code_2

Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition

Language: Jupyter Notebook - Size: 23.2 MB - Last synced at: 2 days ago - Pushed at: about 1 year ago - Stars: 460 - Forks: 310

cartershanklin/pyspark-cheatsheet

PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster

Language: Python - Size: 16.8 MB - Last synced at: 3 months ago - Pushed at: 8 months ago - Stars: 451 - Forks: 220

LucaCanali/Miscellaneous

Includes notes on using Apache Spark, with a drill-down on Spark for Physics, how to run TPCDS on PySpark, and how to create histograms with Spark. Also includes tools for stress testing and measuring CPU performance, and Jupyter notebook examples for using various DB systems.

Language: Jupyter Notebook - Size: 34.3 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 449 - Forks: 152

tweag/sparkle

Haskell on Apache Spark.

Language: Haskell - Size: 1.1 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 448 - Forks: 27

japila-books/spark-structured-streaming-internals

The Internals of Spark Structured Streaming

Size: 119 MB - Last synced at: 29 days ago - Pushed at: over 2 years ago - Stars: 420 - Forks: 172

1duo/awesome-ai-infrastructures

Infrastructures™ for Machine Learning Training/Inference in Production.

Size: 11.8 MB - Last synced at: about 1 month ago - Pushed at: about 6 years ago - Stars: 416 - Forks: 74

ekampf/PySpark-Boilerplate

A boilerplate for writing PySpark Jobs

Language: Python - Size: 10.7 KB - Last synced at: 29 days ago - Pushed at: over 1 year ago - Stars: 396 - Forks: 155

awesome-spark/spark-gotchas 📦

Spark Gotchas. A subjective compilation of the Apache Spark tips and tricks

Size: 188 KB - Last synced at: about 1 month ago - Pushed at: about 8 years ago - Stars: 363 - Forks: 80

tirthajyoti/Spark-with-Python

Fundamentals of Spark with Python (using PySpark), code examples

Language: Jupyter Notebook - Size: 8.97 MB - Last synced at: 29 days ago - Pushed at: over 2 years ago - Stars: 347 - Forks: 271

datamechanics/delight 📦

A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source.

Language: Scala - Size: 2.31 MB - Last synced at: 5 months ago - Pushed at: about 1 year ago - Stars: 345 - Forks: 53

opencypher/morpheus

Morpheus brings the leading graph query language, Cypher, onto the leading distributed processing platform, Spark.

Language: Scala - Size: 29.6 MB - Last synced at: 11 days ago - Pushed at: almost 5 years ago - Stars: 341 - Forks: 62

dmmiller612/sparktorch

Train and run PyTorch models on Apache Spark.

Language: Python - Size: 8.83 MB - Last synced at: 29 days ago - Pushed at: about 2 years ago - Stars: 339 - Forks: 46

miguno/wirbelsturm 📦

[PROJECT IS NO LONGER MAINTAINED] Wirbelsturm is a Vagrant and Puppet based tool to perform 1-click local and remote deployments, with a focus on big data tech like Kafka.

Language: Shell - Size: 309 KB - Last synced at: 3 days ago - Pushed at: over 3 years ago - Stars: 328 - Forks: 72

Hydrospheredata/mist

Serverless proxy for Spark cluster

Language: Scala - Size: 9.96 MB - Last synced at: 29 days ago - Pushed at: over 4 years ago - Stars: 326 - Forks: 72

josephmachado/efficient_data_processing_spark

Code for "Efficient Data Processing in Spark" Course

Language: Python - Size: 23.9 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 304 - Forks: 63

microsoft/data-accelerator

Data Accelerator for Apache Spark simplifies onboarding to Streaming of Big Data. It offers a rich, easy to use experience to help with creation, editing and management of Spark jobs on Azure HDInsights or Databricks while enabling the full power of the Spark engine.

Language: C# - Size: 401 MB - Last synced at: 4 days ago - Pushed at: 3 months ago - Stars: 302 - Forks: 90

MingChen0919/learning-apache-spark

Notes on Apache Spark (pyspark)

Language: HTML - Size: 20.1 MB - Last synced at: 28 days ago - Pushed at: over 6 years ago - Stars: 299 - Forks: 186

lifeomic/sparkflow

Easy-to-use library to bring TensorFlow to Apache Spark

Language: Python - Size: 8.79 MB - Last synced at: 29 days ago - Pushed at: over 1 year ago - Stars: 296 - Forks: 45

cuebook/cuelake

Use SQL to build ELT pipelines on a data lakehouse.

Language: JavaScript - Size: 28 MB - Last synced at: about 1 month ago - Pushed at: about 3 years ago - Stars: 288 - Forks: 28

svenkreiss/pysparkling

A pure Python implementation of Apache Spark's RDD and DStream interfaces.

Language: Python - Size: 3.45 MB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 269 - Forks: 44

hortonworks-spark/spark-atlas-connector

A Spark Atlas connector to track data lineage in Apache Atlas

Language: Scala - Size: 903 KB - Last synced at: 28 days ago - Pushed at: over 2 years ago - Stars: 266 - Forks: 150

jaceklaskowski/spark-workshop

Apache Spark™ and Scala Workshops

Language: HTML - Size: 57 MB - Last synced at: 29 days ago - Pushed at: 11 months ago - Stars: 264 - Forks: 148

PiercingDan/spark-Jupyter-AWS

A guide on how to set up Jupyter with PySpark painlessly on AWS EC2 clusters, with S3 I/O support

Language: Jupyter Notebook - Size: 220 KB - Last synced at: about 1 month ago - Pushed at: over 7 years ago - Stars: 261 - Forks: 18

dataflint/spark

A modern replacement for the Apache Spark UI

Language: TypeScript - Size: 17.8 MB - Last synced at: 19 days ago - Pushed at: 20 days ago - Stars: 254 - Forks: 30

airscholar/e2e-data-engineering

An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All components are containerized with Docker for easy deployment and scalability.

Language: Python - Size: 289 KB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 250 - Forks: 123

Mellanox/SparkRDMA 📦

This is an archive of the SparkRDMA project. The new repository with RDMA shuffle acceleration for Apache Spark is here: https://github.com/Nvidia/sparkucx

Language: Java - Size: 259 KB - Last synced at: 5 months ago - Pushed at: about 6 years ago - Stars: 242 - Forks: 71

Azure/azure-event-hubs-spark

Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs

Language: Scala - Size: 19.6 MB - Last synced at: 7 days ago - Pushed at: 4 months ago - Stars: 236 - Forks: 178

Chabane/bigdata-playground

A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL

Language: TypeScript - Size: 3.08 MB - Last synced at: about 2 months ago - Pushed at: over 6 years ago - Stars: 209 - Forks: 74

Azure/azure-cosmosdb-spark 📦

Apache Spark Connector for Azure Cosmos DB

Size: 192 MB - Last synced at: 7 days ago - Pushed at: 3 months ago - Stars: 203 - Forks: 120

lynnlangit/learning-hadoop-and-spark

Companion to Learning Hadoop and Learning Spark courses on LinkedIn Learning

Language: HTML - Size: 13.6 MB - Last synced at: 4 days ago - Pushed at: 6 months ago - Stars: 195 - Forks: 165

databrickslabs/automl-toolkit 📦

Toolkit for Apache Spark ML for feature clean-up, feature importance calculation suite, information gain selection, distributed SMOTE, model selection and training, hyperparameter optimization and selection, and model interpretability.

Language: HTML - Size: 158 MB - Last synced at: 3 months ago - Pushed at: about 4 years ago - Stars: 191 - Forks: 44

IBM/spark-tpc-ds-performance-test 📦

Use the TPC-DS benchmark to test Spark SQL performance

Language: TSQL - Size: 354 MB - Last synced at: 14 days ago - Pushed at: about 5 years ago - Stars: 179 - Forks: 99

whylabs/whylogs-java 📦

Profile and monitor your ML data pipeline end-to-end

Language: Java - Size: 5.95 MB - Last synced at: 5 months ago - Pushed at: over 3 years ago - Stars: 178 - Forks: 7

vinta/albedo

A recommender system for discovering GitHub repos, built with Apache Spark

Language: Scala - Size: 442 KB - Last synced at: 3 months ago - Pushed at: about 5 years ago - Stars: 177 - Forks: 36

lamastex/scalable-data-science

Scalable Data Science, course sets in big data using Apache Spark over Databricks and their mathematical, statistical and computational foundations using SageMath.

Language: HTML - Size: 1.24 GB - Last synced at: 23 days ago - Pushed at: 5 months ago - Stars: 168 - Forks: 93

mahmoudparsian/big-data-mapreduce-course

Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University

Language: HTML - Size: 601 MB - Last synced at: 23 days ago - Pushed at: 7 months ago - Stars: 158 - Forks: 143

radanalyticsio/spark-operator

Operator for managing the Spark clusters on Kubernetes and OpenShift.

Language: Java - Size: 3.39 MB - Last synced at: about 2 months ago - Pushed at: over 3 years ago - Stars: 157 - Forks: 60

BitwiseInc/Hydrograph

A visual ETL development and debugging tool for big data

Language: Java - Size: 33.5 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 153 - Forks: 110

qubole/spark-on-lambda

Apache Spark on AWS Lambda

Language: Scala - Size: 111 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 151 - Forks: 33

SANSA-Stack/SANSA-Stack

Big Data RDF Processing and Analytics Stack built on Apache Spark and Apache Jena http://sansa-stack.github.io/SANSA-Stack/

Language: Scala - Size: 65.2 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 145 - Forks: 30

archivesunleashed/aut

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

Language: Scala - Size: 39.5 MB - Last synced at: 11 days ago - Pushed at: over 1 year ago - Stars: 144 - Forks: 33

gtkcyber/griffon-vm

Griffon Data Science Virtual Machine

Size: 896 KB - Last synced at: 2 months ago - Pushed at: almost 3 years ago - Stars: 132 - Forks: 26

GoogleCloudPlatform/dataproc-templates

Dataproc templates and pipelines for solving in-cloud data tasks

Language: Python - Size: 18.6 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 129 - Forks: 100

MemVerge/splash

Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange

Language: Scala - Size: 666 KB - Last synced at: 29 days ago - Pushed at: 6 months ago - Stars: 127 - Forks: 29

jleetutorial/scala-spark-tutorial

Project for James' Apache Spark with Scala course

Language: Scala - Size: 1.14 MB - Last synced at: about 1 month ago - Pushed at: almost 5 years ago - Stars: 127 - Forks: 252

LearningJournal/Spark-Streaming-In-Python

Apache Spark 3 - Structured Streaming Course Material

Language: Python - Size: 19.4 MB - Last synced at: 2 months ago - Pushed at: almost 2 years ago - Stars: 121 - Forks: 159

zero323/pyspark-stubs 📦

Apache (Py)Spark type annotations (stub files).

Language: Python - Size: 1.3 MB - Last synced at: 24 days ago - Pushed at: almost 3 years ago - Stars: 117 - Forks: 37

streamnative/pulsar-spark

Spark Connector to read and write with Pulsar

Language: Scala - Size: 726 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 113 - Forks: 51

vivek-bombatkar/Spark-with-Python---My-learning-notes-

ETL pipeline using pyspark (Spark - Python)

Language: CSS - Size: 10.7 MB - Last synced at: 2 months ago - Pushed at: about 5 years ago - Stars: 113 - Forks: 82

G-Research/fasttrackml

Experiment tracking server focused on speed and scalability

Language: Go - Size: 5.4 MB - Last synced at: 14 days ago - Pushed at: 5 months ago - Stars: 105 - Forks: 20

jgperrin/net.jgp.books.spark.ch01

Spark in Action, 2nd edition - chapter 1 - Introduction

Language: Scala - Size: 6.91 MB - Last synced at: 3 months ago - Pushed at: about 2 years ago - Stars: 103 - Forks: 71

exacaster/lighter

REST API for Apache Spark on K8S or YARN

Language: Java - Size: 6.68 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 98 - Forks: 23

dimajix/flowman

Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pipelines.

Language: Scala - Size: 18.6 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 94 - Forks: 19

adrianulbona/osm-parquetizer

A converter for the OSM PBFs to Parquet files

Language: Java - Size: 75.2 KB - Last synced at: 2 months ago - Pushed at: almost 2 years ago - Stars: 94 - Forks: 32

chermenin/spark-states

Custom state store providers for Apache Spark

Language: Scala - Size: 267 KB - Last synced at: 28 days ago - Pushed at: 4 months ago - Stars: 92 - Forks: 25

igor-suhorukov/openstreetmap_h3

OSM planet dump high performance data loader. Transform OpenStreetMap World/Region PBF dump into partitioned by H3 regions PostGIS pgsnapshot (lossless) OSM schema representation and/or into ArrowIPC/Parquet dumps

Language: Java - Size: 6.06 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 92 - Forks: 8

itsjafer/jupyterlab-sparkmonitor

JupyterLab extension that enables monitoring launched Apache Spark jobs from within a notebook

Language: JavaScript - Size: 4.08 MB - Last synced at: 8 days ago - Pushed at: over 2 years ago - Stars: 92 - Forks: 23

LearningJournal/SparkProgrammingInScala

Apache Spark Course Material

Language: Scala - Size: 50.9 MB - Last synced at: 3 months ago - Pushed at: about 2 years ago - Stars: 88 - Forks: 159

kwartile/connected-component

Map Reduce Implementation of Connected Component on Apache Spark

Language: Scala - Size: 26.4 KB - Last synced at: 8 days ago - Pushed at: almost 4 years ago - Stars: 85 - Forks: 18

kakao/cuesheet 📦

A framework for writing Spark 2.x applications in a pretty way

Language: Scala - Size: 147 KB - Last synced at: 5 months ago - Pushed at: almost 2 years ago - Stars: 84 - Forks: 26

awesome-spark/learn-by-examples 📦

Real-world Spark pipelines examples

Language: Scala - Size: 1.1 MB - Last synced at: 11 days ago - Pushed at: over 7 years ago - Stars: 83 - Forks: 30

seznam/euphoria

Euphoria is an open source Java API for creating unified big-data processing flows. It provides an engine independent programming model which can express both batch and stream transformations.

Language: Java - Size: 3.9 MB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 81 - Forks: 11

streamnative/awesome-pulsar

A curated list of Pulsar tools, integrations and resources.

Size: 11.7 KB - Last synced at: about 2 months ago - Pushed at: over 4 years ago - Stars: 81 - Forks: 9

groda/big_data

Big Data essentials: Hadoop, MapReduce, Spark. Explore tutorials and demos in Jupyter notebooks—most are self-contained and live, ready to run with a click.

Language: Jupyter Notebook - Size: 51.9 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 78 - Forks: 27

IBM/kafka-streaming-click-analysis 📦

Use Kafka and Apache Spark streaming to perform click stream analytics

Language: Jupyter Notebook - Size: 583 KB - Last synced at: 14 days ago - Pushed at: over 5 years ago - Stars: 76 - Forks: 57

NashTech-Labs/Lambda-Arch-Spark

Language: Scala - Size: 23.4 KB - Last synced at: 11 days ago - Pushed at: almost 5 years ago - Stars: 75 - Forks: 37

swoop-inc/spark-records

Bulletproof Apache Spark jobs with fast root cause analysis of failures.

Language: Scala - Size: 604 KB - Last synced at: 2 months ago - Pushed at: over 4 years ago - Stars: 72 - Forks: 16