Topic: "apache-spark"
mlflow/mlflow
Open source platform for the machine learning lifecycle
Language: Python - Size: 796 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 20,908 - Forks: 4,609

microsoft/SynapseML
Simple and Distributed Machine Learning
Language: Scala - Size: 157 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 5,139 - Forks: 845

treeverse/lakeFS
lakeFS - Data version control for your data lake | Git for data
Language: Go - Size: 150 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 4,723 - Forks: 380

lw-lin/CoolplaySpark
Coolplay Spark: Spark source code analysis, Spark libraries, and more
Language: Scala - Size: 9.54 MB - Last synced at: 25 days ago - Pushed at: about 3 years ago - Stars: 3,488 - Forks: 1,410

spark-notebook/spark-notebook
Interactive and Reactive Data Science using Scala and Spark.
Language: JavaScript - Size: 15.8 MB - Last synced at: 25 days ago - Pushed at: about 2 years ago - Stars: 3,152 - Forks: 653

kubeflow/spark-operator
Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Language: Go - Size: 25.5 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 2,952 - Forks: 1,411

intel/BigDL
BigDL: Distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray
Language: Jupyter Notebook - Size: 356 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 2,674 - Forks: 731

dotnet/spark
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Language: C# - Size: 4.87 MB - Last synced at: 21 days ago - Pushed at: about 1 month ago - Stars: 2,058 - Forks: 327

big-data-europe/docker-spark
Apache Spark docker image
Language: Shell - Size: 7.78 MB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 2,055 - Forks: 702

feathr-ai/feathr
Feathr – A scalable, unified data and AI engineering platform for enterprise
Language: Scala - Size: 29.4 MB - Last synced at: 8 days ago - Pushed at: about 1 year ago - Stars: 1,898 - Forks: 232

awesome-spark/awesome-spark
A curated list of awesome Apache Spark packages and resources.
Language: Shell - Size: 231 KB - Last synced at: 2 days ago - Pushed at: 8 months ago - Stars: 1,802 - Forks: 340

OryxProject/oryx 📦
Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large scale machine learning
Language: Java - Size: 7.12 MB - Last synced at: 4 months ago - Pushed at: almost 4 years ago - Stars: 1,781 - Forks: 404

japila-books/apache-spark-internals
The Internals of Apache Spark
Size: 145 MB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 1,500 - Forks: 459

ptyadana/SQL-Data-Analysis-and-Visualization-Projects
SQL data analysis & visualization projects using MySQL, PostgreSQL, SQLite, Tableau, Apache Spark and pySpark.
Language: Jupyter Notebook - Size: 35.1 MB - Last synced at: about 1 month ago - Pushed at: almost 3 years ago - Stars: 1,436 - Forks: 553

san089/goodreads_etl_pipeline
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Language: Python - Size: 1.31 MB - Last synced at: about 1 month ago - Pushed at: over 5 years ago - Stars: 1,378 - Forks: 227

databricks/LearningSparkV2
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
Language: Scala - Size: 75.2 MB - Last synced at: about 19 hours ago - Pushed at: 5 months ago - Stars: 1,306 - Forks: 775

lensacom/sparkit-learn
PySpark + Scikit-learn = Sparkit-learn
Language: Python - Size: 444 KB - Last synced at: 25 days ago - Pushed at: over 4 years ago - Stars: 1,154 - Forks: 256

databricks/spark-sklearn 📦
(Deprecated) Scikit-learn integration package for Apache Spark
Language: Python - Size: 782 KB - Last synced at: about 19 hours ago - Pushed at: over 5 years ago - Stars: 1,079 - Forks: 228

mahmoudparsian/data-algorithms-book
MapReduce, Spark, Java, and Scala for Data Algorithms Book
Language: Java - Size: 397 MB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 1,075 - Forks: 661

graphframes/graphframes
GraphFrames is a package for Apache Spark which provides DataFrame-based Graphs
Language: Scala - Size: 3.85 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1,059 - Forks: 250

sparklyr/sparklyr
R interface for Apache Spark
Language: R - Size: 97 MB - Last synced at: 8 days ago - Pushed at: 3 months ago - Stars: 962 - Forks: 308

microsoft/Mobius
C# and F# language binding and extensions to Apache Spark
Language: C# - Size: 6.44 MB - Last synced at: 3 days ago - Pushed at: over 1 year ago - Stars: 940 - Forks: 211

LucaCanali/sparkMeasure
This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spark jobs. It focuses on easing the collection and examination of Spark metrics, making it a practical choice for both developers and data engineers.
Language: Scala - Size: 1.96 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 758 - Forks: 151

lw-lin/streaming-readings
Papers and reading material on streaming systems
Size: 6.84 KB - Last synced at: 30 days ago - Pushed at: over 3 years ago - Stars: 733 - Forks: 154

aloneguid/parquet-dotnet
Fully managed Apache Parquet implementation
Language: C# - Size: 121 MB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 732 - Forks: 160

miguno/kafka-storm-starter 📦
[PROJECT IS NO LONGER MAINTAINED] Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.
Language: Scala - Size: 393 KB - Last synced at: 3 days ago - Pushed at: over 3 years ago - Stars: 724 - Forks: 329

mrpowers-io/quinn
pyspark methods to enhance developer productivity 📣 👯 🎉
Language: Python - Size: 1.98 MB - Last synced at: 5 days ago - Pushed at: 4 months ago - Stars: 672 - Forks: 98

nchammas/flintrock
A command-line tool for launching Apache Spark clusters.
Language: Python - Size: 785 KB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 642 - Forks: 117

cerndb/dist-keras 📦
Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
Language: Python - Size: 54.6 MB - Last synced at: 2 days ago - Pushed at: almost 7 years ago - Stars: 623 - Forks: 167

apache-spark-on-k8s/spark (fork of apache/spark) 📦
Apache Spark enhanced with native Kubernetes scheduler back-end: NOTE this repository is being ARCHIVED as all new development for the kubernetes scheduler back-end is now on https://github.com/apache/spark/
Language: Scala - Size: 260 MB - Last synced at: 5 months ago - Pushed at: over 5 years ago - Stars: 612 - Forks: 118

openscoring/openscoring
REST web service for the true real-time scoring (<1 ms) of Scikit-Learn, R and Apache Spark models
Language: Java - Size: 869 KB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 583 - Forks: 171

infoslack/awesome-kafka
A list about Apache Kafka
Size: 96.7 KB - Last synced at: 2 days ago - Pushed at: 3 months ago - Stars: 579 - Forks: 164

japila-books/spark-sql-internals
The Internals of Spark SQL
Size: 1.46 GB - Last synced at: 24 days ago - Pushed at: 5 months ago - Stars: 466 - Forks: 132

rjurney/Agile_Data_Code_2
Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Language: Jupyter Notebook - Size: 23.2 MB - Last synced at: 2 days ago - Pushed at: about 1 year ago - Stars: 460 - Forks: 310

cartershanklin/pyspark-cheatsheet
PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster
Language: Python - Size: 16.8 MB - Last synced at: 3 months ago - Pushed at: 8 months ago - Stars: 451 - Forks: 220

LucaCanali/Miscellaneous
Includes notes on using Apache Spark, with a drill-down on Spark for Physics, how to run TPCDS on PySpark, and how to create histograms with Spark. Also includes tools for stress testing and measuring CPU performance, and Jupyter notebook examples for using various DB systems.
Language: Jupyter Notebook - Size: 34.3 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 449 - Forks: 152

tweag/sparkle
Haskell on Apache Spark.
Language: Haskell - Size: 1.1 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 448 - Forks: 27

japila-books/spark-structured-streaming-internals
The Internals of Spark Structured Streaming
Size: 119 MB - Last synced at: 29 days ago - Pushed at: over 2 years ago - Stars: 420 - Forks: 172

1duo/awesome-ai-infrastructures
Infrastructures™ for Machine Learning Training/Inference in Production.
Size: 11.8 MB - Last synced at: about 1 month ago - Pushed at: about 6 years ago - Stars: 416 - Forks: 74

ekampf/PySpark-Boilerplate
A boilerplate for writing PySpark Jobs
Language: Python - Size: 10.7 KB - Last synced at: 29 days ago - Pushed at: over 1 year ago - Stars: 396 - Forks: 155

awesome-spark/spark-gotchas 📦
Spark Gotchas. A subjective compilation of the Apache Spark tips and tricks
Size: 188 KB - Last synced at: about 1 month ago - Pushed at: about 8 years ago - Stars: 363 - Forks: 80

tirthajyoti/Spark-with-Python
Fundamentals of Spark with Python (using PySpark), code examples
Language: Jupyter Notebook - Size: 8.97 MB - Last synced at: 29 days ago - Pushed at: over 2 years ago - Stars: 347 - Forks: 271

datamechanics/delight 📦
A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source.
Language: Scala - Size: 2.31 MB - Last synced at: 5 months ago - Pushed at: about 1 year ago - Stars: 345 - Forks: 53

opencypher/morpheus
Morpheus brings the leading graph query language, Cypher, onto the leading distributed processing platform, Spark.
Language: Scala - Size: 29.6 MB - Last synced at: 11 days ago - Pushed at: almost 5 years ago - Stars: 341 - Forks: 62

dmmiller612/sparktorch
Train and run PyTorch models on Apache Spark.
Language: Python - Size: 8.83 MB - Last synced at: 29 days ago - Pushed at: about 2 years ago - Stars: 339 - Forks: 46

miguno/wirbelsturm 📦
[PROJECT IS NO LONGER MAINTAINED] Wirbelsturm is a Vagrant and Puppet based tool to perform 1-click local and remote deployments, with a focus on big data tech like Kafka.
Language: Shell - Size: 309 KB - Last synced at: 3 days ago - Pushed at: over 3 years ago - Stars: 328 - Forks: 72

Hydrospheredata/mist
Serverless proxy for Spark cluster
Language: Scala - Size: 9.96 MB - Last synced at: 29 days ago - Pushed at: over 4 years ago - Stars: 326 - Forks: 72

josephmachado/efficient_data_processing_spark
Code for "Efficient Data Processing in Spark" Course
Language: Python - Size: 23.9 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 304 - Forks: 63

microsoft/data-accelerator
Data Accelerator for Apache Spark simplifies onboarding to Streaming of Big Data. It offers a rich, easy-to-use experience to help with creation, editing and management of Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.
Language: C# - Size: 401 MB - Last synced at: 4 days ago - Pushed at: 3 months ago - Stars: 302 - Forks: 90

MingChen0919/learning-apache-spark
Notes on Apache Spark (pyspark)
Language: HTML - Size: 20.1 MB - Last synced at: 28 days ago - Pushed at: over 6 years ago - Stars: 299 - Forks: 186

lifeomic/sparkflow
Easy-to-use library to bring TensorFlow to Apache Spark
Language: Python - Size: 8.79 MB - Last synced at: 29 days ago - Pushed at: over 1 year ago - Stars: 296 - Forks: 45

cuebook/cuelake
Use SQL to build ELT pipelines on a data lakehouse.
Language: JavaScript - Size: 28 MB - Last synced at: about 1 month ago - Pushed at: about 3 years ago - Stars: 288 - Forks: 28

svenkreiss/pysparkling
A pure Python implementation of Apache Spark's RDD and DStream interfaces.
Language: Python - Size: 3.45 MB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 269 - Forks: 44

hortonworks-spark/spark-atlas-connector
A Spark Atlas connector to track data lineage in Apache Atlas
Language: Scala - Size: 903 KB - Last synced at: 28 days ago - Pushed at: over 2 years ago - Stars: 266 - Forks: 150

jaceklaskowski/spark-workshop
Apache Spark™ and Scala Workshops
Language: HTML - Size: 57 MB - Last synced at: 29 days ago - Pushed at: 11 months ago - Stars: 264 - Forks: 148

PiercingDan/spark-Jupyter-AWS
A guide on how to set up Jupyter with PySpark painlessly on AWS EC2 clusters, with S3 I/O support
Language: Jupyter Notebook - Size: 220 KB - Last synced at: about 1 month ago - Pushed at: over 7 years ago - Stars: 261 - Forks: 18

dataflint/spark
A modern replacement Apache Spark UI
Language: TypeScript - Size: 17.8 MB - Last synced at: 19 days ago - Pushed at: 20 days ago - Stars: 254 - Forks: 30

airscholar/e2e-data-engineering
An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All components are containerized with Docker for easy deployment and scalability.
Language: Python - Size: 289 KB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 250 - Forks: 123

Mellanox/SparkRDMA 📦
This is an archive of the SparkRDMA project. The new repository with RDMA shuffle acceleration for Apache Spark is here: https://github.com/Nvidia/sparkucx
Language: Java - Size: 259 KB - Last synced at: 5 months ago - Pushed at: about 6 years ago - Stars: 242 - Forks: 71

Azure/azure-event-hubs-spark
Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Language: Scala - Size: 19.6 MB - Last synced at: 7 days ago - Pushed at: 4 months ago - Stars: 236 - Forks: 178

Chabane/bigdata-playground
A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL
Language: TypeScript - Size: 3.08 MB - Last synced at: about 2 months ago - Pushed at: over 6 years ago - Stars: 209 - Forks: 74

Azure/azure-cosmosdb-spark 📦
Apache Spark Connector for Azure Cosmos DB
Size: 192 MB - Last synced at: 7 days ago - Pushed at: 3 months ago - Stars: 203 - Forks: 120

lynnlangit/learning-hadoop-and-spark
Companion to Learning Hadoop and Learning Spark courses on LinkedIn Learning
Language: HTML - Size: 13.6 MB - Last synced at: 4 days ago - Pushed at: 6 months ago - Stars: 195 - Forks: 165

databrickslabs/automl-toolkit 📦
Toolkit for Apache Spark ML for feature clean-up, feature importance calculation suite, information gain selection, distributed SMOTE, model selection and training, hyperparameter optimization and selection, and model interpretability.
Language: HTML - Size: 158 MB - Last synced at: 3 months ago - Pushed at: about 4 years ago - Stars: 191 - Forks: 44

IBM/spark-tpc-ds-performance-test 📦
Use the TPC-DS benchmark to test Spark SQL performance
Language: TSQL - Size: 354 MB - Last synced at: 14 days ago - Pushed at: about 5 years ago - Stars: 179 - Forks: 99

whylabs/whylogs-java 📦
Profile and monitor your ML data pipeline end-to-end
Language: Java - Size: 5.95 MB - Last synced at: 5 months ago - Pushed at: over 3 years ago - Stars: 178 - Forks: 7

vinta/albedo
A recommender system for discovering GitHub repos, built with Apache Spark
Language: Scala - Size: 442 KB - Last synced at: 3 months ago - Pushed at: about 5 years ago - Stars: 177 - Forks: 36

lamastex/scalable-data-science
Scalable Data Science: course sets in big data using Apache Spark over Databricks, and their mathematical, statistical and computational foundations using SageMath.
Language: HTML - Size: 1.24 GB - Last synced at: 23 days ago - Pushed at: 5 months ago - Stars: 168 - Forks: 93

mahmoudparsian/big-data-mapreduce-course
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
Language: HTML - Size: 601 MB - Last synced at: 23 days ago - Pushed at: 7 months ago - Stars: 158 - Forks: 143

radanalyticsio/spark-operator
Operator for managing the Spark clusters on Kubernetes and OpenShift.
Language: Java - Size: 3.39 MB - Last synced at: about 2 months ago - Pushed at: over 3 years ago - Stars: 157 - Forks: 60

BitwiseInc/Hydrograph
A visual ETL development and debugging tool for big data
Language: Java - Size: 33.5 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 153 - Forks: 110

qubole/spark-on-lambda
Apache Spark on AWS Lambda
Language: Scala - Size: 111 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 151 - Forks: 33

SANSA-Stack/SANSA-Stack
Big Data RDF Processing and Analytics Stack built on Apache Spark and Apache Jena http://sansa-stack.github.io/SANSA-Stack/
Language: Scala - Size: 65.2 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 145 - Forks: 30

archivesunleashed/aut
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Language: Scala - Size: 39.5 MB - Last synced at: 11 days ago - Pushed at: over 1 year ago - Stars: 144 - Forks: 33

gtkcyber/griffon-vm
Griffon Data Science Virtual Machine
Size: 896 KB - Last synced at: 2 months ago - Pushed at: almost 3 years ago - Stars: 132 - Forks: 26

GoogleCloudPlatform/dataproc-templates
Dataproc templates and pipelines for solving in-cloud data tasks
Language: Python - Size: 18.6 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 129 - Forks: 100

MemVerge/splash
Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange
Language: Scala - Size: 666 KB - Last synced at: 29 days ago - Pushed at: 6 months ago - Stars: 127 - Forks: 29

jleetutorial/scala-spark-tutorial
Project for James' Apache Spark with Scala course
Language: Scala - Size: 1.14 MB - Last synced at: about 1 month ago - Pushed at: almost 5 years ago - Stars: 127 - Forks: 252

LearningJournal/Spark-Streaming-In-Python
Apache Spark 3 - Structured Streaming Course Material
Language: Python - Size: 19.4 MB - Last synced at: 2 months ago - Pushed at: almost 2 years ago - Stars: 121 - Forks: 159

zero323/pyspark-stubs 📦
Apache (Py)Spark type annotations (stub files).
Language: Python - Size: 1.3 MB - Last synced at: 24 days ago - Pushed at: almost 3 years ago - Stars: 117 - Forks: 37

streamnative/pulsar-spark
Spark Connector to read and write with Pulsar
Language: Scala - Size: 726 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 113 - Forks: 51

vivek-bombatkar/Spark-with-Python---My-learning-notes-
ETL pipeline using pyspark (Spark - Python)
Language: CSS - Size: 10.7 MB - Last synced at: 2 months ago - Pushed at: about 5 years ago - Stars: 113 - Forks: 82

G-Research/fasttrackml
Experiment tracking server focused on speed and scalability
Language: Go - Size: 5.4 MB - Last synced at: 14 days ago - Pushed at: 5 months ago - Stars: 105 - Forks: 20

jgperrin/net.jgp.books.spark.ch01
Spark in Action, 2nd edition - chapter 1 - Introduction
Language: Scala - Size: 6.91 MB - Last synced at: 3 months ago - Pushed at: about 2 years ago - Stars: 103 - Forks: 71

exacaster/lighter
REST API for Apache Spark on K8S or YARN
Language: Java - Size: 6.68 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 98 - Forks: 23

dimajix/flowman
Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pipelines.
Language: Scala - Size: 18.6 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 94 - Forks: 19

adrianulbona/osm-parquetizer
A converter for the OSM PBFs to Parquet files
Language: Java - Size: 75.2 KB - Last synced at: 2 months ago - Pushed at: almost 2 years ago - Stars: 94 - Forks: 32

chermenin/spark-states
Custom state store providers for Apache Spark
Language: Scala - Size: 267 KB - Last synced at: 28 days ago - Pushed at: 4 months ago - Stars: 92 - Forks: 25

igor-suhorukov/openstreetmap_h3
High-performance data loader for OSM planet dumps. Transforms an OpenStreetMap World/Region PBF dump into a PostGIS pgsnapshot (lossless) OSM schema representation partitioned by H3 regions, and/or into ArrowIPC/Parquet dumps
Language: Java - Size: 6.06 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 92 - Forks: 8

itsjafer/jupyterlab-sparkmonitor
JupyterLab extension that enables monitoring launched Apache Spark jobs from within a notebook
Language: JavaScript - Size: 4.08 MB - Last synced at: 8 days ago - Pushed at: over 2 years ago - Stars: 92 - Forks: 23

LearningJournal/SparkProgrammingInScala
Apache Spark Course Material
Language: Scala - Size: 50.9 MB - Last synced at: 3 months ago - Pushed at: about 2 years ago - Stars: 88 - Forks: 159

kwartile/connected-component
Map Reduce Implementation of Connected Component on Apache Spark
Language: Scala - Size: 26.4 KB - Last synced at: 8 days ago - Pushed at: almost 4 years ago - Stars: 85 - Forks: 18

kakao/cuesheet 📦
A framework for writing Spark 2.x applications in a pretty way
Language: Scala - Size: 147 KB - Last synced at: 5 months ago - Pushed at: almost 2 years ago - Stars: 84 - Forks: 26

awesome-spark/learn-by-examples 📦
Real-world Spark pipelines examples
Language: Scala - Size: 1.1 MB - Last synced at: 11 days ago - Pushed at: over 7 years ago - Stars: 83 - Forks: 30

seznam/euphoria
Euphoria is an open source Java API for creating unified big-data processing flows. It provides an engine independent programming model which can express both batch and stream transformations.
Language: Java - Size: 3.9 MB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 81 - Forks: 11

streamnative/awesome-pulsar
A curated list of Pulsar tools, integrations and resources.
Size: 11.7 KB - Last synced at: about 2 months ago - Pushed at: over 4 years ago - Stars: 81 - Forks: 9

groda/big_data
Big Data essentials: Hadoop, MapReduce, Spark. Explore tutorials and demos in Jupyter notebooks—most are self-contained and live, ready to run with a click.
Language: Jupyter Notebook - Size: 51.9 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 78 - Forks: 27

IBM/kafka-streaming-click-analysis 📦
Use Kafka and Apache Spark streaming to perform click stream analytics
Language: Jupyter Notebook - Size: 583 KB - Last synced at: 14 days ago - Pushed at: over 5 years ago - Stars: 76 - Forks: 57

NashTech-Labs/Lambda-Arch-Spark
Language: Scala - Size: 23.4 KB - Last synced at: 11 days ago - Pushed at: almost 5 years ago - Stars: 75 - Forks: 37

swoop-inc/spark-records
Bulletproof Apache Spark jobs with fast root cause analysis of failures.
Language: Scala - Size: 604 KB - Last synced at: 2 months ago - Pushed at: over 4 years ago - Stars: 72 - Forks: 16
