Topic: "apache-spark"
mlflow/mlflow
The open source developer platform to build AI/LLM applications and models with confidence. Enhance your AI applications with end-to-end tracking, observability, and evaluations, all in one integrated platform.
Language: Python - Size: 870 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 22,707 - Forks: 4,933
microsoft/SynapseML
Simple and Distributed Machine Learning
Language: Scala - Size: 166 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 5,170 - Forks: 851
treeverse/lakeFS
lakeFS - Data version control for your data lake | Git for data
Language: Go - Size: 160 MB - Last synced at: about 5 hours ago - Pushed at: about 4 hours ago - Stars: 4,954 - Forks: 402
lw-lin/CoolplaySpark
Coolplay Spark (酷玩 Spark): Spark source code analysis, Spark libraries, and more
Language: Scala - Size: 9.54 MB - Last synced at: 5 months ago - Pushed at: over 3 years ago - Stars: 3,488 - Forks: 1,410
spark-notebook/spark-notebook
Interactive and Reactive Data Science using Scala and Spark.
Language: JavaScript - Size: 15.8 MB - Last synced at: 2 days ago - Pushed at: over 2 years ago - Stars: 3,153 - Forks: 653
kubeflow/spark-operator
Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Language: Go - Size: 25.8 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 3,051 - Forks: 1,434
intel/BigDL
BigDL: Distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray
Language: Jupyter Notebook - Size: 356 MB - Last synced at: 7 days ago - Pushed at: 19 days ago - Stars: 2,687 - Forks: 732
dotnet/spark
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Language: C# - Size: 4.88 MB - Last synced at: 16 days ago - Pushed at: about 1 month ago - Stars: 2,080 - Forks: 328
big-data-europe/docker-spark
Apache Spark docker image
Language: Shell - Size: 7.78 MB - Last synced at: 5 months ago - Pushed at: over 2 years ago - Stars: 2,055 - Forks: 702
feathr-ai/feathr
Feathr – A scalable, unified data and AI engineering platform for enterprise
Language: Scala - Size: 29.4 MB - Last synced at: 16 days ago - Pushed at: over 1 year ago - Stars: 1,908 - Forks: 235
awesome-spark/awesome-spark
A curated list of awesome Apache Spark packages and resources.
Language: Shell - Size: 231 KB - Last synced at: 17 days ago - Pushed at: about 1 year ago - Stars: 1,833 - Forks: 340
OryxProject/oryx 📦
Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large scale machine learning
Language: Java - Size: 7.12 MB - Last synced at: about 1 month ago - Pushed at: about 4 years ago - Stars: 1,786 - Forks: 404
japila-books/apache-spark-internals
The Internals of Apache Spark
Size: 147 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1,505 - Forks: 458
ptyadana/SQL-Data-Analysis-and-Visualization-Projects
SQL data analysis & visualization projects using MySQL, PostgreSQL, SQLite, Tableau, Apache Spark and pySpark.
Language: Jupyter Notebook - Size: 35.1 MB - Last synced at: 5 months ago - Pushed at: over 3 years ago - Stars: 1,436 - Forks: 553
san089/goodreads_etl_pipeline
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Language: Python - Size: 1.31 MB - Last synced at: 5 months ago - Pushed at: over 5 years ago - Stars: 1,378 - Forks: 227
databricks/LearningSparkV2
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
Language: Scala - Size: 75.2 MB - Last synced at: 12 days ago - Pushed at: 9 months ago - Stars: 1,349 - Forks: 783
lensacom/sparkit-learn
PySpark + Scikit-learn = Sparkit-learn
Language: Python - Size: 444 KB - Last synced at: 6 days ago - Pushed at: almost 5 years ago - Stars: 1,154 - Forks: 256
graphframes/graphframes
GraphFrames is a package for Apache Spark which provides DataFrame-based Graphs
Language: Scala - Size: 5.4 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1,086 - Forks: 254
databricks/spark-sklearn 📦
(Deprecated) Scikit-learn integration package for Apache Spark
Language: Python - Size: 782 KB - Last synced at: 12 days ago - Pushed at: almost 6 years ago - Stars: 1,077 - Forks: 228
mahmoudparsian/data-algorithms-book
MapReduce, Spark, Java, and Scala for Data Algorithms Book
Language: Java - Size: 397 MB - Last synced at: 5 months ago - Pushed at: about 1 year ago - Stars: 1,075 - Forks: 661
sparklyr/sparklyr
R interface for Apache Spark
Language: R - Size: 99 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 966 - Forks: 309
microsoft/Mobius
C# and F# language binding and extensions to Apache Spark
Language: C# - Size: 6.44 MB - Last synced at: 7 days ago - Pushed at: almost 2 years ago - Stars: 940 - Forks: 208
aloneguid/parquet-dotnet
Fully managed Apache Parquet implementation
Language: C# - Size: 122 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 796 - Forks: 172
LucaCanali/sparkMeasure
This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spark jobs. It focuses on easing the collection and examination of Spark metrics, making it a practical choice for both developers and data engineers.
Language: Scala - Size: 1.92 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 788 - Forks: 157
lw-lin/streaming-readings
Papers and reading material on streaming systems
Size: 6.84 KB - Last synced at: 5 months ago - Pushed at: over 3 years ago - Stars: 733 - Forks: 154
miguno/kafka-storm-starter 📦
[PROJECT IS NO LONGER MAINTAINED] Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.
Language: Scala - Size: 393 KB - Last synced at: 23 days ago - Pushed at: over 3 years ago - Stars: 724 - Forks: 329
mrpowers-io/quinn
PySpark methods to enhance developer productivity 📣 👯 🎉
Language: Python - Size: 1.98 MB - Last synced at: 14 days ago - Pushed at: 8 months ago - Stars: 674 - Forks: 99
nchammas/flintrock
A command-line tool for launching Apache Spark clusters.
Language: Python - Size: 785 KB - Last synced at: 19 days ago - Pushed at: 11 months ago - Stars: 648 - Forks: 117
cerndb/dist-keras 📦
Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
Language: Python - Size: 54.6 MB - Last synced at: about 1 month ago - Pushed at: over 7 years ago - Stars: 623 - Forks: 167
apache-spark-on-k8s/spark Fork of apache/spark 📦
Apache Spark enhanced with native Kubernetes scheduler back-end: NOTE this repository is being ARCHIVED as all new development for the kubernetes scheduler back-end is now on https://github.com/apache/spark/
Language: Scala - Size: 260 MB - Last synced at: about 1 month ago - Pushed at: almost 6 years ago - Stars: 613 - Forks: 117
openscoring/openscoring
REST web service for true real-time scoring (<1 ms) of Scikit-Learn, R and Apache Spark models
Language: Java - Size: 869 KB - Last synced at: 4 months ago - Pushed at: about 1 year ago - Stars: 583 - Forks: 172
infoslack/awesome-kafka
A list about Apache Kafka
Size: 96.7 KB - Last synced at: 11 days ago - Pushed at: 8 months ago - Stars: 582 - Forks: 166
japila-books/spark-sql-internals
The Internals of Spark SQL
Size: 1.57 GB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 477 - Forks: 136
rjurney/Agile_Data_Code_2
Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Language: Jupyter Notebook - Size: 23.2 MB - Last synced at: 20 days ago - Pushed at: over 1 year ago - Stars: 461 - Forks: 310
LucaCanali/Miscellaneous
Includes notes on using Apache Spark, with drill down on Spark for Physics, how to run TPCDS on PySpark, how to create histograms with Spark. Also tools for stress testing, measuring CPUs' performance, and I/O latency heat maps. Jupyter notebooks examples for using various DB systems.
Language: Jupyter Notebook - Size: 35.3 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 455 - Forks: 154
cartershanklin/pyspark-cheatsheet
PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster
Language: Python - Size: 16.8 MB - Last synced at: 7 months ago - Pushed at: about 1 year ago - Stars: 451 - Forks: 220
tweag/sparkle
Haskell on Apache Spark.
Language: Haskell - Size: 1.11 MB - Last synced at: 27 days ago - Pushed at: 3 months ago - Stars: 448 - Forks: 27
1duo/awesome-ai-infrastructures
Infrastructures™ for Machine Learning Training/Inference in Production.
Size: 11.8 MB - Last synced at: 17 days ago - Pushed at: over 6 years ago - Stars: 427 - Forks: 75
japila-books/spark-structured-streaming-internals
The Internals of Spark Structured Streaming
Size: 119 MB - Last synced at: 5 months ago - Pushed at: almost 3 years ago - Stars: 420 - Forks: 172
ekampf/PySpark-Boilerplate
A boilerplate for writing PySpark Jobs
Language: Python - Size: 10.7 KB - Last synced at: 5 months ago - Pushed at: almost 2 years ago - Stars: 396 - Forks: 155
awesome-spark/spark-gotchas 📦
Spark Gotchas: a subjective compilation of Apache Spark tips and tricks
Size: 188 KB - Last synced at: 7 days ago - Pushed at: over 8 years ago - Stars: 364 - Forks: 79
tirthajyoti/Spark-with-Python
Fundamentals of Spark with Python (using PySpark), code examples
Language: Jupyter Notebook - Size: 8.97 MB - Last synced at: 5 months ago - Pushed at: about 3 years ago - Stars: 347 - Forks: 271
datamechanics/delight 📦
A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source.
Language: Scala - Size: 2.31 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 346 - Forks: 55
opencypher/morpheus
Morpheus brings the leading graph query language, Cypher, onto the leading distributed processing platform, Spark.
Language: Scala - Size: 29.6 MB - Last synced at: 26 days ago - Pushed at: about 5 years ago - Stars: 341 - Forks: 62
dmmiller612/sparktorch
Train and run Pytorch models on Apache Spark.
Language: Python - Size: 8.83 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 340 - Forks: 46
miguno/wirbelsturm 📦
[PROJECT IS NO LONGER MAINTAINED] Wirbelsturm is a Vagrant and Puppet based tool to perform 1-click local and remote deployments, with a focus on big data tech like Kafka.
Language: Shell - Size: 309 KB - Last synced at: 23 days ago - Pushed at: over 3 years ago - Stars: 329 - Forks: 72
Hydrospheredata/mist
Serverless proxy for Spark cluster
Language: Scala - Size: 9.96 MB - Last synced at: 5 months ago - Pushed at: about 5 years ago - Stars: 326 - Forks: 72
dataflint/spark
Drop-in replacement for Apache Spark UI
Language: TypeScript - Size: 19 MB - Last synced at: 28 days ago - Pushed at: 28 days ago - Stars: 318 - Forks: 41
microsoft/data-accelerator
Data Accelerator for Apache Spark simplifies onboarding to streaming of big data. It offers a rich, easy-to-use experience to help with the creation, editing and management of Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.
Language: C# - Size: 401 MB - Last synced at: 8 days ago - Pushed at: 7 months ago - Stars: 306 - Forks: 91
josephmachado/efficient_data_processing_spark
Code for "Efficient Data Processing in Spark" Course
Language: Python - Size: 23.9 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 304 - Forks: 63
MingChen0919/learning-apache-spark
Notes on Apache Spark (pyspark)
Language: HTML - Size: 20.1 MB - Last synced at: 5 months ago - Pushed at: over 6 years ago - Stars: 299 - Forks: 186
lifeomic/sparkflow
Easy to use library to bring Tensorflow on Apache Spark
Language: Python - Size: 8.79 MB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 296 - Forks: 45
cuebook/cuelake
Use SQL to build ELT pipelines on a data lakehouse.
Language: JavaScript - Size: 28 MB - Last synced at: 6 months ago - Pushed at: over 3 years ago - Stars: 288 - Forks: 28
svenkreiss/pysparkling
A pure Python implementation of Apache Spark's RDD and DStream interfaces.
Language: Python - Size: 3.45 MB - Last synced at: 6 days ago - Pushed at: about 1 year ago - Stars: 271 - Forks: 45
hortonworks-spark/spark-atlas-connector
A Spark Atlas connector to track data lineage in Apache Atlas
Language: Scala - Size: 903 KB - Last synced at: 6 days ago - Pushed at: almost 3 years ago - Stars: 266 - Forks: 149
jaceklaskowski/spark-workshop
Apache Spark™ and Scala Workshops
Language: HTML - Size: 57 MB - Last synced at: 5 months ago - Pushed at: over 1 year ago - Stars: 264 - Forks: 148
PiercingDan/spark-Jupyter-AWS
A guide on how to set up Jupyter with Pyspark painlessly on AWS EC2 clusters, with S3 I/O support
Language: Jupyter Notebook - Size: 220 KB - Last synced at: 6 months ago - Pushed at: almost 8 years ago - Stars: 261 - Forks: 18
Mellanox/SparkRDMA 📦
This is an archive of the SparkRDMA project. The new repository with RDMA shuffle acceleration for Apache Spark is here: https://github.com/Nvidia/sparkucx
Language: Java - Size: 259 KB - Last synced at: about 1 month ago - Pushed at: over 6 years ago - Stars: 253 - Forks: 73
airscholar/e2e-data-engineering
An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All components are containerized with Docker for easy deployment and scalability.
Language: Python - Size: 289 KB - Last synced at: 5 months ago - Pushed at: 9 months ago - Stars: 250 - Forks: 123
Azure/azure-event-hubs-spark
Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Language: Scala - Size: 19.6 MB - Last synced at: 6 days ago - Pushed at: 9 months ago - Stars: 238 - Forks: 179
Chabane/bigdata-playground
A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL
Language: TypeScript - Size: 3.08 MB - Last synced at: 6 months ago - Pushed at: almost 7 years ago - Stars: 209 - Forks: 74
Azure/azure-cosmosdb-spark 📦
Apache Spark Connector for Azure Cosmos DB
Size: 192 MB - Last synced at: 6 days ago - Pushed at: 8 months ago - Stars: 203 - Forks: 120
lynnlangit/learning-hadoop-and-spark
Companion to the Learning Hadoop and Learning Spark courses on LinkedIn Learning
Language: HTML - Size: 13.6 MB - Last synced at: 28 days ago - Pushed at: 11 months ago - Stars: 202 - Forks: 166
databrickslabs/automl-toolkit 📦
Toolkit for Apache Spark ML for feature clean-up, a feature importance calculation suite, information gain selection, distributed SMOTE, model selection and training, hyperparameter optimization and selection, and model interpretability.
Language: HTML - Size: 158 MB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 193 - Forks: 44
vinta/albedo
A recommender system for discovering GitHub repos, built with Apache Spark
Language: Scala - Size: 448 KB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 181 - Forks: 36
IBM/spark-tpc-ds-performance-test 📦
Use the TPC-DS benchmark to test Spark SQL performance
Language: TSQL - Size: 354 MB - Last synced at: about 1 month ago - Pushed at: over 5 years ago - Stars: 181 - Forks: 98
whylabs/whylogs-java 📦
Profile and monitor your ML data pipeline end-to-end
Language: Java - Size: 5.95 MB - Last synced at: about 1 month ago - Pushed at: about 4 years ago - Stars: 178 - Forks: 7
lamastex/scalable-data-science
Scalable Data Science: course sets in big data using Apache Spark over Databricks, and their mathematical, statistical and computational foundations using SageMath.
Language: HTML - Size: 1.24 GB - Last synced at: 5 months ago - Pushed at: 9 months ago - Stars: 168 - Forks: 93
mahmoudparsian/big-data-mapreduce-course
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
Language: HTML - Size: 614 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 161 - Forks: 143
radanalyticsio/spark-operator
Operator for managing the Spark clusters on Kubernetes and OpenShift.
Language: Java - Size: 3.39 MB - Last synced at: about 1 month ago - Pushed at: almost 4 years ago - Stars: 158 - Forks: 62
BitwiseInc/Hydrograph
A visual ETL development and debugging tool for big data
Language: Java - Size: 33.5 MB - Last synced at: 9 days ago - Pushed at: almost 3 years ago - Stars: 154 - Forks: 108
qubole/spark-on-lambda
Apache Spark on AWS Lambda
Language: Scala - Size: 111 MB - Last synced at: 6 months ago - Pushed at: almost 3 years ago - Stars: 151 - Forks: 33
archivesunleashed/aut
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Language: Scala - Size: 39.5 MB - Last synced at: about 24 hours ago - Pushed at: over 1 year ago - Stars: 147 - Forks: 34
SANSA-Stack/SANSA-Stack
Big Data RDF Processing and Analytics Stack built on Apache Spark and Apache Jena http://sansa-stack.github.io/SANSA-Stack/
Language: Scala - Size: 65.2 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 146 - Forks: 30
GoogleCloudPlatform/dataproc-templates
Dataproc templates and pipelines for solving in-cloud data tasks
Language: Python - Size: 18.8 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 135 - Forks: 106
gtkcyber/griffon-vm
Griffon Data Science Virtual Machine
Size: 896 KB - Last synced at: 4 days ago - Pushed at: over 3 years ago - Stars: 132 - Forks: 26
MemVerge/splash
Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange
Language: Scala - Size: 666 KB - Last synced at: 2 months ago - Pushed at: 11 months ago - Stars: 128 - Forks: 29
jleetutorial/scala-spark-tutorial
Project for James' Apache Spark with Scala course
Language: Scala - Size: 1.14 MB - Last synced at: 6 months ago - Pushed at: over 5 years ago - Stars: 127 - Forks: 252
LearningJournal/Spark-Streaming-In-Python
Apache Spark 3 - Structured Streaming Course Material
Language: Python - Size: 19.4 MB - Last synced at: about 2 months ago - Pushed at: about 2 years ago - Stars: 122 - Forks: 164
zero323/pyspark-stubs 📦
Apache (Py)Spark type annotations (stub files).
Language: Python - Size: 1.3 MB - Last synced at: about 1 month ago - Pushed at: about 3 years ago - Stars: 117 - Forks: 37
streamnative/pulsar-spark
Spark Connector to read and write with Pulsar
Language: Scala - Size: 722 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 116 - Forks: 53
vivek-bombatkar/Spark-with-Python---My-learning-notes-
ETL pipeline using pyspark (Spark - Python)
Language: CSS - Size: 10.7 MB - Last synced at: 4 days ago - Pushed at: over 5 years ago - Stars: 116 - Forks: 82
G-Research/fasttrackml
Experiment tracking server focused on speed and scalability
Language: Go - Size: 5.4 MB - Last synced at: 2 months ago - Pushed at: 9 months ago - Stars: 111 - Forks: 20
jgperrin/net.jgp.books.spark.ch01
Spark in Action, 2nd edition - chapter 1 - Introduction
Language: Scala - Size: 6.91 MB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 106 - Forks: 69
exacaster/lighter
REST API for Apache Spark on K8S or YARN
Language: Java - Size: 6 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 100 - Forks: 25
dimajix/flowman
Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pipelines.
Language: Scala - Size: 18.3 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 96 - Forks: 19
igor-suhorukov/openstreetmap_h3
High-performance data loader for OSM planet dumps. Transforms an OpenStreetMap World/Region PBF dump into a PostGIS pgsnapshot (lossless) OSM schema representation partitioned by H3 regions, and/or into ArrowIPC/Parquet dumps
Language: Java - Size: 6.01 MB - Last synced at: 28 days ago - Pushed at: 3 months ago - Stars: 95 - Forks: 9
adrianulbona/osm-parquetizer
A converter for the OSM PBFs to Parquet files
Language: Java - Size: 75.2 KB - Last synced at: 4 days ago - Pushed at: about 2 years ago - Stars: 95 - Forks: 33
chermenin/spark-states
Custom state store providers for Apache Spark
Language: Scala - Size: 267 KB - Last synced at: 5 months ago - Pushed at: 9 months ago - Stars: 92 - Forks: 25
itsjafer/jupyterlab-sparkmonitor
JupyterLab extension that enables monitoring launched Apache Spark jobs from within a notebook
Language: JavaScript - Size: 4.08 MB - Last synced at: 5 days ago - Pushed at: almost 3 years ago - Stars: 92 - Forks: 23
LearningJournal/SparkProgrammingInScala
Apache Spark Course Material
Language: Scala - Size: 50.9 MB - Last synced at: 7 months ago - Pushed at: over 2 years ago - Stars: 88 - Forks: 159
groda/big_data
Big Data essentials: Hadoop, MapReduce, Spark. Explore tutorials and demos in Jupyter notebooks—most are self-contained and live, ready to run with a click.
Language: Jupyter Notebook - Size: 62.5 MB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 85 - Forks: 27
kwartile/connected-component
MapReduce implementation of connected components on Apache Spark
Language: Scala - Size: 26.4 KB - Last synced at: 3 months ago - Pushed at: about 4 years ago - Stars: 85 - Forks: 18
streamnative/awesome-pulsar
A curated list of Pulsar tools, integrations and resources.
Size: 11.7 KB - Last synced at: 4 days ago - Pushed at: almost 5 years ago - Stars: 85 - Forks: 9
kakao/cuesheet 📦
A framework for writing Spark 2.x applications in a pretty way
Language: Scala - Size: 147 KB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 83 - Forks: 26
seznam/euphoria
Euphoria is an open source Java API for creating unified big-data processing flows. It provides an engine independent programming model which can express both batch and stream transformations.
Language: Java - Size: 3.9 MB - Last synced at: 2 months ago - Pushed at: almost 3 years ago - Stars: 83 - Forks: 8
awesome-spark/learn-by-examples 📦
Real-world Spark pipelines examples
Language: Scala - Size: 1.1 MB - Last synced at: 7 days ago - Pushed at: over 7 years ago - Stars: 83 - Forks: 28
kubeflow/mcp-apache-spark-history-server
MCP Server for Apache Spark History Server. The bridge between Agentic AI and Apache Spark.
Language: Python - Size: 2.32 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 82 - Forks: 21
IBM/kafka-streaming-click-analysis 📦
Use Kafka and Apache Spark streaming to perform click stream analytics
Language: Jupyter Notebook - Size: 583 KB - Last synced at: about 1 month ago - Pushed at: over 5 years ago - Stars: 76 - Forks: 57
NashTech-Labs/Lambda-Arch-Spark
Language: Scala - Size: 23.4 KB - Last synced at: 4 days ago - Pushed at: over 5 years ago - Stars: 75 - Forks: 37