GitHub topics: structured-streaming
streamnative/pulsar-spark
Spark Connector to read and write with Pulsar
Language: Scala - Size: 712 KB - Last synced at: 1 day ago - Pushed at: 4 days ago - Stars: 113 - Forks: 50

databricks/LearningSparkV2
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
Language: Scala - Size: 75.2 MB - Last synced at: 2 days ago - Pushed at: 4 months ago - Stars: 1,294 - Forks: 770

lw-lin/CoolplaySpark
Coolplay Spark: Spark source code analysis, Spark libraries, and more
Language: Scala - Size: 9.54 MB - Last synced at: 3 days ago - Pushed at: almost 3 years ago - Stars: 3,485 - Forks: 1,410

igopalakrishna/nyc-subway-foot-traffic-prediction-and-forecasting
Designed and implemented a scalable real-time analytics pipeline using Apache Kafka, Spark Structured Streaming, and MongoDB to simulate NYC MTA turnstile data and forecast real-time subway foot traffic using SparkML Random Forest models.
Language: Jupyter Notebook - Size: 1.27 MB - Last synced at: 3 days ago - Pushed at: 12 days ago - Stars: 0 - Forks: 1

astrolabsoftware/fink-broker
Astronomy Broker based on Apache Spark
Language: Python - Size: 98.5 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 70 - Forks: 14

astrolabsoftware/fink
Fink documentation website
Size: 41.9 MB - Last synced at: 18 days ago - Pushed at: 26 days ago - Stars: 3 - Forks: 2

AndGeo69/StreamingCotiles
A streaming implementation of COTILES algorithm using Apache Spark's Structured Streaming API
Language: Python - Size: 2.55 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

aws-samples/iceberg-streaming-examples
This repo contains examples of high throughput ingestion using Apache Spark and Apache Iceberg. These examples cover IoT and CDC scenarios using best practices. The code can be deployed into any Spark compatible engine like Amazon EMR Serverless or AWS Glue. A fully local developer environment is also provided.
Language: Java - Size: 443 KB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 23 - Forks: 4

japila-books/spark-structured-streaming-internals
The Internals of Spark Structured Streaming
Size: 119 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 417 - Forks: 172

chermenin/spark-states
Custom state store providers for Apache Spark
Language: Scala - Size: 267 KB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 92 - Forks: 26

sankamuk/PysparkCheatsheet
PySpark Cheatsheet
Language: Python - Size: 11.2 MB - Last synced at: 12 days ago - Pushed at: over 2 years ago - Stars: 36 - Forks: 27

polomarcus/Spark-Structured-Streaming-Examples
Spark Structured Streaming / Kafka / Cassandra / Elastic
Language: Scala - Size: 16.5 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 183 - Forks: 78

Azure/azure-event-hubs-spark
Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Language: Scala - Size: 19.6 MB - Last synced at: about 23 hours ago - Pushed at: 3 months ago - Stars: 235 - Forks: 177

qubole/kinesis-sql
Kinesis Connector for Structured Streaming
Language: Scala - Size: 251 KB - Last synced at: about 1 month ago - Pushed at: 11 months ago - Stars: 136 - Forks: 80

abdheshkumar/spark-practices
Spark, Spark Streaming and Kafka Streaming examples
Language: Scala - Size: 61.9 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 5 - Forks: 0

yjshen/spark-connector-test
A tutorial on how to use pulsar-spark-connector
Language: Scala - Size: 12.7 KB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 11 - Forks: 3

aamend/spark-gdelt
Binding the GDELT universe in a Spark environment
Language: Scala - Size: 5.61 MB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 23 - Forks: 10

LeoneGarage/StreamJoin
A framework for incremental streaming joins and incremental streaming aggregations over change data feeds from Databricks Delta
Language: Python - Size: 196 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 3 - Forks: 2

qubole/spark-state-store
Rocksdb state storage implementation for Structured Streaming.
Language: Scala - Size: 56.6 KB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 17 - Forks: 8

HeartSaVioR/spark-sql-kafka-offset-committer
Kafka offset committer for structured streaming query
Language: Scala - Size: 89.8 KB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 37 - Forks: 15

mozilla/telemetry-streaming 📦
Spark Streaming ETL jobs for Mozilla Telemetry
Language: Scala - Size: 691 KB - Last synced at: about 8 hours ago - Pushed at: over 5 years ago - Stars: 18 - Forks: 16

rezacsedu/Mining-Maximal-Frequent-Pattern-Spark
Implementation of Static mining part of "Mining maximal frequent patterns in transactional databases and dynamic data streams: A spark-based approach" Information Sciences, Volume 432, March 2018, Pages 278-300
Language: Java - Size: 37.1 KB - Last synced at: about 2 months ago - Pushed at: over 7 years ago - Stars: 1 - Forks: 1

seilylook/Spark_Definition_Guide_Ch_3
Spark: The Definitive Guide - Chapter 3
Language: Jupyter Notebook - Size: 5.86 KB - Last synced at: 2 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

zaleslaw/Spark-Tutorial
How to build your first Spark application with MLlib, Structured Streaming, GraphFrames, Datasets and so on? The answer is here!
Language: Scala - Size: 428 KB - Last synced at: 12 days ago - Pushed at: over 5 years ago - Stars: 53 - Forks: 15

LGDSuiBianDa/Spark
Spark summary
Language: Scala - Size: 154 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

IBM/kafka-streaming-click-analysis 📦
Use Kafka and Apache Spark streaming to perform click stream analytics
Language: Jupyter Notebook - Size: 583 KB - Last synced at: 11 days ago - Pushed at: about 5 years ago - Stars: 76 - Forks: 57

qubole/s3-sqs-connector
A library for reading data from Amazon S3, with listing optimised via Amazon SQS, using Spark SQL Streaming (or Structured Streaming).
Language: Scala - Size: 41 KB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 17 - Forks: 12

Rishav273/kafkaPysparkAnalytics
Real-time ETL pipeline for financial data (Kafka, PySpark).
Language: Python - Size: 395 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 8 - Forks: 1

ckongala/SparkPythonBigData
Big-Data with Apache Spark and Python.
Language: Python - Size: 169 KB - Last synced at: 3 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

dev-abuke/Scalable_Backtesting_Infrastructure_for_Crypto_Trading Fork of 10Accademy-InsightStreamInc/Scalable_Backtesting_Infrastructure_for_Crypto_Trading
Our startup, Mela, aims to simplify cryptocurrency trading for everyone and provide reliable investment sources while mitigating risks. We aim to design and build a reliable, large-scale trading data pipeline that can run various backtests and store useful artifacts in a robust data warehouse.
Language: Jupyter Notebook - Size: 2.35 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 0

awslabs/aws-cloudwatch-metrics-custom-spark-listener
Sample Spark Streaming code with custom listeners that push streaming metrics to Amazon CloudWatch
Language: Scala - Size: 686 KB - Last synced at: 8 days ago - Pushed at: almost 4 years ago - Stars: 8 - Forks: 8

lamastex/spark-trend-calculus-examples
Example applications of spark-trend-calculus
Language: HTML - Size: 33.3 MB - Last synced at: about 1 month ago - Pushed at: about 3 years ago - Stars: 5 - Forks: 0

cchandurkar/spark-http-streaming
Running Apache Spark Structured Streaming job on the local machine with an HTTP web server as a streaming source.
Language: Scala - Size: 8.79 KB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 3

sev7e0/wow-spark
Spark self-study handbook covering Spark Core, Spark SQL, Spark Streaming, spark-kafka, and Delta Lake, plus basic Scala exercises and source-code analyses (e.g. master, shuffle), summaries, and translations.
Language: Scala - Size: 1.96 MB - Last synced at: about 2 months ago - Pushed at: almost 2 years ago - Stars: 18 - Forks: 7

leletan/maiev
The World of Warcraft heroine, a part of the Watchers
Language: Scala - Size: 53.7 KB - Last synced at: about 1 year ago - Pushed at: about 7 years ago - Stars: 2 - Forks: 0

Klarrio/open-stream-processing-benchmark
This repository contains the code base for the Open Stream Processing Benchmark.
Language: Jupyter Notebook - Size: 5.97 MB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 45 - Forks: 12

amitnema/spark-coach
This project contains the learning and experiments with the Apache Spark.
Language: Scala - Size: 46.9 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

zekeriyyaa/PySpark-Structured-Streaming-ROS-Kafka-ApacheSpark-Cassandra
Structured Streaming applied to robot data from a ROS-Gazebo simulation environment using Apache Spark. Data is collected in Kafka, analyzed by Apache Spark, and stored in Cassandra.
Language: Python - Size: 652 KB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 16 - Forks: 6

DivLoic/mdd-structured-streaming
Sample of code used in the 1st session of the `MoisDelaData`
Language: Jupyter Notebook - Size: 833 KB - Last synced at: over 1 year ago - Pushed at: about 8 years ago - Stars: 1 - Forks: 0

qubole/streaminglens
Qubole Streaminglens tool for tuning Spark Structured Streaming Pipelines
Language: Scala - Size: 72.3 KB - Last synced at: about 1 month ago - Pushed at: over 5 years ago - Stars: 17 - Forks: 5

Neuw84/structured-streaming-avro-demo
Spark 3.0.0 Structured Streaming Kafka Avro Demo
Language: Java - Size: 17.6 KB - Last synced at: about 2 months ago - Pushed at: about 2 years ago - Stars: 15 - Forks: 16

HeartSaVioR/spark-state-tools
Spark Structured Streaming State Tools
Language: Scala - Size: 143 KB - Last synced at: about 1 month ago - Pushed at: almost 5 years ago - Stars: 34 - Forks: 9

sebastianruizm/spark-kafka-cassandra
Demo Spark Structured Streaming + Apache Kafka + Apache Cassandra
Language: Python - Size: 111 KB - Last synced at: 21 days ago - Pushed at: almost 3 years ago - Stars: 5 - Forks: 3

rohitbhintade/spark-streaming-pubsub
Spark Structured Streaming source implementation for Google Pubsub
Language: Scala - Size: 27.3 KB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

serine000/Structured-Streaming-Pyspark-Project
This project links together a MongoDB cluster and a Kafka cluster with a Standalone Pyspark cluster all done locally
Language: Python - Size: 167 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

viennadatasciencegroup/kf-2017-11-09-R-and-spark
Integrating R into the big data ecosystem using sparklyR
Size: 568 KB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

gz101/kafka-spark-ELK-pipeline
My implementation of the Kappa Architecture using Kafka, Spark, the ELK stack, mostly using Scala, and a bit of Python sprinkled all over.
Language: Scala - Size: 631 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 4 - Forks: 0

ozzioma/ExperfySparkDataStream
Repo for Experfy's Introduction to Data Streaming Applications with Apache Spark Structured Streaming
Size: 4.88 KB - Last synced at: almost 2 years ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

AndrewKuzmin/spark-structured-streaming-examples
Spark Structured Streaming examples using version 3.4.0
Language: Scala - Size: 1.06 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 25 - Forks: 14

sLeeNguyen/spark-logs-analysis
This is an academic project that aims to create a data streaming pipeline using Spark Structured Streaming, Elasticsearch and Kibana.
Language: Jupyter Notebook - Size: 2.16 MB - Last synced at: about 2 months ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

gabrissk/data_viz_sentiment_analysis
BI solution to get reddit posts and comments and do sentiment analysis.
Language: Jupyter Notebook - Size: 10.8 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

amine-akrout/Spark_Stuctured_Streaming
Language: Python - Size: 926 KB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 3

predictiveworks/works-sqlstream
This project complements Apache Spark structured streaming with hand-picked streaming sources and sinks.
Language: Scala - Size: 4.69 MB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 3 - Forks: 0

radoslawkrolikowski/financial-market-data-analysis
Real-Time Financial Market Data Processing and Prediction application
Language: Python - Size: 31 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 55 - Forks: 27

chenyyyang/spark-sql-custom-mq-dataSource
Sample code for an MQ data source implemented with the Spark 3.1.x data source API
Language: Java - Size: 28.3 KB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 5 - Forks: 0

data-han/twitter_kafka_sentiment
Ingests real-time Twitter API data into Kafka using tweepy, processes it with Apache Spark Structured Streaming and TextBlob sentiment analysis, then loads it into the InfluxDB time-series database and a Grafana monitoring dashboard
Language: Python - Size: 1.89 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 2

knoldus/structured-streaming-application
Structured Streaming is a reference application showing how to easily integrate Apache Spark Structured Streaming, Apache Cassandra and Apache Kafka for fast, structured streaming computations on data.
Language: Scala - Size: 35.2 KB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 13 - Forks: 9

ofili/pyspark-template
Structured Streaming app that reads files from a local folder as stream data as new files are added, applies all the operations to the new data, and writes the results to an output directory.
Language: Python - Size: 42 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

bartosz25/data-ai-summit-2020
Demo code for my Data+AI 2020 talk about customizing the Apache Spark state store.
Size: 194 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 2

jacobceles/ChicagoTaxiTrips-SparkStreaming-RealTimeDashboard
Analyzing Chicago taxi trips dataset using Spark Streaming, and a real-time dashboard for reporting using Flask.
Language: CSS - Size: 15.4 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 1

anuj1207/structured-streaming-examples
Examples of Apache Spark Structured Streaming
Language: Scala - Size: 5.86 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 4 - Forks: 3

xiaogp/recsys_structured_streaming
Kafka + Structured Streaming + Phoenix + Elasticsearch: popular-item recommendations, user-preference recommendations, and a recall-fusion strategy, built on behavior logs.
Language: Scala - Size: 108 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 15 - Forks: 11

cfmcgrady/spark-rest-source
A Rest Api Structured Streaming DataSource
Language: Scala - Size: 16.6 KB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 1

epishova/Structured-Streaming-Cassandra-Sink
An example of how to create and use Cassandra sink in Spark Structured Streaming application
Language: Scala - Size: 52.7 KB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 8 - Forks: 2

TrainingByPackt/Big-Data-Processing-with-Apache-Spark-eLearning
Efficiently tackle large datasets and perform big data analysis with Spark and Python
Language: Python - Size: 36.1 KB - Last synced at: about 1 month ago - Pushed at: over 6 years ago - Stars: 7 - Forks: 6

WinterSoldier13/linkinJMS
A structured streaming source for Spark to read data from a streaming source. Currently supports only ActiveMQ
Language: Scala - Size: 31.3 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

mpfishe2/eventhubs-databricks-quickstart
Get up and running quickly with Spark Structured Streaming on Azure Databricks using Azure Event Hubs
Language: Scala - Size: 3.91 KB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 0

sankamuk/aws-kinesis-redshift-sparkstream
Spark Structured Streaming from AWS Kinesis and Redshift
Language: Shell - Size: 29.3 KB - Last synced at: 3 months ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 1

pavanpkulkarni/CreditCard_Fraud_Detection
Spark MLLib Application for Credit Card Fraud Detection - Structured Streaming
Language: Scala - Size: 47.9 KB - Last synced at: about 2 years ago - Pushed at: almost 7 years ago - Stars: 1 - Forks: 1

thestyleofme/spark-explore
Learning the Spark ecosystem
Language: Scala - Size: 69.3 KB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 6 - Forks: 2

shaunmcglinchey/tweetpipe-spark-analyser
This repository contains an alternative Apache Spark-based analysis tier for the TweetPipe streaming data pipeline
Language: Java - Size: 5.86 KB - Last synced at: 4 months ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

pavanpkulkarni/Spark_Streaming_Examples
This repo contains spark structured streaming examples in Scala
Language: Scala - Size: 10.7 KB - Last synced at: about 2 years ago - Pushed at: about 6 years ago - Stars: 3 - Forks: 2

mahesh2492/learning-spark
Repository containing various examples of Spark APIs, e.g. RDD, DataFrame, Structured Streaming, etc.
Language: Scala - Size: 629 KB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 2 - Forks: 4

bdoepf/spark-cassandra-sink
spark-cassandra-sink is a Spark Structured Streaming sink for Cassandra
Language: Scala - Size: 50.8 KB - Last synced at: almost 2 years ago - Pushed at: over 6 years ago - Stars: 4 - Forks: 1

bones-brigade/kafka-spark-openshift-moments
Size: 21.5 KB - Last synced at: 5 months ago - Pushed at: almost 6 years ago - Stars: 0 - Forks: 1

AdamGS/spark-confluent-example
A demo project using spark streaming and the confluent platform
Language: Scala - Size: 11.7 KB - Last synced at: 4 days ago - Pushed at: almost 6 years ago - Stars: 0 - Forks: 0

Neuw84/spark-continuous-streaming
Spark 2.3 end-to-end Avro continuous Structured Streaming Kafka demo using Twitter's Bijection in Java.
Language: Java - Size: 60.5 KB - Last synced at: about 2 months ago - Pushed at: about 7 years ago - Stars: 0 - Forks: 6

ziedYazidi/Spark-Twitter-Tutorial
Spark sentiment analysis performed on the Twitter stream
Language: XSLT - Size: 5.9 MB - Last synced at: about 2 years ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 0

KevinMellott91/spark-summit-2019-demo
Demo created for "Life is but a Stream" presentation at Spark AI Summit 2019 (San Francisco, CA)
Language: HTML - Size: 464 KB - Last synced at: about 2 years ago - Pushed at: about 6 years ago - Stars: 1 - Forks: 2

AndrewKuzmin/spark-ml-pipelines-with-structured-streaming-examples
Examples of using Apache Spark MLlib Pipelines and Structured Streaming on version 2.4.0
Language: Shell - Size: 1020 KB - Last synced at: almost 2 years ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 0

Pavithgokul/SparkStructuredStreamingWithKafka
Spark Sample with Kafka
Language: XSLT - Size: 1.79 MB - Last synced at: about 2 years ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 0

rohangulati/cassandra-sink
Spark structured streaming sink for cassandra
Language: Scala - Size: 15.6 KB - Last synced at: 5 months ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 1

Neuw84/bds2k17
Repository containing code for the Big Data Spain 2017 technical talk "Towards an Unified API for Spark and the IIoT"
Language: Java - Size: 3.38 MB - Last synced at: about 2 months ago - Pushed at: over 7 years ago - Stars: 2 - Forks: 0
