An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: structured-streaming

streamnative/pulsar-spark

Spark Connector to read and write with Pulsar

Language: Scala - Size: 712 KB - Last synced at: 1 day ago - Pushed at: 4 days ago - Stars: 113 - Forks: 50

databricks/LearningSparkV2

This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]

Language: Scala - Size: 75.2 MB - Last synced at: 2 days ago - Pushed at: 4 months ago - Stars: 1,294 - Forks: 770

lw-lin/CoolplaySpark

Coolplay Spark: Spark source code analysis, Spark libraries, etc.

Language: Scala - Size: 9.54 MB - Last synced at: 3 days ago - Pushed at: almost 3 years ago - Stars: 3,485 - Forks: 1,410

igopalakrishna/nyc-subway-foot-traffic-prediction-and-forecasting

Designed and implemented a scalable real-time analytics pipeline using Apache Kafka, Spark Structured Streaming, and MongoDB to simulate NYC MTA turnstile data and forecast real-time subway foot traffic using SparkML Random Forest models.

Language: Jupyter Notebook - Size: 1.27 MB - Last synced at: 3 days ago - Pushed at: 12 days ago - Stars: 0 - Forks: 1

astrolabsoftware/fink-broker

Astronomy Broker based on Apache Spark

Language: Python - Size: 98.5 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 70 - Forks: 14

astrolabsoftware/fink

Fink documentation website

Size: 41.9 MB - Last synced at: 18 days ago - Pushed at: 26 days ago - Stars: 3 - Forks: 2

AndGeo69/StreamingCotiles

A streaming implementation of COTILES algorithm using Apache Spark's Structured Streaming API

Language: Python - Size: 2.55 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

aws-samples/iceberg-streaming-examples

This repo contains examples of high-throughput ingestion using Apache Spark and Apache Iceberg. These examples cover IoT and CDC scenarios using best practices. The code can be deployed into any Spark-compatible engine, such as Amazon EMR Serverless or AWS Glue. A fully local developer environment is also provided.

Language: Java - Size: 443 KB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 23 - Forks: 4

japila-books/spark-structured-streaming-internals

The Internals of Spark Structured Streaming

Size: 119 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 417 - Forks: 172

chermenin/spark-states

Custom state store providers for Apache Spark

Language: Scala - Size: 267 KB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 92 - Forks: 26

sankamuk/PysparkCheatsheet

PySpark Cheatsheet

Language: Python - Size: 11.2 MB - Last synced at: 12 days ago - Pushed at: over 2 years ago - Stars: 36 - Forks: 27

polomarcus/Spark-Structured-Streaming-Examples

Spark Structured Streaming / Kafka / Cassandra / Elastic

Language: Scala - Size: 16.5 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 183 - Forks: 78

Azure/azure-event-hubs-spark

Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs

Language: Scala - Size: 19.6 MB - Last synced at: about 23 hours ago - Pushed at: 3 months ago - Stars: 235 - Forks: 177

qubole/kinesis-sql

Kinesis Connector for Structured Streaming

Language: Scala - Size: 251 KB - Last synced at: about 1 month ago - Pushed at: 11 months ago - Stars: 136 - Forks: 80

abdheshkumar/spark-practices

Spark, Spark Streaming and Kafka Streaming examples

Language: Scala - Size: 61.9 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 5 - Forks: 0

yjshen/spark-connector-test

A tutorial on how to use pulsar-spark-connector

Language: Scala - Size: 12.7 KB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 11 - Forks: 3

aamend/spark-gdelt

Binding the GDELT universe in a Spark environment

Language: Scala - Size: 5.61 MB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 23 - Forks: 10

LeoneGarage/StreamJoin

A framework for incremental streaming joins and incremental streaming aggregations over change data feeds from Databricks Delta

Language: Python - Size: 196 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 3 - Forks: 2

qubole/spark-state-store

RocksDB state storage implementation for Structured Streaming.

Language: Scala - Size: 56.6 KB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 17 - Forks: 8

HeartSaVioR/spark-sql-kafka-offset-committer

Kafka offset committer for structured streaming query

Language: Scala - Size: 89.8 KB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 37 - Forks: 15

mozilla/telemetry-streaming 📦

Spark Streaming ETL jobs for Mozilla Telemetry

Language: Scala - Size: 691 KB - Last synced at: about 8 hours ago - Pushed at: over 5 years ago - Stars: 18 - Forks: 16

rezacsedu/Mining-Maximal-Frequent-Pattern-Spark

Implementation of the static mining part of "Mining maximal frequent patterns in transactional databases and dynamic data streams: A spark-based approach", Information Sciences, Volume 432, March 2018, Pages 278-300

Language: Java - Size: 37.1 KB - Last synced at: about 2 months ago - Pushed at: over 7 years ago - Stars: 1 - Forks: 1

seilylook/Spark_Definition_Guide_Ch_3

Spark: The Definitive Guide - Chapter 3

Language: Jupyter Notebook - Size: 5.86 KB - Last synced at: 2 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

zaleslaw/Spark-Tutorial

How to build your first Spark application with MLlib, Structured Streaming, GraphFrames, Datasets and so on? The answer is here!

Language: Scala - Size: 428 KB - Last synced at: 12 days ago - Pushed at: over 5 years ago - Stars: 53 - Forks: 15

LGDSuiBianDa/Spark

Spark summary notes

Language: Scala - Size: 154 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

IBM/kafka-streaming-click-analysis 📦

Use Kafka and Apache Spark streaming to perform click stream analytics

Language: Jupyter Notebook - Size: 583 KB - Last synced at: 11 days ago - Pushed at: about 5 years ago - Stars: 76 - Forks: 57

qubole/s3-sqs-connector

A library for reading data from Amazon S3, with listing optimised via Amazon SQS, using Spark SQL Streaming (or Structured Streaming).

Language: Scala - Size: 41 KB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 17 - Forks: 12

Rishav273/kafkaPysparkAnalytics

Real-time ETL pipeline for financial data (Kafka, PySpark).

Language: Python - Size: 395 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 8 - Forks: 1

ckongala/SparkPythonBigData

Big-Data with Apache Spark and Python.

Language: Python - Size: 169 KB - Last synced at: 3 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

dev-abuke/Scalable_Backtesting_Infrastructure_for_Crypto_Trading Fork of 10Accademy-InsightStreamInc/Scalable_Backtesting_Infrastructure_for_Crypto_Trading

Our startup, Mela, aims to simplify cryptocurrency trading for everyone and provide reliable investment sources while mitigating risks. We aim to design and build a reliable, large-scale trading data pipeline that can run various backtests and store useful artifacts in a robust data warehouse.

Language: Jupyter Notebook - Size: 2.35 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 0

awslabs/aws-cloudwatch-metrics-custom-spark-listener

Example Spark Streaming code with custom listeners that push streaming metrics to Amazon CloudWatch

Language: Scala - Size: 686 KB - Last synced at: 8 days ago - Pushed at: almost 4 years ago - Stars: 8 - Forks: 8

lamastex/spark-trend-calculus-examples

Example applications of spark-trend-calculus

Language: HTML - Size: 33.3 MB - Last synced at: about 1 month ago - Pushed at: about 3 years ago - Stars: 5 - Forks: 0

cchandurkar/spark-http-streaming

Running Apache Spark Structured Streaming job on the local machine with an HTTP web server as a streaming source.

Language: Scala - Size: 8.79 KB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 3

sev7e0/wow-spark

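A Spark self-study handbook. It covers Spark Core, Spark SQL, Spark Streaming, spark-kafka and Delta Lake, plus basic Scala exercises, source-code analysis of components such as the master and the shuffle, and summaries and translations.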

Language: Scala - Size: 1.96 MB - Last synced at: about 2 months ago - Pushed at: almost 2 years ago - Stars: 18 - Forks: 7

leletan/maiev

The World of Warcraft heroine, a member of the Watchers

Language: Scala - Size: 53.7 KB - Last synced at: about 1 year ago - Pushed at: about 7 years ago - Stars: 2 - Forks: 0

Klarrio/open-stream-processing-benchmark

This repository contains the code base for the Open Stream Processing Benchmark.

Language: Jupyter Notebook - Size: 5.97 MB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 45 - Forks: 12

amitnema/spark-coach

This project contains learning notes and experiments with Apache Spark.

Language: Scala - Size: 46.9 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

zekeriyyaa/PySpark-Structured-Streaming-ROS-Kafka-ApacheSpark-Cassandra

Structured Streaming was applied with Apache Spark to robot data from a ROS-Gazebo simulation environment. Data is collected in Kafka, analyzed by Apache Spark, and stored in Cassandra.

Language: Python - Size: 652 KB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 16 - Forks: 6

DivLoic/mdd-structured-streaming

Code samples used in the first session of `MoisDelaData`

Language: Jupyter Notebook - Size: 833 KB - Last synced at: over 1 year ago - Pushed at: about 8 years ago - Stars: 1 - Forks: 0

qubole/streaminglens

Qubole Streaminglens tool for tuning Spark Structured Streaming Pipelines

Language: Scala - Size: 72.3 KB - Last synced at: about 1 month ago - Pushed at: over 5 years ago - Stars: 17 - Forks: 5

Neuw84/structured-streaming-avro-demo

Spark 3.0.0 Structured Streaming Kafka Avro Demo

Language: Java - Size: 17.6 KB - Last synced at: about 2 months ago - Pushed at: about 2 years ago - Stars: 15 - Forks: 16

HeartSaVioR/spark-state-tools

Spark Structured Streaming State Tools

Language: Scala - Size: 143 KB - Last synced at: about 1 month ago - Pushed at: almost 5 years ago - Stars: 34 - Forks: 9

sebastianruizm/spark-kafka-cassandra

Demo Spark Structured Streaming + Apache Kafka + Apache Cassandra

Language: Python - Size: 111 KB - Last synced at: 21 days ago - Pushed at: almost 3 years ago - Stars: 5 - Forks: 3

rohitbhintade/spark-streaming-pubsub

Spark Structured Streaming source implementation for Google Pubsub

Language: Scala - Size: 27.3 KB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

serine000/Structured-Streaming-Pyspark-Project

This project links a MongoDB cluster and a Kafka cluster with a standalone PySpark cluster, all running locally.

Language: Python - Size: 167 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

viennadatasciencegroup/kf-2017-11-09-R-and-spark

Integrating R into the big data ecosystem using sparklyr

Size: 568 KB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

gz101/kafka-spark-ELK-pipeline

My implementation of the Kappa Architecture using Kafka, Spark and the ELK stack, written mostly in Scala with a bit of Python sprinkled throughout.

Language: Scala - Size: 631 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 4 - Forks: 0

ozzioma/ExperfySparkDataStream

Repo for Experfy's Introduction to Data Streaming Applications with Apache Spark Structured Streaming

Size: 4.88 KB - Last synced at: almost 2 years ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

AndrewKuzmin/spark-structured-streaming-examples

Spark Structured Streaming examples using version 3.4.0

Language: Scala - Size: 1.06 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 25 - Forks: 14

sLeeNguyen/spark-logs-analysis

This is an academic project that aims to create a data streaming pipeline using Spark Structured Streaming, Elasticsearch and Kibana.

Language: Jupyter Notebook - Size: 2.16 MB - Last synced at: about 2 months ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

gabrissk/data_viz_sentiment_analysis

A BI solution that fetches Reddit posts and comments and performs sentiment analysis on them.

Language: Jupyter Notebook - Size: 10.8 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

amine-akrout/Spark_Stuctured_Streaming

Language: Python - Size: 926 KB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 3

predictiveworks/works-sqlstream

This project complements Apache Spark structured streaming with hand-picked streaming sources and sinks.

Language: Scala - Size: 4.69 MB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 3 - Forks: 0

radoslawkrolikowski/financial-market-data-analysis

Real-Time Financial Market Data Processing and Prediction application

Language: Python - Size: 31 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 55 - Forks: 27

chenyyyang/spark-sql-custom-mq-dataSource

Sample code for an MQ data source implemented with the Spark 3.1.x Data Source API

Language: Java - Size: 28.3 KB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 5 - Forks: 0

data-han/twitter_kafka_sentiment

Ingests real-time data from the Twitter API into Kafka using tweepy. The data is processed with Apache Spark Structured Streaming and TextBlob sentiment analysis, then loaded into the InfluxDB time-series database and monitored in a Grafana dashboard.

Language: Python - Size: 1.89 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 2

knoldus/structured-streaming-application

A reference application showing how to integrate Apache Spark Structured Streaming, Apache Cassandra and Apache Kafka for fast, structured streaming computations on data.

Language: Scala - Size: 35.2 KB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 13 - Forks: 9

ofili/pyspark-template

A Structured Streaming app that treats files added to a local folder as stream data. It applies the configured operations to each new batch of files and writes the results to an output directory.

Language: Python - Size: 42 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

bartosz25/data-ai-summit-2020

Here you will find the demo code for my Data+AI 2020 talk about customizing the Apache Spark state store.

Size: 194 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 2

jacobceles/ChicagoTaxiTrips-SparkStreaming-RealTimeDashboard

Analyzing Chicago taxi trips dataset using Spark Streaming, and a real-time dashboard for reporting using Flask.

Language: CSS - Size: 15.4 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 1

anuj1207/structured-streaming-examples

Examples of Apache Spark Structured Streaming

Language: Scala - Size: 5.86 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 4 - Forks: 3

xiaogp/recsys_structured_streaming

Kafka + Structured Streaming + Phoenix + Elasticsearch: trending-item recommendations, user-preference recommendations and a recall-fusion strategy built on behavior logs.

Language: Scala - Size: 108 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 15 - Forks: 11

cfmcgrady/spark-rest-source

A REST API data source for Structured Streaming

Language: Scala - Size: 16.6 KB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 1

epishova/Structured-Streaming-Cassandra-Sink

An example of how to create and use a Cassandra sink in a Spark Structured Streaming application

Language: Scala - Size: 52.7 KB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 8 - Forks: 2

TrainingByPackt/Big-Data-Processing-with-Apache-Spark-eLearning

Efficiently tackle large datasets and perform big data analysis with Spark and Python

Language: Python - Size: 36.1 KB - Last synced at: about 1 month ago - Pushed at: over 6 years ago - Stars: 7 - Forks: 6

WinterSoldier13/linkinJMS

A Structured Streaming source that lets Spark read data from messaging systems. Currently only ActiveMQ is supported.

Language: Scala - Size: 31.3 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

mpfishe2/eventhubs-databricks-quickstart

Get up and running quickly with Spark Structured Streaming on Azure Databricks using Azure Event Hubs

Language: Scala - Size: 3.91 KB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 0

sankamuk/aws-kinesis-redshift-sparkstream

Spark Structured Streaming from AWS Kinesis and Redshift

Language: Shell - Size: 29.3 KB - Last synced at: 3 months ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 1

pavanpkulkarni/CreditCard_Fraud_Detection

Spark MLlib application for credit card fraud detection with Structured Streaming

Language: Scala - Size: 47.9 KB - Last synced at: about 2 years ago - Pushed at: almost 7 years ago - Stars: 1 - Forks: 1

thestyleofme/spark-explore

Learning the Spark ecosystem

Language: Scala - Size: 69.3 KB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 6 - Forks: 2

shaunmcglinchey/tweetpipe-spark-analyser

This repository contains an alternative Apache Spark-based analysis tier for the TweetPipe streaming data pipeline

Language: Java - Size: 5.86 KB - Last synced at: 4 months ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

pavanpkulkarni/Spark_Streaming_Examples

This repo contains spark structured streaming examples in Scala

Language: Scala - Size: 10.7 KB - Last synced at: about 2 years ago - Pushed at: about 6 years ago - Stars: 3 - Forks: 2

mahesh2492/learning-spark

This repository contains various examples of Spark APIs, e.g. RDD, DataFrame, Structured Streaming, etc.

Language: Scala - Size: 629 KB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 2 - Forks: 4

bdoepf/spark-cassandra-sink

spark-cassandra-sink is a Spark Structured Streaming sink for Cassandra

Language: Scala - Size: 50.8 KB - Last synced at: almost 2 years ago - Pushed at: over 6 years ago - Stars: 4 - Forks: 1

bones-brigade/kafka-spark-openshift-moments

Size: 21.5 KB - Last synced at: 5 months ago - Pushed at: almost 6 years ago - Stars: 0 - Forks: 1

AdamGS/spark-confluent-example

A demo project using Spark Streaming and the Confluent Platform

Language: Scala - Size: 11.7 KB - Last synced at: 4 days ago - Pushed at: almost 6 years ago - Stars: 0 - Forks: 0

Neuw84/spark-continuous-streaming

Spark 2.3: an end-to-end Avro continuous Structured Streaming Kafka demo in Java, using Twitter's Bijection.

Language: Java - Size: 60.5 KB - Last synced at: about 2 months ago - Pushed at: about 7 years ago - Stars: 0 - Forks: 6

ziedYazidi/Spark-Twitter-Tutorial

Spark sentiment analysis performed on the Twitter stream

Language: XSLT - Size: 5.9 MB - Last synced at: about 2 years ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 0

KevinMellott91/spark-summit-2019-demo

Demo created for "Life is but a Stream" presentation at Spark AI Summit 2019 (San Francisco, CA)

Language: HTML - Size: 464 KB - Last synced at: about 2 years ago - Pushed at: about 6 years ago - Stars: 1 - Forks: 2

AndrewKuzmin/spark-ml-pipelines-with-structured-streaming-examples

Examples of using Apache Spark MLlib Pipelines and Structured Streaming on version 2.4.0

Language: Shell - Size: 1020 KB - Last synced at: almost 2 years ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 0

Pavithgokul/SparkStructuredStreamingWithKafka

Spark sample with Kafka

Language: XSLT - Size: 1.79 MB - Last synced at: about 2 years ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 0

rohangulati/cassandra-sink

Spark Structured Streaming sink for Cassandra

Language: Scala - Size: 15.6 KB - Last synced at: 5 months ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 1

Neuw84/bds2k17

Repository containing code for the Big Data Spain 2017 technical talk "Towards an Unified API for Spark and the IIoT"

Language: Java - Size: 3.38 MB - Last synced at: about 2 months ago - Pushed at: over 7 years ago - Stars: 2 - Forks: 0