GitHub topics: spark-structured-streaming
AlexRogalskiy/spark-patterns
🏆 Spark4You Design patterns
Language: Shell - Size: 19.8 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 3 - Forks: 0

Pedro-Manoel/iot-analytics-solution-tcc
🎓 Repository with the IoT Analytics solution developed as part of the final undergraduate project (Trabalho de Conclusão de Curso, TCC) for the Computer Science degree at the Universidade Federal de Campina Grande (UFCG)
Language: TypeScript - Size: 177 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 4 - Forks: 0

jaceklaskowski/spark-workshop
Apache Spark™ and Scala Workshops
Language: HTML - Size: 57 MB - Last synced at: 18 days ago - Pushed at: 9 months ago - Stars: 264 - Forks: 148

guidok91/spark-structured-streaming-kafka
Spark Structured Streaming data pipeline that processes movie ratings data in real-time.
Language: Python - Size: 178 KB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 13 - Forks: 4

imjuliengaupin/sparkler
Language: Java - Size: 33.2 KB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 0 - Forks: 0

Mileristovski/DataEngineer-SparkStreaming
Track a Boat is a real-time maritime tracking system using Kafka, Spark Structured Streaming, and WebSockets. It displays vessel positions, analyzes their trajectories, and predicts their destinations on an interactive map.
Language: Jupyter Notebook - Size: 9.59 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

chermenin/spark-states
Custom state store providers for Apache Spark
Language: Scala - Size: 267 KB - Last synced at: 18 days ago - Pushed at: 2 months ago - Stars: 92 - Forks: 26

tigureis/modelo_de_pipeline_com_spark_no_databricks
This project simulates a data streaming pipeline using Spark Structured Streaming and Delta Lake in a Databricks environment. The goal is to demonstrate how to process data in real time, even when the data source is static and supplied through SQL commands.
Language: Jupyter Notebook - Size: 10.7 KB - Last synced at: about 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

dohabanoui/Spark-Structured-Streaming
Real-time analysis of hospital incident data using Apache Spark Streaming to track incidents by service and identify the top years with the most incidents.
Language: Java - Size: 857 KB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

qubole/kinesis-sql
Kinesis Connector for Structured Streaming
Language: Scala - Size: 251 KB - Last synced at: 15 days ago - Pushed at: 10 months ago - Stars: 136 - Forks: 80

tomaztk/Azure-Databricks
Azure Databricks - Advent of 2020 Blogposts
Language: Jupyter Notebook - Size: 44.9 MB - Last synced at: 20 days ago - Pushed at: over 2 years ago - Stars: 60 - Forks: 49

streamnative/awesome-pulsar
A curated list of Pulsar tools, integrations and resources.
Size: 11.7 KB - Last synced at: 11 days ago - Pushed at: over 4 years ago - Stars: 80 - Forks: 9

lupusruber/music_analytics
This project processes real-time music event data using Kafka, Apache Spark on Google Cloud Dataproc, and stores the transformed data in BigQuery for analytics, all orchestrated by Airflow and managed with Terraform.
Language: Jupyter Notebook - Size: 22.1 MB - Last synced at: 26 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

SimpleSoulll/ss-aof
Spark Structured Streaming append-only file source based on the DataSource API v2. Streaming capture of incremental logs with Spark.
Language: Scala - Size: 33.2 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

pprzetacznik/datalake
Simple datalake
Language: Python - Size: 40 KB - Last synced at: 26 days ago - Pushed at: 10 months ago - Stars: 2 - Forks: 0

thedevd/techBlog
Examples of leading IT technologies
Language: Scala - Size: 29.3 MB - Last synced at: 3 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 1

AbsaOSS/hyperdrive
Extensible streaming ingestion pipeline on top of Apache Spark
Language: Scala - Size: 1.62 MB - Last synced at: 14 days ago - Pushed at: about 1 year ago - Stars: 44 - Forks: 13

SayamAlt/PySpark-for-Big-Data-and-Machine-Learning
This is the material for Jose Portilla's Spark and Python for Big Data and ML course.
Language: Jupyter Notebook - Size: 3.79 MB - Last synced at: about 2 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

ohmycloud/sub_trip_with_structured_spark_streaming
Trip segmentation using Structured Spark Streaming
Language: Scala - Size: 67.4 KB - Last synced at: 4 days ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

AmadeusITGroup/Elastic-Scaling
Elastic Scaling is a library for controlling the number of resources (executors or workers) instantiated by a Spark Structured Streaming job in order to optimize the effective micro-batch duration.
Language: Scala - Size: 32.2 KB - Last synced at: 19 days ago - Pushed at: over 1 year ago - Stars: 4 - Forks: 1

roksolana-d/spark-streaming-examples
Research on legacy and structured streaming with Spark
Language: Scala - Size: 22.5 KB - Last synced at: about 1 year ago - Pushed at: about 6 years ago - Stars: 2 - Forks: 2

ramottamado/isabel
Spark Structured Streaming with Kafka Integration
Language: Python - Size: 14.6 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

aTechGuide/click-stream-analysis
Spark Structured Streaming app that aggregates data over a rolling window of events (not necessarily time-based)
Language: Scala - Size: 8.79 KB - Last synced at: 2 months ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 0

hadiezatpanah/Spark_Structured_Streaming_Java
In this solution, the issue of creating a table with case-sensitive columns (in the scenario where the table doesn't exist or when writing the table in overwrite mode) in Oracle has been addressed by developing a custom Oracle dialect and registering it.
Language: Java - Size: 385 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

tienlepham094/TwitterSparkStreaming
Twitter Streaming Project
Language: Python - Size: 202 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

martinKindall/NYC-Taxi-Limousine-Data-Spark
NYC Taxi & Limousine Commission's open data with Spark Streaming 3.0.0
Language: Scala - Size: 43.9 KB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 1

RosarioB/spark-streaming-kafka
Exploring Spark Structured Streaming features using Jupyter notebooks and PySpark, interacting with a Kafka cluster.
Size: 130 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

ozancicek/artan
Online latent state estimation with Spark
Language: Scala - Size: 553 KB - Last synced at: 5 months ago - Pushed at: 6 months ago - Stars: 5 - Forks: 2

liupeirong/spark-structured-streaming-ci-cd
Spark structured streaming with unit tests integrated with Travis CI
Language: Scala - Size: 5.86 KB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 3

nama1arpit/reddit-streaming-pipeline
A real-time reddit data streaming pipeline for sentiment analysis of various subreddits
Language: HCL - Size: 15.1 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 51 - Forks: 2

jwsmai/ScalaTools
This project provides Apache Spark SQL and Flink DataStream API examples in Scala
Language: Scala - Size: 3.19 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

fhuertas/uah-mbi-2019-streaming
Repository for the UAM class, Master's in Business Intelligence, Data Parallelization, Streaming module
Language: Jupyter Notebook - Size: 9.04 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 1

kklimexk/zio-playground
Playground for ZIO library
Language: Scala - Size: 16.6 KB - Last synced at: over 1 year ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

NashTech-Labs/Sparkathon
A library of Java and Scala examples for Spark 2.x
Language: Java - Size: 113 MB - Last synced at: 18 days ago - Pushed at: over 8 years ago - Stars: 7 - Forks: 9

firecast/dhs-2019-demo
DataHack Summit 2019 demo files
Size: 33.7 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 1

Mark1002/sf-crime-statistics-spark-streaming
My Udacity project
Language: Jupyter Notebook - Size: 1.69 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

garystafford/streaming-sales-generator
Streaming Synthetic Sales Data Generator: Streaming sales data generator for Apache Kafka, written in Python
Language: Python - Size: 9.28 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 28 - Forks: 11

hadiezatpanah/Spark_Java_Stateful
This project presents a distributable solution based on Spark Java that connects start and end session events in a stateful manner. It uses `flatMapGroupsWithState`, a powerful feature for stateful stream processing in Spark that lets you maintain and update state across batches.
Language: Java - Size: 95.7 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

hadiezatpanah/Spark_Java_MostValuableCustomers
This Spark Java project serves as a demonstration of Gradle Spark configuration, specifically focusing on utilizing the MemoryStream class as the streaming source.
Language: Java - Size: 65.4 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 2 - Forks: 0

AndrewKuzmin/spark-structured-streaming-examples
Spark Structured Streaming examples using version 3.4.0
Language: Scala - Size: 1.06 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 25 - Forks: 14

stephen29xie/tweet-streaming-data-pipeline
Real-time streaming data pipeline for Twitter Tweets
Language: Scala - Size: 301 KB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 11 - Forks: 9

Mathews-Tom/MSc-in-Machine-Learning-and-Artificial-Intelligence
Master of Science in Machine Learning & Artificial Intelligence - Indian Institute of Technology Madras & Liverpool John Moores University
Language: Jupyter Notebook - Size: 2.12 GB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 6 - Forks: 7

xkondix/MsgBrokerSys
Spark Structured Streaming vs Kafka Streams
Language: Python - Size: 55.4 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

rajat2004/twitter-kafka
Twitter web app that uses Apache Kafka and Spark to perform analysis
Language: Python - Size: 29.3 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

Uriah372-DS/DDBMSPysparkProject
A course project implementing machine learning with Spark Structured Streaming in Python
Language: Jupyter Notebook - Size: 20.8 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

anmollp/Zootopia
A distributed streaming data processing pipeline.
Language: Python - Size: 1.15 MB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

b-b3rn4rd/terraform-provider-emrstreaming
The emrstreaming provider offers continuous deployment functionality for streaming steps into an EMR cluster.
Language: Go - Size: 116 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

vvittis/CCFD-RF
Credit Card Fraud Detection with Random Forest
Language: Java - Size: 4.49 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

vodkolav/DataEngineerProject
This is my final project for Data Engineer Expert course at Naya College.
Language: Jupyter Notebook - Size: 930 KB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 0

CloudComputingProject-2022/Data_visualization_and_analysis_tool_for_telemetry_data
A naive anomaly detection and data visualization tool for F1 onboard telemetry data.
Language: Python - Size: 1.4 MB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 8 - Forks: 1

rajeshsantha/MonitoredStructuredStreaming
Repository for Spark structured streaming use case implementations.
Language: Scala - Size: 65.4 KB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 1

hoseinlook/cpu-anomaly-detection-with-spark
CPU anomaly detection with Spark
Language: Python - Size: 333 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 0

falaybeg/SparkStreaming-Network-Anomaly-Detection
This repository includes supervised and unsupervised machine learning methods which are used to detect anomalies on network datasets. Decision Tree, Random Forest, Gradient Boost Tree, Naive Bayes, and Logistic Regression were used for supervised learning. K-Means was used for unsupervised learning.
Language: Jupyter Notebook - Size: 2.98 MB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 11 - Forks: 3

hadiezatpanah/Trending_Topic_Spark_Streaming_Scala
An end-to-end solution that reads data from a streaming source (Kafka), extracts the topics in each time window, calculates hot topics using a modified Z-score algorithm, and stores the final trending topics in a PostgreSQL database
Language: Scala - Size: 6.47 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

jacobceles/ChicagoTaxiTrips-SparkStreaming-RealTimeDashboard
Analyzing Chicago taxi trips dataset using Spark Streaming, and a real-time dashboard for reporting using Flask.
Language: CSS - Size: 15.4 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 1

iomete/sql-streaming-sqs
Fork of the Apache Bahir sql-streaming-sqs, compatible with Spark 3
Language: Scala - Size: 25.4 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

bluejoe2008/spark-http-stream
Spark Structured Streaming via HTTP communication
Language: Scala - Size: 207 KB - Last synced at: 24 days ago - Pushed at: almost 3 years ago - Stars: 18 - Forks: 10

dharaneeshvrd/spark-examples
Spark Examples
Language: Python - Size: 35.2 KB - Last synced at: 12 days ago - Pushed at: almost 4 years ago - Stars: 2 - Forks: 5

ArmanShakeri/Pyspark-upsert-oracle
PySpark sample for upserting data into an Oracle table
Language: Python - Size: 23.4 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

renardeinside/spark-streaming-state-store-example
Spark Structured Streaming with State Store
Language: Scala - Size: 26.4 KB - Last synced at: about 11 hours ago - Pushed at: over 1 year ago - Stars: 5 - Forks: 3

iomete/kafka-streaming-job
Kafka streaming job from iomete. This streaming job copies data from Kafka to Iceberg.
Language: Python - Size: 383 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

hosnaa/Apache-Spark-Streaming-Analysis
Analysis of streaming daily retail data using Spark Structured Streaming, querying the data to extract insights
Language: HTML - Size: 57.6 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

Thelin90/deiteo
P.O.C Spark On Kubernetes
Language: Shell - Size: 1.21 MB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 2

mpfishe2/eventhubs-databricks-quickstart
Get up and running quickly with Spark Structured Streaming on Azure Databricks using Azure Event Hubs
Language: Scala - Size: 3.91 KB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 0

PierreVerbe/Scala-Spark-Template
🛠️ Template to do data processing with Scala and Apache Spark ✨
Language: Scala - Size: 137 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

michelheil/BigData
Projects related to Big Data technologies
Language: Java - Size: 2.24 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

AndrewKuzmin/Analytics-For-IoT-Devices-Using-Spark
Analytics for IoT devices using Apache Spark Structured Streaming 2.4.0
Language: Scala - Size: 1.03 MB - Last synced at: almost 2 years ago - Pushed at: about 6 years ago - Stars: 5 - Forks: 1

RooseveltAdvisors/spark_structured_streaming_demo
A Log Analytics demo based on Spark Structured Streaming + Kafka
Language: Python - Size: 1.41 MB - Last synced at: 4 months ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 3

PLarboulette/spark-structured-streaming
Language: Scala - Size: 22.5 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

sunujh6/spark_practice
Language: Jupyter Notebook - Size: 1.62 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

JulienPeloton/mini_spark_broker
Design and proof-of-concept for a Broker for astronomy using Apache Spark
Language: Jupyter Notebook - Size: 8.98 MB - Last synced at: 14 days ago - Pushed at: about 6 years ago - Stars: 3 - Forks: 2

LuckyZXL2016/Spark-Example
Examples for Spark 1.6 and Spark 2.2, covering Kafka, Flume, Structured Streaming, Jedis, Elasticsearch, MySQL, and DataFrame
Language: Scala - Size: 2.06 MB - Last synced at: 19 days ago - Pushed at: about 7 years ago - Stars: 15 - Forks: 6

aTechGuide/spark-streaming
Spark Streaming Scripts and integrations with other technologies
Language: TSQL - Size: 32.4 MB - Last synced at: 2 months ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 0

aqib1/java-spark-structured-streaming
Language: Java - Size: 3.91 KB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

aqib1/spark-structured-streaming-java
Language: Java - Size: 21.5 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

greysap/microbatch2cassandra
Language: Java - Size: 8.79 KB - Last synced at: almost 2 years ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 1

conker84/kafka-rome-june-2k19
Size: 35 MB - Last synced at: 18 days ago - Pushed at: almost 6 years ago - Stars: 0 - Forks: 0

haozhang-x/log-analysis-spark
Structured Streaming Log Analysis
Language: Scala - Size: 72.3 KB - Last synced at: almost 2 years ago - Pushed at: almost 6 years ago - Stars: 2 - Forks: 2

AndrewKuzmin/spark-ml-pipelines-with-structured-streaming-examples
Examples of using Apache Spark MLlib Pipelines and Structured Streaming on version 2.4.0
Language: Shell - Size: 1020 KB - Last synced at: almost 2 years ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 0

ramkashyap-s/Live-Dash
Stream processing pipeline for analyzing live chat data
Language: Python - Size: 5 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

kthristov/cubos-olap
Language: Scala - Size: 24.4 KB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 0

sergei-grigorev/spark-streaming-project
In-Stream final project
Language: Scala - Size: 107 KB - Last synced at: almost 2 years ago - Pushed at: about 7 years ago - Stars: 1 - Forks: 1

SevakAvet/gridu-spark-streaming
Study project: Apache Kafka + Apache Spark
Language: Scala - Size: 19.5 KB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 1

tspannhw/nifi-spark-structuredstreaming
Language: Scala - Size: 5.86 KB - Last synced at: about 1 year ago - Pushed at: about 7 years ago - Stars: 1 - Forks: 0
