An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: spark-structured-streaming

AlexRogalskiy/spark-patterns

🏆 Spark4You Design patterns

Language: Shell - Size: 19.8 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 3 - Forks: 0

Pedro-Manoel/iot-analytics-solution-tcc

🎓 Repository with the IoT Analytics solution developed as part of the undergraduate thesis (Trabalho de Conclusão de Curso, TCC) for the Computer Science program at the Universidade Federal de Campina Grande (UFCG)

Language: TypeScript - Size: 177 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 4 - Forks: 0

jaceklaskowski/spark-workshop

Apache Spark™ and Scala Workshops

Language: HTML - Size: 57 MB - Last synced at: 18 days ago - Pushed at: 9 months ago - Stars: 264 - Forks: 148

guidok91/spark-structured-streaming-kafka

Spark Structured Streaming data pipeline that processes movie ratings data in real-time.

Language: Python - Size: 178 KB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 13 - Forks: 4

imjuliengaupin/sparkler

Language: Java - Size: 33.2 KB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 0 - Forks: 0

Mileristovski/DataEngineer-SparkStreaming

Track a Boat is a real-time maritime tracking system built with Kafka, Spark Structured Streaming, and WebSockets. It shows vessel positions on an interactive map, analyzes their trajectories, and predicts their destinations.

Language: Jupyter Notebook - Size: 9.59 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

chermenin/spark-states

Custom state store providers for Apache Spark

Language: Scala - Size: 267 KB - Last synced at: 18 days ago - Pushed at: 2 months ago - Stars: 92 - Forks: 26

tigureis/modelo_de_pipeline_com_spark_no_databricks

This project simulates a streaming data pipeline using Spark Structured Streaming and Delta Lake in a Databricks environment. The goal is to show how to process data in real time even when the data source is static and supplied through SQL commands.

Language: Jupyter Notebook - Size: 10.7 KB - Last synced at: about 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

dohabanoui/Spark-Structured-Streaming

Real-time analysis of hospital incident data using Apache Spark Streaming to track incidents by service and identify the top years with the most incidents.

Language: Java - Size: 857 KB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

qubole/kinesis-sql

Kinesis Connector for Structured Streaming

Language: Scala - Size: 251 KB - Last synced at: 15 days ago - Pushed at: 10 months ago - Stars: 136 - Forks: 80

tomaztk/Azure-Databricks

Azure Databricks - Advent of 2020 Blogposts

Language: Jupyter Notebook - Size: 44.9 MB - Last synced at: 20 days ago - Pushed at: over 2 years ago - Stars: 60 - Forks: 49

streamnative/awesome-pulsar

A curated list of Pulsar tools, integrations and resources.

Size: 11.7 KB - Last synced at: 11 days ago - Pushed at: over 4 years ago - Stars: 80 - Forks: 9

lupusruber/music_analytics

This project processes real-time music event data using Kafka, Apache Spark on Google Cloud Dataproc, and stores the transformed data in BigQuery for analytics, all orchestrated by Airflow and managed with Terraform.

Language: Jupyter Notebook - Size: 22.1 MB - Last synced at: 26 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

SimpleSoulll/ss-aof

Spark Structured Streaming append-only file source based on the DataSource API V2. Incremental streaming capture of logs with Spark.

Language: Scala - Size: 33.2 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

pprzetacznik/datalake

Simple datalake

Language: Python - Size: 40 KB - Last synced at: 26 days ago - Pushed at: 10 months ago - Stars: 2 - Forks: 0

thedevd/techBlog

Examples of leading IT technologies

Language: Scala - Size: 29.3 MB - Last synced at: 3 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 1

AbsaOSS/hyperdrive

Extensible streaming ingestion pipeline on top of Apache Spark

Language: Scala - Size: 1.62 MB - Last synced at: 14 days ago - Pushed at: about 1 year ago - Stars: 44 - Forks: 13

SayamAlt/PySpark-for-Big-Data-and-Machine-Learning

This is the material for Jose Portilla's Spark and Python for Big Data and ML course.

Language: Jupyter Notebook - Size: 3.79 MB - Last synced at: about 2 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

ohmycloud/sub_trip_with_structured_spark_streaming

Trip segmentation using Structured Spark Streaming

Language: Scala - Size: 67.4 KB - Last synced at: 4 days ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

AmadeusITGroup/Elastic-Scaling

Elastic scaling is a library that controls the number of resources (executors or workers) instantiated by a Spark Structured Streaming job in order to optimize the effective micro-batch duration.

Language: Scala - Size: 32.2 KB - Last synced at: 19 days ago - Pushed at: over 1 year ago - Stars: 4 - Forks: 1
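The Elastic-Scaling entry describes sizing executors against the micro-batch duration. One simple way to picture that feedback loop is a proportional rule. The sketch below is hypothetical and is not the library's algorithm: it assumes batch processing time scales inversely with executor count, and the function name `target_executors`, the 0.8 utilization target, and the min/max bounds are illustrative choices.

```python
import math


def target_executors(current: int, batch_duration_s: float, trigger_interval_s: float,
                     min_executors: int = 1, max_executors: int = 50,
                     utilization: float = 0.8) -> int:
    """Hypothetical proportional rule for choosing an executor count.

    Aims for a micro-batch that takes about `utilization` x trigger interval,
    assuming (simplistically) that batch time is inversely proportional to
    the number of executors.
    """
    if current <= 0 or trigger_interval_s <= 0 or utilization <= 0:
        raise ValueError("current, trigger_interval_s and utilization must be positive")
    desired_duration = utilization * trigger_interval_s
    # Scale the current count by how far the observed duration is from the target.
    wanted = math.ceil(current * batch_duration_s / desired_duration)
    # Clamp to the allowed range.
    return max(min_executors, min(max_executors, wanted))
```

For example, 4 executors taking 12 s per batch against a 10 s trigger would be scaled to 6, while 10 executors finishing in 2 s would be scaled down to 3. Applying the result to a running cluster is outside this sketch.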

roksolana-d/spark-streaming-examples

Research on legacy and structured streaming with Spark

Language: Scala - Size: 22.5 KB - Last synced at: about 1 year ago - Pushed at: about 6 years ago - Stars: 2 - Forks: 2

ramottamado/isabel

Spark Structured Streaming with Kafka Integration

Language: Python - Size: 14.6 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

aTechGuide/click-stream-analysis

Spark Structured Streaming app that aggregates data over a rolling window of events (not necessarily time-based)

Language: Scala - Size: 8.79 KB - Last synced at: 2 months ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 0
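A rolling window "of events, not necessarily time" keeps the last N events instead of the events from the last T seconds. A minimal pure-Python sketch of that idea follows, assuming a hypothetical `CountWindowAggregator` class and a running sum as the aggregate; it is not the repository's code and does not use Spark APIs.

```python
from collections import deque


class CountWindowAggregator:
    """Rolling sum over the most recent `size` events (hypothetical helper)."""

    def __init__(self, size: int) -> None:
        if size <= 0:
            raise ValueError("size must be positive")
        self.size = size
        self.window: deque = deque()
        self.total = 0

    def add(self, value: int) -> int:
        # Append the new event; once the window holds more than `size`
        # events, evict the oldest one and subtract it from the total.
        self.window.append(value)
        self.total += value
        if len(self.window) > self.size:
            self.total -= self.window.popleft()
        return self.total
```

With a window of 3, feeding the values 1, 2, 3, 4, 5 yields running sums 1, 3, 6, 9, 12. Keeping a running total makes each update O(1) instead of re-summing the window.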

hadiezatpanah/Spark_Structured_Streaming_Java

This solution addresses the problem of creating an Oracle table with case-sensitive column names (when the table does not exist yet or is written in overwrite mode) by developing and registering a custom Oracle dialect.

Language: Java - Size: 385 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

tienlepham094/TwitterSparkStreaming

Twitter Streaming Project

Language: Python - Size: 202 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

martinKindall/NYC-Taxi-Limousine-Data-Spark

NYC Taxi & Limousine Commission's open data with Spark Streaming 3.0.0

Language: Scala - Size: 43.9 KB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 1

RosarioB/spark-streaming-kafka

Exploring Spark Structured Streaming features using Jupyter notebooks and PySpark, interacting with a Kafka cluster.

Size: 130 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

ozancicek/artan

Online latent state estimation with Spark

Language: Scala - Size: 553 KB - Last synced at: 5 months ago - Pushed at: 6 months ago - Stars: 5 - Forks: 2

liupeirong/spark-structured-streaming-ci-cd

Spark structured streaming with unit tests integrated with Travis CI

Language: Scala - Size: 5.86 KB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 3

nama1arpit/reddit-streaming-pipeline

A real-time reddit data streaming pipeline for sentiment analysis of various subreddits

Language: HCL - Size: 15.1 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 51 - Forks: 2

jwsmai/ScalaTools

This project provides Apache Spark SQL and Flink DataStream API examples in Scala

Language: Scala - Size: 3.19 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

fhuertas/uah-mbi-2019-streaming

Repository for the UAM class, Master's in Business Intelligence, Data Parallelization, Streaming module

Language: Jupyter Notebook - Size: 9.04 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 1

kklimexk/zio-playground

Playground for ZIO library

Language: Scala - Size: 16.6 KB - Last synced at: over 1 year ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

NashTech-Labs/Sparkathon

A library of Java and Scala examples for Spark 2.x

Language: Java - Size: 113 MB - Last synced at: 18 days ago - Pushed at: over 8 years ago - Stars: 7 - Forks: 9

firecast/dhs-2019-demo

DataHack Summit 2019 demo files

Size: 33.7 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 1

Mark1002/sf-crime-statistics-spark-streaming

My Udacity project

Language: Jupyter Notebook - Size: 1.69 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

garystafford/streaming-sales-generator

Streaming Synthetic Sales Data Generator: Streaming sales data generator for Apache Kafka, written in Python

Language: Python - Size: 9.28 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 28 - Forks: 11

hadiezatpanah/Spark_Java_Stateful

This project presents a distributable solution based on Spark Java that connects session start and end events in a stateful manner. It uses `flatMapGroupsWithState`, a powerful feature for stateful stream processing in Spark that maintains and updates state across batches.

Language: Java - Size: 95.7 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0
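The idea behind pairing start and end events with per-key state can be sketched without Spark. In the minimal Python sketch below, a dictionary stands in for the per-group state that `flatMapGroupsWithState` keeps between micro-batches. The event layout, the function name `process_batch`, and the decision to drop unmatched end events are hypothetical assumptions, and state timeouts are not modeled.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, str, int]    # (session_id, kind, timestamp); kind is "start" or "end"
Session = Tuple[str, int, int]  # (session_id, start_ts, end_ts)


def process_batch(batch: Iterable[Event], state: Dict[str, int]) -> List[Session]:
    """Pair start/end events per session, carrying unmatched starts across batches.

    `state` maps session_id -> pending start timestamp and plays the role of
    the per-group state kept between micro-batches.
    """
    # Group the batch by key, as a groupByKey step would.
    grouped: Dict[str, List[Event]] = defaultdict(list)
    for event in batch:
        grouped[event[0]].append(event)

    completed: List[Session] = []
    for key, events in grouped.items():
        # Handle each group's events in timestamp order.
        for _, kind, ts in sorted(events, key=lambda e: e[2]):
            if kind == "start":
                state[key] = ts
            elif kind == "end" and key in state:
                completed.append((key, state.pop(key), ts))
            # An "end" with no pending "start" is dropped in this sketch.
    return completed
```

A start seen in one batch stays in `state` until its end arrives in a later batch, which is the cross-batch behavior the entry attributes to stateful processing.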

hadiezatpanah/Spark_Java_MostValuableCustomers

This Spark Java project serves as a demonstration of Gradle Spark configuration, specifically focusing on utilizing the MemoryStream class as the streaming source.

Language: Java - Size: 65.4 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 2 - Forks: 0

AndrewKuzmin/spark-structured-streaming-examples

Spark Structured Streaming examples using version 3.4.0

Language: Scala - Size: 1.06 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 25 - Forks: 14

stephen29xie/tweet-streaming-data-pipeline

Real-time streaming data pipeline for Twitter Tweets

Language: Scala - Size: 301 KB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 11 - Forks: 9

Mathews-Tom/MSc-in-Machine-Learning-and-Artificial-Intelligence

Master of Science in Machine Learning & Artificial Intelligence - Indian Institute of Technology Madras & Liverpool John Moores University

Language: Jupyter Notebook - Size: 2.12 GB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 6 - Forks: 7

xkondix/MsgBrokerSys

Spark Structured Streaming vs Kafka Streams

Language: Python - Size: 55.4 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

rajat2004/twitter-kafka

Twitter web app that uses Apache Kafka and Spark to perform analysis

Language: Python - Size: 29.3 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

Uriah372-DS/DDBMSPysparkProject

A course project implementing machine learning with Spark Structured Streaming in Python

Language: Jupyter Notebook - Size: 20.8 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

anmollp/Zootopia

A distributed streaming data processing pipeline.

Language: Python - Size: 1.15 MB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

b-b3rn4rd/terraform-provider-emrstreaming

The emrstreaming provider offers continuous deployment functionality for streaming steps into an EMR cluster.

Language: Go - Size: 116 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

vvittis/CCFD-RF

Credit Card Fraudulent Detection with Random Forest

Language: Java - Size: 4.49 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

vodkolav/DataEngineerProject

This is my final project for Data Engineer Expert course at Naya College.

Language: Jupyter Notebook - Size: 930 KB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 0

CloudComputingProject-2022/Data_visualization_and_analysis_tool_for_telemetry_data

A naive anomaly detection and data visualization tool for F1 onboard telemetry data.

Language: Python - Size: 1.4 MB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 8 - Forks: 1

rajeshsantha/MonitoredStructuredStreaming

Repository for Spark structured streaming use case implementations.

Language: Scala - Size: 65.4 KB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 1

hoseinlook/cpu-anomaly-detection-with-spark

CPU anomaly detection with Spark

Language: Python - Size: 333 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 0

falaybeg/SparkStreaming-Network-Anomaly-Detection

This repository includes supervised and unsupervised machine learning methods which are used to detect anomalies on network datasets. Decision Tree, Random Forest, Gradient Boost Tree, Naive Bayes, and Logistic Regression were used for supervised learning. K-Means was used for unsupervised learning.

Language: Jupyter Notebook - Size: 2.98 MB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 11 - Forks: 3

hadiezatpanah/Trending_Topic_Spark_Streaming_Scala

This is an end-to-end solution that reads data from a streaming source (Kafka), extracts topics from the data in each time window, calculates hot topics using a modified Z-score algorithm, and stores the final trending topics in a PostgreSQL database.

Language: Scala - Size: 6.47 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0
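The entry mentions a modified Z-score for flagging hot topics. The textbook modified Z-score is M = 0.6745 (x - median) / MAD, where MAD is the median absolute deviation; values above about 3.5 are commonly treated as outliers. The repository's exact variant is not described here, so the sketch below shows only the textbook form, with hypothetical function names and per-window topic counts.

```python
from statistics import median
from typing import Dict, List


def modified_z_scores(values: List[float]) -> List[float]:
    """Return 0.6745 * (x - median) / MAD for each value."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # MAD is zero when more than half the values equal the median;
        # this sketch then reports no outliers instead of dividing by zero.
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]


def hot_topics(counts: Dict[str, float], threshold: float = 3.5) -> List[str]:
    """Topics whose count in the current window is a high-side outlier."""
    topics = list(counts)
    scores = modified_z_scores([counts[t] for t in topics])
    return [t for t, s in zip(topics, scores) if s > threshold]
```

For counts of 50, 12, 10, 11, and 9 in one window, only the topic with 50 mentions scores above 3.5. The median and MAD make the score robust to the very spikes it is trying to detect, unlike a mean-and-standard-deviation Z-score.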

jacobceles/ChicagoTaxiTrips-SparkStreaming-RealTimeDashboard

Analyzing Chicago taxi trips dataset using Spark Streaming, and a real-time dashboard for reporting using Flask.

Language: CSS - Size: 15.4 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 1

iomete/sql-streaming-sqs

Fork of the Apache Bahir sql-streaming-sqs, compatible with Spark 3

Language: Scala - Size: 25.4 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

bluejoe2008/spark-http-stream

Spark Structured Streaming via HTTP communication

Language: Scala - Size: 207 KB - Last synced at: 24 days ago - Pushed at: almost 3 years ago - Stars: 18 - Forks: 10

dharaneeshvrd/spark-examples

Spark Examples

Language: Python - Size: 35.2 KB - Last synced at: 12 days ago - Pushed at: almost 4 years ago - Stars: 2 - Forks: 5

ArmanShakeri/Pyspark-upsert-oracle

PySpark sample for upserting data into an Oracle table

Language: Python - Size: 23.4 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0
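The repository's exact approach is not described here. A common pattern for upserting into Oracle is to write each batch to a staging table over JDBC and then run a `MERGE` statement. The sketch below only builds that statement; the function name, table names, and column names are hypothetical, and identifiers are assumed to be trusted (they are interpolated, not escaped).

```python
from typing import Sequence


def build_oracle_merge(target: str, staging: str, keys: Sequence[str],
                       columns: Sequence[str]) -> str:
    """Build an Oracle MERGE that upserts rows from `staging` into `target`.

    `columns` lists every column to write, including the key columns.
    Key columns are left out of UPDATE SET because Oracle rejects updates
    to columns referenced in the ON clause.
    """
    non_keys = [c for c in columns if c not in keys]
    if not keys or not non_keys:
        raise ValueError("need at least one key column and one non-key column")
    on = " AND ".join(f"t.{k} = s.{k}" for k in keys)
    updates = ", ".join(f"t.{c} = s.{c}" for c in non_keys)
    cols = ", ".join(columns)
    vals = ", ".join(f"s.{c}" for c in columns)
    return (
        f"MERGE INTO {target} t USING {staging} s ON ({on}) "
        f"WHEN MATCHED THEN UPDATE SET {updates} "
        f"WHEN NOT MATCHED THEN INSERT ({cols}) VALUES ({vals})"
    )
```

Matched rows are updated in place and unmatched rows are inserted, so replaying the same staging batch leaves the target unchanged, which helps when a streaming micro-batch is retried.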

renardeinside/spark-streaming-state-store-example

Spark Structured Streaming with State Store

Language: Scala - Size: 26.4 KB - Last synced at: about 11 hours ago - Pushed at: over 1 year ago - Stars: 5 - Forks: 3

iomete/kafka-streaming-job

Kafka streaming job from iomete. This streaming job copies data from Kafka to Iceberg.

Language: Python - Size: 383 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

hosnaa/Apache-Spark-Streaming-Analysis

Analysis of streaming daily retail data using Spark Structured Streaming, querying the data to extract insights

Language: HTML - Size: 57.6 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

Thelin90/deiteo

Proof of concept: Spark on Kubernetes

Language: Shell - Size: 1.21 MB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 2

mpfishe2/eventhubs-databricks-quickstart

Get up and running quickly with Spark Structured Streaming on Azure Databricks using Azure Event Hubs

Language: Scala - Size: 3.91 KB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 0

PierreVerbe/Scala-Spark-Template

🛠️ Template to do data processing with Scala and Apache Spark ✨

Language: Scala - Size: 137 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

michelheil/BigData

Projects related to Big Data technologies

Language: Java - Size: 2.24 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

AndrewKuzmin/Analytics-For-IoT-Devices-Using-Spark

Analytics for IoT devices using Apache Spark Structured Streaming 2.4.0

Language: Scala - Size: 1.03 MB - Last synced at: almost 2 years ago - Pushed at: about 6 years ago - Stars: 5 - Forks: 1

RooseveltAdvisors/spark_structured_streaming_demo

A Log Analytics demo based on Spark Structured Streaming + Kafka

Language: Python - Size: 1.41 MB - Last synced at: 4 months ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 3

PLarboulette/spark-structured-streaming

Language: Scala - Size: 22.5 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

sunujh6/spark_practice

Language: Jupyter Notebook - Size: 1.62 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

JulienPeloton/mini_spark_broker

Design and proof-of-concept for a Broker for astronomy using Apache Spark

Language: Jupyter Notebook - Size: 8.98 MB - Last synced at: 14 days ago - Pushed at: about 6 years ago - Stars: 3 - Forks: 2

LuckyZXL2016/Spark-Example

Examples for Spark 1.6 and Spark 2.2, covering Kafka, Flume, Structured Streaming, Jedis, Elasticsearch, MySQL, and DataFrame

Language: Scala - Size: 2.06 MB - Last synced at: 19 days ago - Pushed at: about 7 years ago - Stars: 15 - Forks: 6

aTechGuide/spark-streaming

Spark Streaming Scripts and integrations with other technologies

Language: TSQL - Size: 32.4 MB - Last synced at: 2 months ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 0

aqib1/java-spark-structured-streaming

Language: Java - Size: 3.91 KB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

aqib1/spark-structured-streaming-java

Language: Java - Size: 21.5 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

greysap/microbatch2cassandra

Language: Java - Size: 8.79 KB - Last synced at: almost 2 years ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 1

conker84/kafka-rome-june-2k19

Size: 35 MB - Last synced at: 18 days ago - Pushed at: almost 6 years ago - Stars: 0 - Forks: 0

haozhang-x/log-analysis-spark

Structured Streaming Log Analysis

Language: Scala - Size: 72.3 KB - Last synced at: almost 2 years ago - Pushed at: almost 6 years ago - Stars: 2 - Forks: 2

AndrewKuzmin/spark-ml-pipelines-with-structured-streaming-examples

Examples of using Apache Spark MLlib Pipelines and Structured Streaming on version 2.4.0

Language: Shell - Size: 1020 KB - Last synced at: almost 2 years ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 0

ramkashyap-s/Live-Dash

Stream processing pipeline for analyzing live chat data

Language: Python - Size: 5 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

kthristov/cubos-olap

Language: Scala - Size: 24.4 KB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 0

sergei-grigorev/spark-streaming-project

In-Stream final project

Language: Scala - Size: 107 KB - Last synced at: almost 2 years ago - Pushed at: about 7 years ago - Stars: 1 - Forks: 1

SevakAvet/gridu-spark-streaming

Study project: Apache Kafka + Apache Spark

Language: Scala - Size: 19.5 KB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 1

tspannhw/nifi-spark-structuredstreaming

Language: Scala - Size: 5.86 KB - Last synced at: about 1 year ago - Pushed at: about 7 years ago - Stars: 1 - Forks: 0