Streaming-Data-from-Reddit-Using-Kafka-Spark-and-MongoDB

A data pipeline that streams Reddit comments from the 'Politics' subreddit using Kafka and Apache Spark. Processed data is stored in MongoDB for real-time analysis and management.
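
The repository's own code is not shown on this page, so the following is only a minimal sketch of the data flow described above, under stated assumptions. It shows one comment being serialized to JSON bytes (the usual shape of a Kafka message value) and then parsed back into a document that could be stored in MongoDB. The function names `to_kafka_value` and `to_mongo_document` and the sample record are hypothetical; the field names follow Reddit's comment fields (`id`, `author`, `body`, `subreddit`, `created_utc`). The Kafka producer, the Spark job, and the MongoDB write are deliberately left out.

```python
import json
from datetime import datetime, timezone


def to_kafka_value(comment: dict) -> bytes:
    """Serialize a comment to UTF-8 JSON bytes (hypothetical helper)."""
    return json.dumps(comment, sort_keys=True).encode("utf-8")


def to_mongo_document(value: bytes) -> dict:
    """Parse a message value into a MongoDB-style document (hypothetical helper)."""
    record = json.loads(value.decode("utf-8"))
    return {
        # Reusing the comment id as _id makes repeated writes of the same comment idempotent.
        "_id": record["id"],
        "author": record["author"],
        "body": record["body"].strip(),
        "subreddit": record["subreddit"],
        # created_utc is a Unix timestamp in seconds; store it as an aware datetime.
        "created_at": datetime.fromtimestamp(record["created_utc"], tz=timezone.utc),
    }


# Sample record invented for illustration; not real Reddit data.
sample = {
    "id": "abc123",
    "author": "example_user",
    "body": "  Example comment text  ",
    "subreddit": "politics",
    "created_utc": 1700000000,
}

doc = to_mongo_document(to_kafka_value(sample))
print(doc)
```

In a real pipeline the bytes would be handed to a Kafka producer and the parsing step would run inside the Spark job; keeping both as plain functions makes them easy to test without any of those services running.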

JSON API: http://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Undisputed-jay%2FStreaming-Data-from-Reddit-Using-Kafka-Spark-and-MongoDB

Stars: 0
Forks: 0
Open issues: 0

License: None
Language: Python
Size: 399 MB
Dependencies parsed at: Pending

Created at: 6 months ago
Updated at: 6 months ago
Pushed at: 6 months ago
Last synced at: about 2 months ago

Topics: apache-spark, big-data, data-engineering, etl-pipeline, kafka, mongodb, mongodb-atlas, pyspark, real-time-streaming, redditapi, streaming-analytics
