GitHub / Undisputed-jay / Streaming-Data-from-Reddit-Using-Kafka-Spark-and-MongoDB
A data pipeline that streams Reddit comments from the 'Politics' subreddit using Kafka and Apache Spark, then stores the processed data in MongoDB for real-time analysis and management.
Stars: 0
Forks: 0
Open issues: 0
License: None
Language: Python
Size: 399 MB
Created at: 6 months ago
Updated at: 6 months ago
Pushed at: 6 months ago
Last synced at: about 2 months ago
Topics: apache-spark, big-data, data-engineering, etl-pipeline, kafka, mongodb, mongodb-atlas, pyspark, real-time-streaming, redditapi, streaming-analytics
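
The description above outlines a flow from Reddit comments through Kafka and Spark into MongoDB. The following is a minimal sketch of the per-record steps such a pipeline might perform, not the repository's actual code. To stay runnable without a broker or database, it swaps Kafka, Spark, and MongoDB for plain Python functions and an injected `store` callable. The topic name, the helper names (`serialize_comment`, `to_document`, `run_pipeline`), the `word_count` field, and the choice of comment fields (`id`, `author`, `body`, `created_utc`) are all assumptions made for illustration.

```python
import json
from datetime import datetime, timezone
from typing import Callable, Iterable

SUBREDDIT = "politics"      # taken from the repository description
TOPIC = "reddit-comments"   # hypothetical Kafka topic name (assumption)


def serialize_comment(comment: dict) -> bytes:
    """Producer side: encode one comment as UTF-8 JSON, a common shape
    for a Kafka message value. Field names are assumptions."""
    record = {
        "id": comment["id"],
        "author": comment.get("author"),
        "body": comment["body"],
        "created_utc": comment["created_utc"],
        "subreddit": SUBREDDIT,
    }
    return json.dumps(record, sort_keys=True).encode("utf-8")


def to_document(message: bytes) -> dict:
    """Processing side: decode a message and derive the fields to store.
    Stands in for the transformation a Spark job would apply."""
    record = json.loads(message.decode("utf-8"))
    created = datetime.fromtimestamp(record["created_utc"], tz=timezone.utc)
    return {
        "_id": record["id"],  # reuse the comment id as the document key
        "author": record["author"],
        "body": record["body"].strip(),
        "word_count": len(record["body"].split()),  # illustrative derived field
        "created_at": created.isoformat(),
        "subreddit": record["subreddit"],
    }


def run_pipeline(comments: Iterable[dict], store: Callable[[dict], None]) -> int:
    """Push each comment through serialize -> process -> store.
    `store` stands in for a MongoDB insert; returns the number stored."""
    count = 0
    for comment in comments:
        message = serialize_comment(comment)  # what a producer would send to TOPIC
        store(to_document(message))           # what a sink would write to MongoDB
        count += 1
    return count


if __name__ == "__main__":
    sink: list = []  # in-memory stand-in for a MongoDB collection
    sample = {"id": "abc123", "author": "example_user",
              "body": "  Hello from the stream  ", "created_utc": 0}
    run_pipeline([sample], sink.append)
    print(sink[0])
```

In a real deployment, `serialize_comment` would feed a Kafka producer, `to_document` would be expressed as Spark transformations over the consumed stream, and `store` would be replaced by a MongoDB write. Keeping the three steps as separate functions makes each stage testable on its own.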