GitHub / Redgerd / Reddit-Post-Analysis-Workflow
This Reddit Post Analysis Workflow collects and processes Reddit data with Apache Spark and Delta Lake. It transforms the raw data, applies sentiment analysis, and extracts TF-IDF features. The pipeline is designed for reliable, high-quality data storage and continuous analytics.
JSON API: http://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Redgerd%2FReddit-Post-Analysis-Workflow
PURL: pkg:github/Redgerd/Reddit-Post-Analysis-Workflow
Stars: 0
Forks: 1
Open issues: 0
License: None
Language: HTML
Size: 193 KB
Dependencies parsed at: Pending
Created at: 10 months ago
Updated at: 9 months ago
Pushed at: 9 months ago
Last synced at: about 1 month ago
Topics: azure-databricks, pyspark, pyspark-mllib
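
The description mentions extracting TF-IDF features from post text. The repository's own code is not shown here, so the following is only a minimal plain-Python sketch of the idea, not the project's PySpark implementation. The function names `tokenize` and `tf_idf` and the sample posts are hypothetical; the smoothed formula idf = log((N + 1) / (df + 1)) is the one documented for Spark MLlib's `IDF`.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Uses raw term counts for TF and the smoothed IDF
    log((N + 1) / (df + 1)), where N is the number of documents
    and df is the number of documents containing the term.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: count each term once per document.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    idf = {term: math.log((n_docs + 1) / (count + 1)) for term, count in df.items()}

    return [
        {term: count * idf[term] for term, count in Counter(tokens).items()}
        for tokens in tokenized
    ]


# Made-up sample posts, for illustration only.
posts = ["spark makes pipelines fast", "delta lake stores spark tables"]
weights = tf_idf(posts)
```

A term that appears in every post (here "spark") gets a weight of zero, while terms unique to one post are weighted higher, which is why TF-IDF is useful for picking out distinguishing words.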