GitHub / VladimirZelenokor1 / Big-Data-Project---Predicting-Trip-Fares-with-Spark-Hive
A CRISP-DM–based big data pipeline for predicting NYC ride-sharing trip fares: ingesting 2024 TLC data via Sqoop into HDFS/Hive, performing ETL and feature engineering with Spark & PySpark, training and tuning Linear Regression & Gradient Boosted Tree models, and outlining end-to-end deployment.
JSON API: http://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VladimirZelenokor1%2FBig-Data-Project---Predicting-Trip-Fares-with-Spark-Hive
PURL: pkg:github/VladimirZelenokor1/Big-Data-Project---Predicting-Trip-Fares-with-Spark-Hive
Stars: 1
Forks: 0
Open issues: 0
License: None
Language: Java
Size: 906 KB
Dependencies parsed at: Pending
Created at: about 2 months ago
Updated at: about 2 months ago
Pushed at: about 2 months ago
Last synced at: 24 days ago
Topics: big-data, data-engineering, etl, hadoop, hive, jupyter-notebook, machine-learning, predictive-modeling, pyspark, python, spark, spark-ml, sql, sqoop
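
The JSON API and PURL values listed above both derive from the owner and repository name. The sketch below shows one way to rebuild them, assuming the API path simply percent-encodes the `owner/repo` string (so `/` becomes `%2F`), as the listed URL suggests. The helper names `api_url` and `purl` are hypothetical, chosen only for illustration, and are not part of any ecosyste.ms client.

```python
from urllib.parse import quote

# Values copied from the listing above.
HOST = "GitHub"
OWNER = "VladimirZelenokor1"
REPO = "Big-Data-Project---Predicting-Trip-Fares-with-Spark-Hive"


def api_url(host: str, owner: str, repo: str) -> str:
    """Build the JSON API URL (hypothetical helper).

    Assumption: the full name "owner/repo" is percent-encoded as a single
    path segment, so the slash is emitted as %2F.
    """
    full_name = quote(f"{owner}/{repo}", safe="")
    return f"http://repos.ecosyste.ms/api/v1/hosts/{host}/repositories/{full_name}"


def purl(owner: str, repo: str) -> str:
    """Build the package URL in the form shown in the listing (hypothetical helper)."""
    return f"pkg:github/{owner}/{repo}"


if __name__ == "__main__":
    print(api_url(HOST, OWNER, REPO))
    print(purl(OWNER, REPO))
```

Passing `safe=""` matters here: by default `quote` leaves `/` unescaped, which would split the repository name across two path segments instead of producing the `%2F` form shown in the listing.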