An open API service providing repository metadata for many open source software ecosystems.

GitHub / divithraju / divith-raju-pipeline-hadoop-pyspark

This project is a data pipeline that predicts customer churn from historical customer data. It uses Hadoop and PySpark to process large datasets, perform feature engineering, and train a machine learning model that identifies customers at risk of leaving.

JSON API: http://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/divithraju%2Fdivith-raju-pipeline-hadoop-pyspark
PURL: pkg:github/divithraju/divith-raju-pipeline-hadoop-pyspark
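
The JSON API address above embeds the repository's full name with its slash percent-encoded as `%2F`. A minimal sketch of how such an address and the PURL could be assembled is shown here. The helper names `repository_api_url` and `repository_purl` are hypothetical and are not part of any ecosyste.ms client. The URL layout is inferred only from the single address listed on this page.

```python
from urllib.parse import quote

# Base address taken from the JSON API line on this page.
API_BASE = "http://repos.ecosyste.ms/api/v1"


def repository_api_url(host: str, full_name: str) -> str:
    """Build a repository metadata URL (hypothetical helper).

    The owner/name slash is percent-encoded so that the full name
    occupies a single path segment, as in the address shown above.
    """
    encoded = quote(full_name, safe="")  # safe="" also encodes "/"
    return f"{API_BASE}/hosts/{host}/repositories/{encoded}"


def repository_purl(full_name: str) -> str:
    """Build a GitHub package URL (hypothetical helper).

    Assumes the name is already lowercase, as it is on this page.
    """
    return f"pkg:github/{full_name}"


if __name__ == "__main__":
    name = "divithraju/divith-raju-pipeline-hadoop-pyspark"
    print(repository_api_url("GitHub", name))
    print(repository_purl(name))
```

Passing `safe=""` to `quote` matters here: by default `quote` leaves `/` unescaped, which would split the full name into two path segments.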

Stars: 1
Forks: 0
Open issues: 0

License: None
Language: Python
Size: 4.88 KB
Dependencies parsed at: Pending

Created at: 11 months ago
Updated at: 11 months ago
Pushed at: 11 months ago
Last synced at: 11 months ago

Topics: apache, bigdata, data, database, dataengineering, hadoop, hadoop-hdfs, linux, open-source, pipeline, project, project-repository, pyspark, pyspark-mllib, pyspark-python, python3, software-engineering, ubuntu
