An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: vectorassembler

CamilaJaviera91/pyspark-first-approach

This code demonstrates how to load datasets into PySpark and perform simple data transformations. It loads a sample dataset using PySpark's built-in functionality, or reads data from external sources, and converts it into a PySpark DataFrame for distributed processing and manipulation.

Language: Python - Size: 2.72 MB - Last synced at: 1 day ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

arnoldchrisoduor1/LinearRegression-Model-with-ApacheSpark-and-DataBricks

Using Apache PySpark on Databricks, I performed feature engineering on customer data, then trained a linear regression model and used it to predict customers' bills based on previous customer trends.

Language: Jupyter Notebook - Size: 3.91 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

mauryashobhit/cruise_ship_member_prediction

Predicting the number of crew members on a ship based on multiple parameters.

Language: Jupyter Notebook - Size: 22.5 KB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0