GitHub topics: pyspark-mllib

Repositories

Hippaho/Sparkify

A music streaming company, Sparkify, has decided that it is time to introduce more automation and monitoring to its data warehouse ETL pipelines and has concluded that the best tool for this is Apache Airflow.

Language: Python - Size: 17.6 KB - Last synced at: about 6 hours ago - Pushed at: about 7 hours ago - Stars: 0 - Forks: 0

jibbs1703/Classic-ML-Models

This repository contains scripts for developing, training, and evaluating machine learning models using several Python frameworks.

Language: Jupyter Notebook - Size: 3.31 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 1 - Forks: 0

Anirudh-rao/Machine-Learning-Pyspark

This repository covers all the basics of using Spark for machine learning.

Language: Jupyter Notebook - Size: 4.66 MB - Last synced at: 20 days ago - Pushed at: 21 days ago - Stars: 0 - Forks: 0

mihirchhiber/Network-Intrusion-Detector

Network Intrusion Detector is a distributed intrusion detection system built with PySpark. It preprocesses, encodes, and models network traffic data to detect anomalies using a Random Forest classifier, achieving high accuracy and efficiency through feature selection and scalable data processing. The system is suitable for large-scale environments

Language: Jupyter Notebook - Size: 860 KB - Last synced at: 13 days ago - Pushed at: 26 days ago - Stars: 0 - Forks: 0

Wb-az/MLib-PySpark-SoundLevel-Prediction

Creates an ML pipeline leveraging PySpark SQL and PySpark MLlib to predict sound level.

Language: Jupyter Notebook - Size: 972 KB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 0 - Forks: 0

asif7adil/scSPARKL

scSPARKL is an Apache Spark-based pipeline for performing a variety of preprocessing and downstream analyses of scRNA-seq data.

Language: Python - Size: 94.7 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 5 - Forks: 1

vigneshSs-07/Pyspark-ACompleteGuide

This repo explains PySpark modules in Python, with a practical, hands-on approach to dealing with big data.

Language: Jupyter Notebook - Size: 1.86 MB - Last synced at: 2 months ago - Pushed at: about 2 years ago - Stars: 5 - Forks: 3

JaewonSon37/Mining_Big_Data1

Language: Jupyter Notebook - Size: 30.3 KB - Last synced at: 2 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

imsanjoykb/PySpark-Bootcamp

My practice and projects on PySpark

Language: Jupyter Notebook - Size: 4.52 MB - Last synced at: 2 months ago - Pushed at: over 3 years ago - Stars: 8 - Forks: 3

Sarthak-1408/PySpark-Tutorial

In this repo, I create a PySpark tutorial to better understand how to read and manage big data.

Language: Jupyter Notebook - Size: 46.9 KB - Last synced at: 2 months ago - Pushed at: over 3 years ago - Stars: 6 - Forks: 6

shivensharma01/QST1-Cinema-Insights

Language: Jupyter Notebook - Size: 1.03 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

mohammadreza-mohammadi94/PySpark-Analytics-Hub

A PySpark repository for data analysis, machine learning projects, and hands-on exercises. Explore scalable data processing and advanced ML workflows with Spark.

Language: Jupyter Notebook - Size: 12.6 MB - Last synced at: 4 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 1

mananabbasi/Data-Science-Complete-Project-using-Big-Data-Tools-Techniques-

This repository contains Databricks projects utilizing RDDs, DataFrames, and SQL to process and analyze various real-world datasets. Data cleaning and analysis have been performed using PySpark functions to handle challenges such as inconsistent formats, missing values, and complex data structures. The project ensures efficient data transformation

Language: HTML - Size: 3.71 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

neemiasbsilva/case-study-data-science

Welcome to some case studies of data science projects (personal projects).

Language: Jupyter Notebook - Size: 11.1 MB - Last synced at: 2 months ago - Pushed at: 5 months ago - Stars: 16 - Forks: 4

SayamAlt/TMDB-Movies-End-to-End-ETL-and-ML-Pipeline

This project encompasses end-to-end ETL and ML pipeline development. Data ingestion from the TMDB API covered top-rated, current, upcoming, and popular movies with genres. EDA was performed to derive insights and observations, and a regression model with a 97% R² score was developed to predict average movie ratings.

Language: Python - Size: 15.6 KB - Last synced at: 4 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

Redgerd/Reddit-Post-Analysis-Workflow

This Reddit Post Analysis Workflow collects and processes Reddit data using Apache Spark and Delta Lake. It transforms raw data, applies sentiment analysis, and extracts TF-IDF features. The pipeline ensures reliable, high-quality data storage and supports continuous analytics.

Language: HTML - Size: 193 KB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 0 - Forks: 1

autodeployai/pypmml-spark

Python PMML scoring library for PySpark as SparkML Transformer

Language: Python - Size: 42.7 MB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 22 - Forks: 2

TravelXML/APACHE-SPARK-PYSPARK-DATABRICKS

APACHE SPARK: Data Analysis, Transformation, and Visualisation with PySpark, IPL Data Analysis

Language: Jupyter Notebook - Size: 2.25 MB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 1 - Forks: 1

DebanjanSarkar/pyspark-maestro

This repo contains PySpark implementations of real-world use cases: batch data processing, streaming data processing sourced from Kafka, sockets, etc., Spark optimizations, business-specific big data processing scenarios, and machine learning use cases.

Language: Jupyter Notebook - Size: 66.1 MB - Last synced at: 2 months ago - Pushed at: 11 months ago - Stars: 2 - Forks: 1

vuthanhhai2302/Applied-Pyspark

My applied big data analytics project with PySpark.

Language: Jupyter Notebook - Size: 1.15 MB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 10 - Forks: 0

miquido/DataScience

Useful scripts and notebooks for Data Science. The project was made by Miquido. https://www.miquido.com/

Language: Jupyter Notebook - Size: 130 KB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 9 - Forks: 3

storytellingengineer/Introduction_to_Pyspark

PySpark Implementation and methods

Language: Jupyter Notebook - Size: 10.7 KB - Last synced at: 6 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

dwija12903/bda-lab

This repository contains various lab files from my Big Data Analytics coursework

Language: Jupyter Notebook - Size: 267 KB - Last synced at: 3 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

divithraju/divith-raju-pipeline-hadoop-pyspark

This project presents a comprehensive data pipeline designed to predict customer churn using historical customer data. By leveraging Hadoop and PySpark, this pipeline efficiently processes large datasets, performs feature engineering, and trains a machine learning model to identify customers at risk of leaving.

Language: Python - Size: 4.88 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0

pregismond/coursera-diabetes-prediction

Diabetes Prediction Using PySpark MLlib

Language: Jupyter Notebook - Size: 1.28 MB - Last synced at: 3 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

srosalino/Predicting_NYC_Taxi_Limousine_Profit

Predicting the profit of NYC Taxi Limousine services to provide actionable insights for maximizing revenue

Language: Jupyter Notebook - Size: 712 KB - Last synced at: 4 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

srosalino/Six_Degrees_of_Separation_and_Engineering_the_Perfect_Cast

Leveraging PySpark to analyze the IMDB database, answer various queries, and develop machine learning models to predict a movie's popularity based on its cast

Language: Jupyter Notebook - Size: 140 KB - Last synced at: 4 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

AzlinRusnan/Iris_PySpark_Analysis

Iris Classification using PySpark

Language: Jupyter Notebook - Size: 319 KB - Last synced at: 18 days ago - Pushed at: 12 months ago - Stars: 1 - Forks: 0

burhanahmed1/Iris-Dataset-Analysis-with-PySpark

Implementation of K-means, Bisecting K-means, and Decision Tree in PySpark on the Iris Dataset.

Language: Jupyter Notebook - Size: 146 KB - Last synced at: 4 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

VQHieu1012/Book-Recommendation-PySpark

Using PySpark MLlib and an ALS model to create book recommendations

Language: Jupyter Notebook - Size: 58 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 1

Abhishake-Patel/Process-Data-Analytics

PySpark and Python ML and Data Science Projects on a variety of Topics

Language: Jupyter Notebook - Size: 27.3 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

OmarNouih/Twitter-Streams

Real-Time Sentiment Analysis on Twitter Streams is a web application that categorizes tweets into sentiments like Negative, Positive, Neutral, or Irrelevant. Built using Apache Kafka, Spark, and PySpark ML models, it offers real-time analysis capabilities.

Language: Python - Size: 3.33 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

saahilk1511/Book-Recommender-System

The goal of this project was to build recommender systems using K-means and ALS based on average ratings. It recommends similar books, recommends authors based on a book title, and recommends highly rated books by an author.

Language: Jupyter Notebook - Size: 414 KB - Last synced at: about 1 year ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

mathewsrc/machine-learning-monitoring-with-evidently

ML Monitoring with EvidentlyAI

Language: Jupyter Notebook - Size: 23.1 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

naiborhujosua/Telco_Churn_Analysis

Implementing customer churn analysis in the telco industry to improve customer retention using PySpark in Databricks

Size: 856 KB - Last synced at: about 1 year ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

aakinlalu/Crime-Classification-using-PySpark

Classify crimes into different categories using PySpark

Language: Jupyter Notebook - Size: 311 KB - Last synced at: about 1 year ago - Pushed at: about 6 years ago - Stars: 19 - Forks: 14

Chan2k20/Wine-Prediction-Prediction-Model-On-AWS-EMR

Implemented a random forest machine learning algorithm using PySpark on AWS EMR to classify wines. The model is then deployed in a Docker container.

Language: Python - Size: 120 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 1

Lefteris-Souflas/Spark-Movies-Analytics

Utilizing Apache Spark & PySpark to analyze a movie dataset. Tasks include data exploration, identifying top-rated movies, training a linear regression model, and experimenting with Airflow.

Language: Jupyter Notebook - Size: 289 KB - Last synced at: 4 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

dimdasci/yp11-pyspark-training

Training project with Spark DataFrame and MLlib

Language: Jupyter Notebook - Size: 765 KB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

vt57299/Pyspark_Tutorial

PySpark tutorial on Azure Databricks

Language: Jupyter Notebook - Size: 18.6 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Safaa-p/Machine-Failure-Prediction

Predicting machine failure using machine learning on a synthetic dataset of an existing milling machine consisting of 10,000 data points

Language: Jupyter Notebook - Size: 4.7 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Prajwal10031999/Song-Genre-Classification-in-PySparks-MLlib

A PySpark MLlib classification model to classify songs based on a number of characteristics into a set of 23 electronic genres.

Language: Jupyter Notebook - Size: 1.56 MB - Last synced at: 8 days ago - Pushed at: over 4 years ago - Stars: 6 - Forks: 2

vijay06/Recommended_System-

Language: Jupyter Notebook - Size: 26.4 KB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

Sanjayvk98/Employee-Atrrition-PySpark-MLlib-

Machine learning using PySpark

Language: Jupyter Notebook - Size: 165 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

biagiom/spark-network-traffic-classifier

Network traffic classifier based on Apache Spark and MLlib

Language: Python - Size: 1.26 MB - Last synced at: over 1 year ago - Pushed at: almost 6 years ago - Stars: 4 - Forks: 1

abroniewski/IdleCompute-Data-Management-Architecture

Implementation of a big data management and analysis backbone architecture using PySpark for distributed and scalable data ingestion and MLlib for machine learning analysis. Part of Big Data Management and Analytics (BDMA) program.

Language: Jupyter Notebook - Size: 34.8 MB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 1

AaronOS0/Bitcoin-Price-Prediction-PySpark

Bitcoin Price Prediction using Spark Global and self-designed Local Model with Big data preprocessing and manipulation solution.

Language: Jupyter Notebook - Size: 2.16 MB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 2

titicaca/spark-iforest

Isolation Forest on Spark

Language: Scala - Size: 74.2 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 221 - Forks: 91

MattithyahuData/P10-Bank-Note-Authentication

💷 Machine Learning PySpark Bank note authentication

Language: Jupyter Notebook - Size: 596 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

ghanmi-hamza/Machine-learning-with-PySpark

This notebook uses PySpark to build machine learning classifiers (note that almost all ML algorithms supported by PySpark are used in this notebook).

Language: Jupyter Notebook - Size: 109 KB - Last synced at: almost 2 years ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 0

ksashok/Movie-Recommendation-PySpark

Movie Recommendation using Apache Spark MLlib

Language: Python - Size: 1.95 KB - Last synced at: almost 2 years ago - Pushed at: almost 6 years ago - Stars: 0 - Forks: 1

yogeshwaran-shanmuganathan/Success-Prediction-Analysis-for-Startups

Analysis of information about startup companies done using machine learning and data analytics methods to predict the success of the startup companies.

Language: Jupyter Notebook - Size: 15.1 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

AntonioLunardi/NLP_Spark_sentiment_analisys

A bag-of-words analysis based on IMDB movie opinions with PySpark

Language: Jupyter Notebook - Size: 338 KB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

ninasiam/housing_data_analysis

A collection of small data science projects to predict house pricing for two different datasets

Language: Jupyter Notebook - Size: 4.42 MB - Last synced at: almost 2 years ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

imratnesh/pyspark

PySpark, machine learning, Python

Language: HTML - Size: 123 KB - Last synced at: about 1 month ago - Pushed at: about 6 years ago - Stars: 1 - Forks: 1

ahkhaniki/spark-machine-learning

Language: Jupyter Notebook - Size: 18.6 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

Abdelrahman13-coder/PySpark

Language: Jupyter Notebook - Size: 48 MB - Last synced at: 16 days ago - Pushed at: about 2 years ago - Stars: 2 - Forks: 0

nisaharan/Medical-insurance-charges-prediction

Modelled medical insurance charges with the distributed computing platform PySpark in Databricks. Used two models for this purpose: linear regression and logistic regression.

Language: Jupyter Notebook - Size: 123 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

avimonda298/Spark-ML

Worked on different Spark classification and regression algorithms

Language: Jupyter Notebook - Size: 1.02 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

aehabV/Cement-Strength-Prediction-with-PySpark

A machine learning model that predicts the strength of cement based on its ingredients using PySpark's MLlib library.

Language: Jupyter Notebook - Size: 24.4 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

aehabV/Indeed-fake-job-posting-prediction

A machine learning model is built using PySpark's MLlib library to automatically flag suspicious job postings on Indeed.com. The dataset includes 18,000 job descriptions, out of which about 800 are fake.

Language: Jupyter Notebook - Size: 25.9 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

manyuzhang1996/Consumer-Churn-Prediction-with-PySpark

Big Data Project

Language: Jupyter Notebook - Size: 4.86 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

JuanEnD/InsightPlaces

Project from the 2nd edition of Alura's Challenge Data Science, in which I used PySpark to analyze and clean property price data for Rio de Janeiro.

Language: Jupyter Notebook - Size: 25.8 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

lbdeoliveira/song-playlist-recommendation

This project was a joint effort by Lucas De Oliveira, Chandrish Ambati, and Anish Mukherjee to create song and playlist embeddings for recommendations in a distributed fashion using a 1M playlist dataset by Spotify.

Language: HTML - Size: 225 KB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 32 - Forks: 12

yvgupta03/Big_Data_Project_US-Airlines_Tweet_Processing_and_Analysis

Big data application of machine learning concepts for sentiment classification of US Airlines tweets. The focus is on using PySpark libraries (ml-lib) on big data to solve a problem with machine learning algorithms, not on the choice of algorithm used to create the ML model. It also involves data preprocessing using NLP techniques, cross-validation, and a parameter-grid builder.

Language: Jupyter Notebook - Size: 1.83 MB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 0

sablokgaurav/data_engineering

java_codes

Language: Java - Size: 1.95 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

Uriah372-DS/DDBMSPysparkProject

A course project implementing machine learning with Spark Structured Streaming in Python

Language: Jupyter Notebook - Size: 20.8 MB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

tharikf/PySpark_KingCounty

Language: Jupyter Notebook - Size: 13.7 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

AbdelmajidLh/ML_diabet_predict_pyspark

Diabetes prediction by logistic regression with Python and PySpark

Language: Jupyter Notebook - Size: 10.7 KB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

dunnkers/pyspark-bucketmap

Easily group pyspark data into buckets and map them to different values.

Language: Jupyter Notebook - Size: 56.6 KB - Last synced at: 17 days ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

mohanakrishnavh/PySpark-Tutorial

Language: Jupyter Notebook - Size: 2.87 MB - Last synced at: over 2 years ago - Pushed at: about 7 years ago - Stars: 17 - Forks: 19

AntonioLunardi/Challenge-Data-Science-Alura-2ed Fork of millenagena/Challenge-Data-Science-Alura-2ed

Data engineering and data science project for the real estate agency InsightPlaces using big data technologies. Implementation of regression and clustering machine learning models.

Language: Jupyter Notebook - Size: 26.4 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

animenon/pyspark_mllib

Example from Spark MLlib (in Python)

Language: Python - Size: 23.4 KB - Last synced at: over 2 years ago - Pushed at: over 7 years ago - Stars: 9 - Forks: 6

aziz0519/sparkml-model-deployment

End-to-end prediction model development using PySpark with Docker and Streamlit

Language: Python - Size: 594 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

sujan-bala/Machine-Learning-Notebooks

Various Jupyter Notebooks containing ML projects

Language: Jupyter Notebook - Size: 19.3 MB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 1 - Forks: 0

iqrabismii/Big-Data-Projects-

Projects on big data using PySpark and AWS

Language: Jupyter Notebook - Size: 2.05 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

gabridego/spark-exercises

A collection of PySpark exercises

Language: Python - Size: 211 KB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 2

brunowdev/sparkify

This is the final project for the Data Scientist Nanodegree, where our goal is to predict churn for a fictional streaming service called Sparkify.

Language: HTML - Size: 6.33 MB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 3 - Forks: 0

Foroozani/BigData_PySpark

Handle Big Data for Machine Learning using Python and PySpark, Building ETL Pipelines with PySpark, MongoDB, and Bokeh

Language: Jupyter Notebook - Size: 35.1 MB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 6 - Forks: 4

hajarmerbouh/Pyspark_classification

Classification using PySpark

Language: Jupyter Notebook - Size: 7.8 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

GabrielAraujoCarlos/Prevendo-Satisfacao-Cliente-Santander

Machine learning project at Santander: predicting customer satisfaction level

Language: Jupyter Notebook - Size: 7.88 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

ravichoudharyds/Pyspark_Recommendation_System

Recommendation system using the MLlib and ML libraries on PySpark

Language: Python - Size: 137 KB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 4 - Forks: 1

alimunawar007/Network_Intrusion_Detection

Network Intrusion Detection using PySpark

Language: Jupyter Notebook - Size: 3.62 MB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

stevenlimcorn/australian-weather-prediction

A Docker-hosted Australian weather prediction analysis with PySpark and Hadoop DFS

Language: Jupyter Notebook - Size: 12.3 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

matheusmmmp/MLlib-graphTracking

Tracking project with machine learning using PySpark MLlib

Language: Jupyter Notebook - Size: 11.7 KB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

limz1986/PySpark-ML-Model-DataBricks

An introduction to PySpark: creating a simple multiple-regression ML model and hosting it on a Databricks cluster

Language: Python - Size: 7.81 KB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

matheusmmmp/MLlib-movieRecommendation

Movie recommendation project with machine learning using PySpark MLlib

Language: Jupyter Notebook - Size: 921 KB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

toby-p/pyspark-flight-delay-prediction

Final project from "Machine Learning at Scale" (W261) in UC Berkeley's Data Science Masters program

Language: Jupyter Notebook - Size: 8.4 MB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

Reucherian/insure-health-scale

Health insurance analytics and prediction at scale with PySpark.

Language: Jupyter Notebook - Size: 1.48 MB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

kabbina/Big-Data Fork of rohanmrb/Big-Data

Language: Python - Size: 29 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

yvgupta03/Big_Data_Project_Page-Ranking_Airports

PySpark code implementing the PageRank algorithm on an airports dataset to highlight the relative importance of the airports according to the dataset.

Language: Jupyter Notebook - Size: 365 KB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 1

apurva-modi/pyspark-twitter-sentimental-analysis

Analyzes how travelers expressed their feelings on Twitter using PySpark MLlib. Given tweets about six US airlines, the task is to predict whether a tweet contains positive, negative, or neutral sentiment about the airline. This is a typical supervised learning task: given a text string, I have to categorize it into predefined categories.

Language: Jupyter Notebook - Size: 406 KB - Last synced at: 3 months ago - Pushed at: about 5 years ago - Stars: 3 - Forks: 1

Related Keywords

pyspark-mllib 119 pyspark 81 machine-learning 35 python 28 spark 21 pyspark-notebook 21 python3 13 pyspark-machine-learning 13 pyspark-python 12 pyspark-tutorial 11 data-science 10 big-data 10 apache-spark 10 mllib 7 databricks 7 logistic-regression 7 hadoop 6 jupyter-notebook 6 pyspark-sql 6 databricks-notebooks 6 pandas 5 spark-sql 5 random-forest 5 bigdata 5 apache 4 pipeline 4 pyspark-ml 4 spark-streaming 4 decision-tree 4 nlp 4 aws-s3 4 sql 4 classification 4 linear-regression 4 nlp-machine-learning 3 data-visualization 3 churn-prediction 3 vitrinedev 3 recommender-system 3 gradient-boosting 3 clustering 3 kafka 3 deep-learning 3 feature-engineering 3 data-engineering 3 anomaly-detection 3 mlflow 3 hadoop-hdfs 3 big-data-analytics 3 matplotlib 3 seaborn 3 pagerank 2 neural-network 2 kmeans-clustering 2 regression-models 2 extract-transform-load 2 alternating-least-squares 2 exploratory-data-analysis 2 docker-compose 2 visualization 2 data-transformation 2 azure-databricks 2 spark-ml 2 kmeans 2 json 2 graphframes 2 kafka-streams 2 als 2 docker 2 gcp 2 graphx 2 recommendation-system 2 google-colab 2 pca 2 apache-airflow 2 sentiment-analysis 2 scala 2 hdfs 2 etl-pipeline 2 hadoop-mapreduce 2 data-preprocessing 2 parquet 2 sparkml 2 predictive-modeling 2 h2oai 2 spotify 2 scikit-learn 2 dataframe 2 airflow 2 aws 2 h2o-automl 2 athena 2 rdd 2 ml 2 sparkjava 1 sqoop 1 clustring 1 machine-learning-algorithms 1 java 1 functional-programming 1