GitHub topics: pyspark-mllib
Hippaho/Sparkify
A music streaming company, Sparkify, has decided that it is time to introduce more automation and monitoring to their data warehouse ETL pipelines and has come to the conclusion that the best tool to achieve this is Apache Airflow.
Language: Python - Size: 17.6 KB - Last synced at: about 6 hours ago - Pushed at: about 7 hours ago - Stars: 0 - Forks: 0

jibbs1703/Classic-ML-Models
This repository contains scripts for developing, training, and evaluating machine learning models using several Python frameworks.
Language: Jupyter Notebook - Size: 3.31 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 1 - Forks: 0

Anirudh-rao/Machine-Learning-Pyspark
This repository covers the basics of using Spark for machine learning.
Language: Jupyter Notebook - Size: 4.66 MB - Last synced at: 20 days ago - Pushed at: 21 days ago - Stars: 0 - Forks: 0

mihirchhiber/Network-Intrusion-Detector
Network Intrusion Detector is a distributed intrusion detection system built with PySpark. It preprocesses, encodes, and models network traffic data to detect anomalies using a Random Forest classifier, achieving high accuracy and efficiency through feature selection and scalable data processing. The system is suitable for large-scale environments
Language: Jupyter Notebook - Size: 860 KB - Last synced at: 13 days ago - Pushed at: 26 days ago - Stars: 0 - Forks: 0

Wb-az/MLib-PySpark-SoundLevel-Prediction
Creates an ML pipeline leveraging PySpark SQL and PySpark MLlib to predict sound levels.
Language: Jupyter Notebook - Size: 972 KB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 0 - Forks: 0

asif7adil/scSPARKL
scSPARKL is an Apache Spark-based pipeline for performing a variety of preprocessing and downstream analyses of scRNA-seq data.
Language: Python - Size: 94.7 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 5 - Forks: 1

vigneshSs-07/Pyspark-ACompleteGuide
This repo explains PySpark modules in Python, with practical, hands-on examples for working with big data.
Language: Jupyter Notebook - Size: 1.86 MB - Last synced at: 2 months ago - Pushed at: about 2 years ago - Stars: 5 - Forks: 3

JaewonSon37/Mining_Big_Data1
Language: Jupyter Notebook - Size: 30.3 KB - Last synced at: 2 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

imsanjoykb/PySpark-Bootcamp
My practice and projects on PySpark
Language: Jupyter Notebook - Size: 4.52 MB - Last synced at: 2 months ago - Pushed at: over 3 years ago - Stars: 8 - Forks: 3

Sarthak-1408/PySpark-Tutorial
In this repo, I create a PySpark tutorial to better understand how to read and manage big data.
Language: Jupyter Notebook - Size: 46.9 KB - Last synced at: 2 months ago - Pushed at: over 3 years ago - Stars: 6 - Forks: 6

shivensharma01/QST1-Cinema-Insights
Language: Jupyter Notebook - Size: 1.03 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

mohammadreza-mohammadi94/PySpark-Analytics-Hub
A PySpark repository for data analysis, machine learning projects, and hands-on exercises. Explore scalable data processing and advanced ML workflows with Spark.
Language: Jupyter Notebook - Size: 12.6 MB - Last synced at: 4 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 1

mananabbasi/Data-Science-Complete-Project-using-Big-Data-Tools-Techniques-
This repository contains Databricks projects utilizing RDDs, DataFrames, and SQL to process and analyze various real-world datasets. Data cleaning and analysis have been performed using PySpark functions to handle challenges such as inconsistent formats, missing values, and complex data structures. The project ensures efficient data transformation
Language: HTML - Size: 3.71 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

neemiasbsilva/case-study-data-science
Welcome to a collection of data science case studies (personal projects).
Language: Jupyter Notebook - Size: 11.1 MB - Last synced at: 2 months ago - Pushed at: 5 months ago - Stars: 16 - Forks: 4

SayamAlt/TMDB-Movies-End-to-End-ETL-and-ML-Pipeline
This project encompasses end-to-end ETL and ML pipeline development. Data ingestion from the TMDB API covered top-rated, current, upcoming, and popular movies with genres. Performed EDA to derive several valuable insights and observations. Developed a regression model with a 97% R² score to predict average movie ratings accurately.
Language: Python - Size: 15.6 KB - Last synced at: 4 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

Redgerd/Reddit-Post-Analysis-Workflow
This Reddit Post Analysis Workflow collects and processes Reddit data using Apache Spark and Delta Lake. It transforms raw data, applies sentiment analysis, and extracts TF-IDF features. The pipeline ensures reliable, high-quality data storage and supports continuous analytics.
Language: HTML - Size: 193 KB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 0 - Forks: 1

autodeployai/pypmml-spark
Python PMML scoring library for PySpark as a SparkML Transformer
Language: Python - Size: 42.7 MB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 22 - Forks: 2

TravelXML/APACHE-SPARK-PYSPARK-DATABRICKS
APACHE SPARK: Data Analysis, Transformation, and Visualisation with PySpark, IPL Data Analysis
Language: Jupyter Notebook - Size: 2.25 MB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 1 - Forks: 1

DebanjanSarkar/pyspark-maestro
This repo contains implementations of PySpark for real-world use cases for batch data processing, streaming data processing sourced from Kafka, sockets, etc., spark optimizations, business specific bigdata processing scenario solutions, and machine learning use cases.
Language: Jupyter Notebook - Size: 66.1 MB - Last synced at: 2 months ago - Pushed at: 11 months ago - Stars: 2 - Forks: 1

vuthanhhai2302/Applied-Pyspark
My applied big data analytics project with PySpark.
Language: Jupyter Notebook - Size: 1.15 MB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 10 - Forks: 0

miquido/DataScience
Useful scripts and notebooks for Data Science. The project was made by Miquido. https://www.miquido.com/
Language: Jupyter Notebook - Size: 130 KB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 9 - Forks: 3

storytellingengineer/Introduction_to_Pyspark
PySpark Implementation and methods
Language: Jupyter Notebook - Size: 10.7 KB - Last synced at: 6 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

dwija12903/bda-lab
This repository contains various lab files from my Big Data Analytics coursework
Language: Jupyter Notebook - Size: 267 KB - Last synced at: 3 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

divithraju/divith-raju-pipeline-hadoop-pyspark
This project presents a comprehensive data pipeline designed to predict customer churn using historical customer data. By leveraging Hadoop and PySpark, this pipeline efficiently processes large datasets, performs feature engineering, and trains a machine learning model to identify customers at risk of leaving.
Language: Python - Size: 4.88 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0

pregismond/coursera-diabetes-prediction
Diabetes Prediction Using PySpark MLlib
Language: Jupyter Notebook - Size: 1.28 MB - Last synced at: 3 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

srosalino/Predicting_NYC_Taxi_Limousine_Profit
Predicting the profit of NYC Taxi Limousine services to provide actionable insights for maximizing revenue
Language: Jupyter Notebook - Size: 712 KB - Last synced at: 4 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

srosalino/Six_Degrees_of_Separation_and_Engineering_the_Perfect_Cast
Leveraging PySpark to analyze the IMDB database, answer various queries, and develop machine learning models to predict a movie's popularity based on its cast
Language: Jupyter Notebook - Size: 140 KB - Last synced at: 4 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

AzlinRusnan/Iris_PySpark_Analysis
Iris Classification using PySpark
Language: Jupyter Notebook - Size: 319 KB - Last synced at: 18 days ago - Pushed at: 12 months ago - Stars: 1 - Forks: 0

burhanahmed1/Iris-Dataset-Analysis-with-PySpark
Implementation of K-means, Bisecting K-means, and Decision Tree in PySpark on the Iris dataset.
Language: Jupyter Notebook - Size: 146 KB - Last synced at: 4 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

VQHieu1012/Book-Recommendation-PySpark
Using PySpark MLlib and an ALS model to create book recommendations
Language: Jupyter Notebook - Size: 58 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 1

Abhishake-Patel/Process-Data-Analytics
PySpark and Python ML and Data Science Projects on a variety of Topics
Language: Jupyter Notebook - Size: 27.3 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

OmarNouih/Twitter-Streams
Real-Time Sentiment Analysis on Twitter Streams is a web application that categorizes tweets into sentiments such as Negative, Positive, Neutral, or Irrelevant. Built using Apache Kafka, Spark, and PySpark ML models, it offers real-time analysis capabilities.
Language: Python - Size: 3.33 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

saahilk1511/Book-Recommender-System
The goal of this project was to build recommender systems using K-means and ALS based on average ratings. It recommends similar books, recommends authors based on a book title, and recommends highly rated books by a given author.
Language: Jupyter Notebook - Size: 414 KB - Last synced at: about 1 year ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

mathewsrc/machine-learning-monitoring-with-evidently
ML Monitoring with EvidentlyAI
Language: Jupyter Notebook - Size: 23.1 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

naiborhujosua/Telco_Churn_Analysis
Implementing customer churn analysis in the telco industry to improve customer retention using PySpark in Databricks
Size: 856 KB - Last synced at: about 1 year ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

aakinlalu/Crime-Classification-using-PySpark
Classify crimes into different categories using PySpark
Language: Jupyter Notebook - Size: 311 KB - Last synced at: about 1 year ago - Pushed at: about 6 years ago - Stars: 19 - Forks: 14

Chan2k20/Wine-Prediction-Prediction-Model-On-AWS-EMR
Implemented a random forest machine learning algorithm using PySpark on AWS EMR to classify wines. The model is then deployed in a Docker container.
Language: Python - Size: 120 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 1

Lefteris-Souflas/Spark-Movies-Analytics
Utilizing Apache Spark & PySpark to analyze a movie dataset. Tasks include data exploration, identifying top-rated movies, training a linear regression model, and experimenting with Airflow.
Language: Jupyter Notebook - Size: 289 KB - Last synced at: 4 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

dimdasci/yp11-pyspark-training
Training project with Spark DataFrame and MLlib
Language: Jupyter Notebook - Size: 765 KB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

vt57299/Pyspark_Tutorial
PySpark tutorial on Azure Databricks
Language: Jupyter Notebook - Size: 18.6 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Safaa-p/Machine-Failure-Prediction
Predicting machine failure using machine learning on a synthetic dataset of an existing milling machine consisting of 10,000 data points
Language: Jupyter Notebook - Size: 4.7 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Prajwal10031999/Song-Genre-Classification-in-PySparks-MLlib
A PySpark MLlib classification model to classify songs based on a number of characteristics into a set of 23 electronic genres.
Language: Jupyter Notebook - Size: 1.56 MB - Last synced at: 8 days ago - Pushed at: over 4 years ago - Stars: 6 - Forks: 2

vijay06/Recommended_System-
Language: Jupyter Notebook - Size: 26.4 KB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

Sanjayvk98/Employee-Atrrition-PySpark-MLlib-
Machine Learning using Pyspark
Language: Jupyter Notebook - Size: 165 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

biagiom/spark-network-traffic-classifier
Network traffic classifier based on Apache Spark and MLlib
Language: Python - Size: 1.26 MB - Last synced at: over 1 year ago - Pushed at: almost 6 years ago - Stars: 4 - Forks: 1

abroniewski/IdleCompute-Data-Management-Architecture
Implementation of a big data management and analysis backbone architecture using PySpark for distributed and scalable data ingestion and MLlib for machine learning analysis. Part of Big Data Management and Analytics (BDMA) program.
Language: Jupyter Notebook - Size: 34.8 MB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 1

AaronOS0/Bitcoin-Price-Prediction-PySpark
Bitcoin Price Prediction using Spark Global and self-designed Local Model with Big data preprocessing and manipulation solution.
Language: Jupyter Notebook - Size: 2.16 MB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 2

titicaca/spark-iforest
Isolation Forest on Spark
Language: Scala - Size: 74.2 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 221 - Forks: 91

MattithyahuData/P10-Bank-Note-Authentication
💷 Machine Learning PySpark Bank note authentication
Language: Jupyter Notebook - Size: 596 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

ghanmi-hamza/Machine-learning-with-PySpark
This notebook shows how to use PySpark to build machine learning classifiers (note that almost all ML algorithms supported by PySpark are used in this notebook).
Language: Jupyter Notebook - Size: 109 KB - Last synced at: almost 2 years ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 0

ksashok/Movie-Recommendation-PySpark
Movie Recommendation using Apache Spark MLlib
Language: Python - Size: 1.95 KB - Last synced at: almost 2 years ago - Pushed at: almost 6 years ago - Stars: 0 - Forks: 1

yogeshwaran-shanmuganathan/Success-Prediction-Analysis-for-Startups
Analysis of information about startup companies done using machine learning and data analytics methods to predict the success of the startup companies.
Language: Jupyter Notebook - Size: 15.1 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

AntonioLunardi/NLP_Spark_sentiment_analisys
A bag-of-words analysis based on IMDB movie opinions with PySpark
Language: Jupyter Notebook - Size: 338 KB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

ninasiam/housing_data_analysis
A collection of small data science projects to predict house pricing for two different datasets
Language: Jupyter Notebook - Size: 4.42 MB - Last synced at: almost 2 years ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

imratnesh/pyspark
Pyspark, machine learning, python
Language: HTML - Size: 123 KB - Last synced at: about 1 month ago - Pushed at: about 6 years ago - Stars: 1 - Forks: 1

ahkhaniki/spark-machine-learning
Language: Jupyter Notebook - Size: 18.6 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

Abdelrahman13-coder/PySpark
Language: Jupyter Notebook - Size: 48 MB - Last synced at: 16 days ago - Pushed at: about 2 years ago - Stars: 2 - Forks: 0

nisaharan/Medical-insurance-charges-prediction
Modelled medical insurance charges with the help of the distributed computing platform PySpark in Databricks. Used two models for this purpose: linear regression and logistic regression.
Language: Jupyter Notebook - Size: 123 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

avimonda298/Spark-ML
Worked on different Spark classification and regression algorithms
Language: Jupyter Notebook - Size: 1.02 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

aehabV/Cement-Strength-Prediction-with-PySpark
A machine learning model that predicts the strength of cement based on its ingredients using PySpark's MLlib library.
Language: Jupyter Notebook - Size: 24.4 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

aehabV/Indeed-fake-job-posting-prediction
A machine learning model is built using PySpark's MLlib library to automatically flag suspicious job postings on Indeed.com. The dataset includes 18,000 job descriptions, out of which about 800 are fake.
Language: Jupyter Notebook - Size: 25.9 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

manyuzhang1996/Consumer-Churn-Prediction-with-PySpark
Big Data Project
Language: Jupyter Notebook - Size: 4.86 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

JuanEnD/InsightPlaces
Project from the 2nd edition of Alura's Data Science Challenge, in which I used PySpark to analyze and clean data on real-estate prices in Rio de Janeiro.
Language: Jupyter Notebook - Size: 25.8 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

lbdeoliveira/song-playlist-recommendation
This project was a joint effort by Lucas De Oliveira, Chandrish Ambati, and Anish Mukherjee to create song and playlist embeddings for recommendations in a distributed fashion using a 1M-playlist dataset from Spotify.
Language: HTML - Size: 225 KB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 32 - Forks: 12

yvgupta03/Big_Data_Project_US-Airlines_Tweet_Processing_and_Analysis
Big data application of machine learning concepts for sentiment classification of US Airlines tweets. The focus is on using PySpark libraries (MLlib) on big data to solve a problem with machine learning algorithms, not on the choice of algorithm used in the ML model. It also involves data preprocessing using NLP techniques, cross-validation, and a parameter-grid builder.
Language: Jupyter Notebook - Size: 1.83 MB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 0

sablokgaurav/data_engineering
java_codes
Language: Java - Size: 1.95 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

Uriah372-DS/DDBMSPysparkProject
A course project implementing machine learning with Spark Structured Streaming in Python
Language: Jupyter Notebook - Size: 20.8 MB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

tharikf/PySpark_KingCounty
Language: Jupyter Notebook - Size: 13.7 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

AbdelmajidLh/ML_diabet_predict_pyspark
Diabetes prediction with logistic regression using Python and PySpark
Language: Jupyter Notebook - Size: 10.7 KB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

dunnkers/pyspark-bucketmap
Easily group pyspark data into buckets and map them to different values.
Language: Jupyter Notebook - Size: 56.6 KB - Last synced at: 17 days ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

mohanakrishnavh/PySpark-Tutorial
Language: Jupyter Notebook - Size: 2.87 MB - Last synced at: over 2 years ago - Pushed at: about 7 years ago - Stars: 17 - Forks: 19

AntonioLunardi/Challenge-Data-Science-Alura-2ed Fork of millenagena/Challenge-Data-Science-Alura-2ed
Data engineering and data science project for the real-estate company InsightPlaces using big data technologies. Implementation of regression and clustering machine learning models.
Language: Jupyter Notebook - Size: 26.4 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

animenon/pyspark_mllib
Examples from Spark MLlib (in Python)
Language: Python - Size: 23.4 KB - Last synced at: over 2 years ago - Pushed at: over 7 years ago - Stars: 9 - Forks: 6

aziz0519/sparkml-model-deployment
End-to-end prediction model development using PySpark with Docker and Streamlit
Language: Python - Size: 594 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

sujan-bala/Machine-Learning-Notebooks
Various Jupyter Notebooks containing ML projects
Language: Jupyter Notebook - Size: 19.3 MB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 1 - Forks: 0

iqrabismii/Big-Data-Projects-
Projects on Big Data Using Pyspark and AWS
Language: Jupyter Notebook - Size: 2.05 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

gabridego/spark-exercises
A collection of pyspark exercises
Language: Python - Size: 211 KB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 2

brunowdev/sparkify
This is the final project for the Data Scientist Nanodegree, where our goal is to predict churn for a fictional streaming service called Sparkify.
Language: HTML - Size: 6.33 MB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 3 - Forks: 0

Foroozani/BigData_PySpark
‼️ Handle big data for machine learning using Python and PySpark; build ETL pipelines with PySpark, MongoDB, and Bokeh
Language: Jupyter Notebook - Size: 35.1 MB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 6 - Forks: 4

hajarmerbouh/Pyspark_classification
Classification using Pyspark
Language: Jupyter Notebook - Size: 7.8 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

GabrielAraujoCarlos/Prevendo-Satisfacao-Cliente-Santander
Machine learning project on Santander: predicting customer satisfaction level
Language: Jupyter Notebook - Size: 7.88 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

ravichoudharyds/Pyspark_Recommendation_System
Recommendation System using MLlib and ML libraries on Pyspark
Language: Python - Size: 137 KB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 4 - Forks: 1

alimunawar007/Network_Intrusion_Detection
Network Intrusion Detection using pyspark
Language: Jupyter Notebook - Size: 3.62 MB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

stevenlimcorn/australian-weather-prediction
A Docker-hosted Australian weather prediction analysis with PySpark and Hadoop DFS
Language: Jupyter Notebook - Size: 12.3 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

matheusmmmp/MLlib-graphTracking
Tracking project with machine learning using pyspark mllib
Language: Jupyter Notebook - Size: 11.7 KB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

limz1986/PySpark-ML-Model-DataBricks
An introduction to PySpark: creating a simple multiple regression ML model and hosting it on a Databricks cluster
Language: Python - Size: 7.81 KB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

matheusmmmp/MLlib-movieRecommendation
Movie recommendation project with machine learning using pyspark mllib
Language: Jupyter Notebook - Size: 921 KB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

toby-p/pyspark-flight-delay-prediction
Final project from "Machine Learning at Scale" (W261) in UC Berkeley's Data Science Masters program
Language: Jupyter Notebook - Size: 8.4 MB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

Reucherian/insure-health-scale
Health insurance analytics and prediction at scale with PySpark.
Language: Jupyter Notebook - Size: 1.48 MB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

kabbina/Big-Data Fork of rohanmrb/Big-Data
Language: Python - Size: 29 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

yvgupta03/Big_Data_Project_Page-Ranking_Airports
PySpark code implementing the PageRank algorithm on an airports dataset to highlight the relative importance of each airport.
Language: Jupyter Notebook - Size: 365 KB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 1

apurva-modi/pyspark-twitter-sentimental-analysis
Analyzes how travelers expressed their feelings on Twitter using PySpark MLlib. Given tweets about six US airlines, the task is to predict whether a tweet contains positive, negative, or neutral sentiment about the airline. This is a typical supervised learning task: given a text string, I have to categorize it into predefined categories.
Language: Jupyter Notebook - Size: 406 KB - Last synced at: 3 months ago - Pushed at: about 5 years ago - Stars: 3 - Forks: 1

VirtualRoyalty/spark-nlp-project
Micro project on big data technologies via spark
Language: Jupyter Notebook - Size: 5.12 MB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 4 - Forks: 0

jpacerqueira-zz/Akamai-log-Analysis-SparkML-H2o
Transformation of Akamai logs with Spark ETL, and discovery of values and similarities in the logs using SparkML and H2O ML
Language: HTML - Size: 106 MB - Last synced at: 6 months ago - Pushed at: over 6 years ago - Stars: 5 - Forks: 1

yvgupta03/Big_Data_Extractive_Summarization
Big data final project - Encoder Decoder
Language: Jupyter Notebook - Size: 11.7 KB - Last synced at: almost 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

Rahul-Vasan/ML-driven-Personalized-Gourmet
Indecisive about what to eat? Want to try something new but not sure which ones to trust? Want to improve the dining experience of your patrons? This is a one-stop solution to all of these problems.
Language: Jupyter Notebook - Size: 6.54 MB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 1

SotirisSotiriou/big-data-hadoop-spark
Assignment for UoM lesson "Big Data"
Language: Java - Size: 234 KB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

rishanki/correlation-matrix_Pyspark_RDD
Language: Jupyter Notebook - Size: 273 KB - Last synced at: over 2 years ago - Pushed at: over 7 years ago - Stars: 1 - Forks: 2

steve303/spark_MLlib_graphf
Objectives: using PySpark, MLlib, and GraphFrames libraries, perform 1) classification and clustering tasks using Random Forest and K-means, and 2) graph analysis tasks. This material is from UIUC MCS coursework.
Language: Python - Size: 23.4 KB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

twseptian/apache-pyspark-programming
Big Data Python Programming using Apache Spark and Pyspark
Language: Jupyter Notebook - Size: 78.1 MB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 5
