GitHub topics: spark-mllib
MHassaanButt/Crime-Spark-ML
In this project I stream data and perform crime classification using Spark. The dataset contains incidents derived from the SFPD Crime Incident Reporting system, ranging from 1/1/2003 to 5/13/2015. I also analyze crime incidents across different areas and with respect to other parameters.
Language: Python - Size: 5.86 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 4 - Forks: 0

spoddutur/spark-ml-dashboard
A Spark ML dashboard for plugging in and tweaking model parameters to verify classification results on sample test data in real time.
Language: Scala - Size: 20.5 MB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 5 - Forks: 2

jingpeicomp/product-relation-mining
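Product association mining built with the Spring Boot framework and the Spark MLlib machine learning library. It applies the FP-Growth algorithm to users' shopping-cart data to mine associations between products, and exposes a RESTful API.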
Language: Java - Size: 68.4 KB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 24 - Forks: 16

rishiravikumar-tul-scm/IPL-Analysis
IPL Match Simulation using K-means Clustering and Collaborative Filtering.
Language: Python - Size: 2.64 MB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

ZhipengHong0123/Steam-Game-Analysis
Language: HTML - Size: 3.68 MB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 4

samanta-anupam/similar-water-regions
In this project we use the Global Surface Water Explorer to find patches of area anywhere in the world that are similar to each other, based on the European Commission Global Surface Water satellite dataset.
Language: Jupyter Notebook - Size: 346 KB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 1 - Forks: 0

tweichle/Spark-for-Big-Data
Spark: Work with Big Data and Build Machine Learning Models at Scale
Language: Jupyter Notebook - Size: 63.5 KB - Last synced at: almost 2 years ago - Pushed at: almost 5 years ago - Stars: 2 - Forks: 1

tkachuksergiy/aws-spark-nlp
Work from a recent project on using Apache Spark and the AWS cloud for NLP tasks.
Language: Jupyter Notebook - Size: 2.76 MB - Last synced at: almost 2 years ago - Pushed at: about 5 years ago - Stars: 5 - Forks: 0

grishenkovp/databricks
A collection of use cases built on the Databricks platform
Language: Jupyter Notebook - Size: 504 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

MostafaToema/Stroke-Prediction-using-Pyspark
Data preparation, visualization, feature engineering, and classification of people who have had a stroke, using PySpark libraries
Language: Jupyter Notebook - Size: 79.1 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

nbegumc/market-basket-analysis
Finding frequent itemsets using the Apriori and FP-Growth algorithms on Spark
Language: Jupyter Notebook - Size: 692 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

DavideNardone/TwitterSentimentAnalysis
A Spark Streaming implementation for Online Twitter Sentiment Analysis.
Language: Python - Size: 1.78 MB - Last synced at: about 2 years ago - Pushed at: about 7 years ago - Stars: 8 - Forks: 3

josemarialuna/ClusterIndices
This package contains the code for executing clustering validity indices in Spark. The package includes BD-Silhouette, BD-Dunn, Davies-Bouldin and WSSSE indices.
Language: Scala - Size: 588 KB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 10 - Forks: 3

lp-dataninja/SparkML
Detailed notes and code to learn machine learning with Apache Spark.
Language: Jupyter Notebook - Size: 4.06 MB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 12 - Forks: 17

giuseppegambino/Italian-Sentiment-Analysis-with-Spark
Sentiment analysis of Italian tweets with Python and Spark
Language: Python - Size: 6.84 KB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 10 - Forks: 0

TrainingByPackt/Big-Data-Processing-with-Apache-Spark-eLearning
Efficiently tackle large datasets and perform big data analysis with Spark and Python
Language: Python - Size: 36.1 KB - Last synced at: 17 days ago - Pushed at: over 6 years ago - Stars: 7 - Forks: 6

OmarZOS/http-storage-service-mediator
This service is a component inside the petroleum production information system that I conceived and proposed.
Language: JavaScript - Size: 81.1 KB - Last synced at: almost 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

askmrsinh/spark-stocksim
Monte Carlo stock simulation using Apache Spark.
Language: Scala - Size: 1.81 MB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 2 - Forks: 2

demanejar/sparkml
Demo of clustering with LDA in Spark MLlib
Language: Python - Size: 831 KB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

kapilthakre/Bicycle-Sharing-Demand-Forecasting-Using-Spark-Scala
In this project, we build a bicycle-sharing demand prediction service using Apache Spark and Scala. It consists of two Spark applications: one for model generation and another for demand prediction.
Language: Scala - Size: 295 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 0

merrillm1/Olist_Recommender_System
A recommendation engine that achieves an AUC of 0.97 by using clustering techniques to create user features. The data represents Olist marketplace transactions and was retrieved from kaggle.com.
Language: Jupyter Notebook - Size: 77.4 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 6

MostafaToema/Titanic-Survival-Prediction-using-Pyspark
Data preparation, visualization, feature engineering, and classification of passenger survival using PySpark libraries
Language: Jupyter Notebook - Size: 43 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

MostafaToema/Wuzzuf-jobs
Wuzzuf data analysis and visualization, and application of the k-means algorithm using Spark in Java
Language: Java - Size: 565 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

MostafaToema/PySpark-Practices
How to use PySpark libraries with real data
Language: Jupyter Notebook - Size: 2.22 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

uosdmlab/spark-nkp
Natural Korean Processor for Apache Spark
Language: Scala - Size: 53.7 KB - Last synced at: 5 months ago - Pushed at: about 7 years ago - Stars: 53 - Forks: 16

yizhiru/mllibX
A customized version of mllib
Language: Scala - Size: 35.2 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

simbafl/spark-branch-2.4
Source code analysis of Spark 2.4
Language: Scala - Size: 17.8 MB - Last synced at: 4 days ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 1

szaher/spark
Playing with Spark using Java
Language: Java - Size: 424 KB - Last synced at: about 1 year ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 0

AlexKbit/titanic-sparkml
A SparkML sample on the Titanic dataset
Language: Scala - Size: 34.2 KB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 0 - Forks: 0

saLeox/Lambda-ServingAPI-HousePricePredict
Embedding a spark-mllib machine learning model into Spring Boot to provide a house price prediction API
Language: Java - Size: 26.4 KB - Last synced at: almost 2 years ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

saLeox/Lambda-BatchTraining-HousePricePredict
Uses the Spark MLlib Java library to perform machine learning (a regression problem)
Language: Jupyter Notebook - Size: 2.06 MB - Last synced at: almost 2 years ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

akashsethi24/Machine-Learning
Examples of all the machine learning algorithms in Apache Spark
Language: Scala - Size: 3.64 MB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 15 - Forks: 10

giovannigarifo/bigdata
Code samples, summaries, cheatsheets and other study material for Hadoop MapReduce and Apache Spark
Language: Java - Size: 69.1 MB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 5 - Forks: 2

dorianbg/cs110x-big-data-analysis-with-spark-labs
Graded lab exercises from the CS110x Big Data Analysis with Apache Spark online course on edX
Language: Jupyter Notebook - Size: 74.2 KB - Last synced at: about 2 years ago - Pushed at: over 8 years ago - Stars: 1 - Forks: 2

angeligareta/machine-learning-spark
Assignment for Scalable Machine Learning which aims to study the basics of regression and classification in Spark.
Language: Scala - Size: 1.42 MB - Last synced at: about 1 month ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 0

angeligareta/spark-hadoop-hbase-overview
First lab for Data-Intensive Computing course at KTH where we are introduced to Apache Spark MLlib and Spark SQL, Hadoop, and HBase.
Language: Jupyter Notebook - Size: 22.4 MB - Last synced at: about 1 month ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 0

Bcromas/pyspark_projects
A collection of small projects exploring PySpark features and functionality including packages and modules, algorithms, and general data science techniques.
Language: Jupyter Notebook - Size: 80.1 KB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

sunujh6/spark_practice
Language: Jupyter Notebook - Size: 1.62 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

tertiarycourses/ApacheSparkTraining
Exercise files for Apache Spark Essential Training
Language: Jupyter Notebook - Size: 4.05 MB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 1

Zoe-0925/PredictRain
A Python program that predicts rainfall using Spark MLlib machine learning algorithms.
Language: Jupyter Notebook - Size: 3.62 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 1

lowener/Spark-social-network-backend
Backend for a social network, built with Spark in Scala
Language: Scala - Size: 79.8 MB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

SainathDutkar/Fraud_Transaction_Monitor
Detects fraudulent credit card transactions in real time
Language: Scala - Size: 1000 KB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 3 - Forks: 4

joyoyoyoyoyo/emojipasta-topic-modeling
😅 A topic model of reddit.com/r/EmojiPasta trained with Spark and an LDA model (NSFW) - Trigger Warning: The r/emojipasta subreddit posts controversial content, and anything I have crawled is meant to provide visibility, through topic modeling, into some of this controversial content. Unfortunately, there is also discriminatory speech, which must be called out!
Language: Scala - Size: 700 KB - Last synced at: about 2 years ago - Pushed at: about 7 years ago - Stars: 2 - Forks: 0

crazyalin92/movie_recomendation_system
Spark MLlib: Collaborative Filtering Movie Recommendation System
Language: Scala - Size: 5.6 MB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 2 - Forks: 1

billyean/ztml
Implementations for the Coursera machine learning course, plus some TensorFlow code.
Language: Python - Size: 305 MB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

shantanu-93/scalable-matrix-multiply
Fast and Scalable Matrix Multiply using spark, breeze and BLAS libraries
Language: JavaScript - Size: 127 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

vaibhav50596/DeerfootTrailAnalysis
The goal is to train a linear regression model to predict Deerfoot commute times given weather and accident conditions using Spark RDD and MLlib
Language: Jupyter Notebook - Size: 82 KB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 1

desaiankitb/spark-mllib
Apache Spark is one of the most widely used and supported open-source tools for machine learning and big data. In this repo, discover how to work with this powerful platform for machine learning. The repo discusses MLlib, the Spark machine learning library, which provides tools for data scientists and analysts who would rather find solutions to business problems than code, test, and maintain their own machine learning libraries. It shows how to use DataFrames to structure data, and covers data preparation and the most commonly used types of machine learning algorithms: clustering, classification, regression, and recommendations. You will gain experience loading data into Spark, preprocessing data as needed to apply MLlib algorithms, and applying those algorithms to a variety of machine learning problems.
Language: Python - Size: 150 KB - Last synced at: almost 2 years ago - Pushed at: almost 7 years ago - Stars: 4 - Forks: 5

abhay6694/PySpark-Component
A collection of Spark component functions for big data processing
Language: Jupyter Notebook - Size: 12.7 KB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 0

nahidalam/Spark
Spark, Python, AWS EMR, MLLib, Spark Streaming, Spark - SQL
Language: Jupyter Notebook - Size: 2.93 KB - Last synced at: almost 2 years ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 1

Aveek-Saha/Cricket-score-predictor
A Big data application to predict the outcome of a T20 cricket match.
Language: Jupyter Notebook - Size: 2.17 MB - Last synced at: 17 days ago - Pushed at: about 6 years ago - Stars: 2 - Forks: 0

DerrickBu/Movie_Recommendation_Application
This is a web-based movie recommendation application written in Scala using Apache Spark and Livy.
Language: Scala - Size: 17.6 KB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 3 - Forks: 2

jeanlks/sparkCourse
Spark Course notebooks.
Language: Jupyter Notebook - Size: 714 KB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

BinhMinhs10/DataMiningExample
A Maven project covering Scala (sparkml, spark_streaming, spark_dataframe, ...) and Java (threadpool, kafka, jpa, timer, request api)
Language: Scala - Size: 1.32 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 1

trendyol-data-eng-summer-intern-2019/recom-engine-ml
ML component of the project, which is written with Spark ML.
Language: Java - Size: 12.7 KB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 1

nath2709/spark-ml
Language: Scala - Size: 558 KB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 0 - Forks: 0

maniram-yadav/Spark_And_Scala_codes
Language: Scala - Size: 498 KB - Last synced at: about 2 years ago - Pushed at: about 7 years ago - Stars: 0 - Forks: 1

AndrewKuzmin/spark-ml-pipelines-with-structured-streaming-examples
Examples of using Apache Spark MLlib Pipelines and Structured Streaming on version 2.4.0
Language: Shell - Size: 1020 KB - Last synced at: almost 2 years ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 0

hichem/spark-training
Language: Jupyter Notebook - Size: 1.93 MB - Last synced at: about 2 years ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 0

AswathKiruba/Stock_Price_Prediction
This is the CSYE7200 Big Data Systems Engineering Using Scala Final Project for Team 9 Fall 2018
Language: Scala - Size: 3.48 MB - Last synced at: about 1 year ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 1

corneliouzbett/Master-Apache-Spark
Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming.
Language: Python - Size: 889 KB - Last synced at: about 2 years ago - Pushed at: about 6 years ago - Stars: 1 - Forks: 0

multivacplatform/multivac-nlp
Testing and benchmarking some of the existing NLP libraries in Apache Spark
Language: Scala - Size: 12 MB - Last synced at: about 2 months ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 1

suyash248/recommender_system
Recommendation system using a graph database (Neo4j), Apache Spark, and machine learning.
Language: Jupyter Notebook - Size: 17.4 MB - Last synced at: 3 months ago - Pushed at: about 7 years ago - Stars: 3 - Forks: 1

rtahmasbi/Spark
Size: 18.6 KB - Last synced at: 2 months ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 1

AliAminian/server-response-time-predictor
This is an example to show how Spark ML could be used to predict response time of a service for a server-side application
Language: Java - Size: 243 KB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

markusheilig/san-francisco-calls-for-service
Apache Spark MLlib example for the seminar 'AI with scala'
Language: CSS - Size: 21.6 MB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

lp-dataninja/PyData
Code and Data for PyData-Hyderabad-Chapter meetup
Language: HTML - Size: 1.24 MB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 3

nishantgandhi99/Santander-Product-Recommendation Fork of vaishalilambe/Team7_Santander_Product_Recommendation
Santander Product Recommendation for Santander Customer Dataset / Kaggle Competition
Language: Scala - Size: 8.68 MB - Last synced at: about 2 years ago - Pushed at: about 7 years ago - Stars: 0 - Forks: 1

memojja/recomendation-engine
Language: Java - Size: 35.7 MB - Last synced at: about 1 month ago - Pushed at: about 7 years ago - Stars: 0 - Forks: 0

vinfly/vinSparkMLlib
MLlib samples
Language: Scala - Size: 18.6 KB - Last synced at: almost 2 years ago - Pushed at: about 7 years ago - Stars: 0 - Forks: 0

irfanalidv/Breast_Cancer_Prediction_using_Apache_Spark
Predict whether the cancer is benign or malignant
Language: Jupyter Notebook - Size: 7.81 KB - Last synced at: almost 2 years ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

happylittlebunny/Yelp-User-Pattern-And-Recommender-System
Yelp Toronto User Pattern Analysis and Recommender System
Language: Python - Size: 104 MB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

crazyalin92/spark-logistic-regression
Example of applying logistic regression to predict diabetes in patients
Language: Scala - Size: 10.7 KB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 3
