GitHub topics: mllib
hendhamdi/Sentiment-Analysis-Spark-NLP
This project uses a Spark pipeline (PySpark) to analyze the sentiment of user reviews.
Language: Python - Size: 62.5 KB - Last synced at: about 11 hours ago - Pushed at: about 12 hours ago - Stars: 0 - Forks: 0

ego-creator/hepmassClassification
PySpark pipeline for particle classification in high-energy physics (HEPMASS dataset). Includes distributed preprocessing, model training (logistic regression, decision trees), evaluation, and key visualizations. Optimized for Hadoop/Spark.
Size: 1.95 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

hexnn/Stark
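An easy-to-use, ultra-high-performance big data governance engine built on Spark, Spark MLlib, and Debezium. Suited to unified batch and stream data integration and analysis, it supports machine learning models, real-time CDC data capture, data quality validation, data modeling, algorithm modeling, and OLAP data analysis.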
Language: Scala - Size: 229 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 27 - Forks: 1

kriss024/Spark
Spark for Data Science and ETL process.
Language: Jupyter Notebook - Size: 78 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 2 - Forks: 0

databricks/LearningSparkV2
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
Language: Scala - Size: 75.2 MB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 1,279 - Forks: 764

Dirkster99/AvalonDock
Our own development branch of the well known WPF document docking library
Language: C# - Size: 3.08 MB - Last synced at: 16 days ago - Pushed at: 10 months ago - Stars: 1,496 - Forks: 328

jadianes/spark-py-notebooks
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Language: Jupyter Notebook - Size: 2.2 MB - Last synced at: 12 days ago - Pushed at: about 1 year ago - Stars: 1,646 - Forks: 917

nthaihoc/rfm-segmentation-ml
An automatic machine learning based customer segmentation model with RFM analysis at ICTA conference 2024
Language: Python - Size: 221 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

Java-Edge/Spark-MLlib-Tutorial
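A comprehensive walkthrough of the basic algorithms in Spark MLlib, the machine learning library of the Spark big data framework, with a complete set of test files.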
Language: Scala - Size: 3.58 MB - Last synced at: 22 days ago - Pushed at: about 1 year ago - Stars: 39 - Forks: 31

abeermohamed1/Recommender-System
Implementation of the model from the paper "Inferring Networks of Substitutable and Complementary Products"
Language: Python - Size: 1.54 MB - Last synced at: 2 months ago - Pushed at: about 6 years ago - Stars: 15 - Forks: 3

yChaaby/Real-Time-CourseCompass
A real-time course recommendation system powered by Apache Spark and Kafka for scalable big data processing. It uses content-based filtering and AI-generated keywords to deliver personalized learning suggestions, all orchestrated with Docker for seamless deployment.
Language: Jupyter Notebook - Size: 6.66 MB - Last synced at: 17 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

SnehaDharne/BigDataAnalytics-MVCollisions
Leveraging NYC Open Data, this repository contains Databricks notebooks for analyzing motor vehicle collisions. We perform EDA, spatial clustering, and predictive modeling on collision, vehicle, and person datasets to understand accident trends and predict potential risks.
Language: Jupyter Notebook - Size: 7.64 MB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

chouaib-629/hepmassClassification
PySpark pipeline for particle classification in high-energy physics (HEPMASS dataset). Includes distributed preprocessing, model training (logistic regression, decision trees), evaluation, and key visualizations. Optimized for Hadoop/Spark.
Language: Shell - Size: 160 KB - Last synced at: 1 day ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

agoda-com/spark-hpopt
Bayesian hyperparameter tuning for Spark MLlib
Language: Jupyter Notebook - Size: 711 KB - Last synced at: 18 days ago - Pushed at: about 4 years ago - Stars: 11 - Forks: 1

tomaztk/Azure-Databricks
Azure Databricks - Advent of 2020 Blogposts
Language: Jupyter Notebook - Size: 44.9 MB - Last synced at: 22 days ago - Pushed at: over 2 years ago - Stars: 60 - Forks: 49

lupusruber/RNMP_homework2
A recommendation system project that uses Spark MLlib's ALS model to train and evaluate on the MovieLens dataset. Includes Dockerized setup, hyperparameter tuning, and evaluation metrics (RMSE, Precision@K, Recall@K, NDCG) for performance insights.
Language: Python - Size: 66.4 KB - Last synced at: 22 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

Parag000/NYC-Taxi-Fare-Prediction-Using-Pyspark
Processed 13.2M records, conducted EDA and feature engineering, and built a linear regression model for fare prediction. Tackled big data challenges with efficient preprocessing and visualizations.
Language: Jupyter Notebook - Size: 184 KB - Last synced at: 27 days ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

pathak-ashutosh/spark-movie-recommendation
A movie recommendation system on MovieLens 25M dataset using Python and Apache Spark
Language: Python - Size: 19.5 KB - Last synced at: about 2 months ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0

aiwithqasim/pyspark_bigdata
Getting started with PySpark for Big data analysis
Language: Jupyter Notebook - Size: 835 KB - Last synced at: 23 days ago - Pushed at: over 2 years ago - Stars: 9 - Forks: 12

amirhosseinazami1373/Book-Genre-Classifier
Have you ever tried to guess the genre of a book by reading its title? In this project, I try to do just that using a massive database of books (their titles and genres), Spark MLlib, and three different ML models: (1) Support Vector Machine (SVM), (2) Logistic Regression, and (3) Neural Networks.
Language: HTML - Size: 44.9 KB - Last synced at: about 2 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

AishwaryaHastak/IPL_Analysis
Analysis of IPL dataset using PySpark
Language: Jupyter Notebook - Size: 2.78 MB - Last synced at: 2 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

Minishlink/MLlib
[2009] Code apps and games easily on Nintendo Wii !
Language: C - Size: 311 KB - Last synced at: 18 days ago - Pushed at: over 7 years ago - Stars: 11 - Forks: 0

flipkart-incubator/spark-transformers
Spark-Transformers: Library for exporting Apache Spark MLLIB models to use them in any Java application with no other dependencies.
Language: Java - Size: 609 KB - Last synced at: 17 days ago - Pushed at: over 7 years ago - Stars: 40 - Forks: 29

ckongala/SparkPythonBigData
Big-Data with Apache Spark and Python.
Language: Python - Size: 169 KB - Last synced at: 2 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

Ssharma91/Churn-Analysis-
Statistical modelling and tree-based algorithms: churn analysis in the retail industry using Python and PySpark
Language: Jupyter Notebook - Size: 8.79 KB - Last synced at: 10 months ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

LucaSpadoni/Scala-Spark-Stellar-Classification
Classification of astronomical objects using Scala-Spark and its ML library "spark.ml", based on the Stellar Classification Dataset (SDSS17).
Language: Scala - Size: 7.77 MB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

MelinaMoraiti/Spark-Text-Analyzer
An Apache Spark application to analyze word frequencies and compute TF-IDF weights across multiple text file sets using Spark's MLlib library.
Language: Scala - Size: 17.6 KB - Last synced at: about 2 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

santiagxf/portable-sparkml
This repository shows how to create containerized versions of models trained with spark MLLib
Language: Jupyter Notebook - Size: 5.86 KB - Last synced at: 2 days ago - Pushed at: about 5 years ago - Stars: 2 - Forks: 2

sprcoder/WineQualityModel
Using MLLib in Spark to train a ML model for wine quality prediction.
Language: Python - Size: 181 KB - Last synced at: 11 months ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

nikitap492/PokemonSparkML
Is a Pokémon legendary? This project tries to find out with Spark MLlib.
Language: Scala - Size: 18.6 KB - Last synced at: 11 months ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

muhammad-ahsan/spark-toolbox
Spark based applications to perform big data analytics
Language: Python - Size: 40 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

jabhij/CrimeRate_Classification
Developing a system that could classify crime descriptions into different categories which would help the authorities to assign officers to crimes based on the report.
Language: Jupyter Notebook - Size: 30.2 MB - Last synced at: 18 days ago - Pushed at: about 2 years ago - Stars: 2 - Forks: 2

jabhij/Apple_DataAnalyis_ApacheSpark
Analyzed Apple's dataset to check how many people bought AirPods after buying a Mac or iPhone, then used ML and predictive analytics to examine future outcomes.
Size: 0 Bytes - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

aikuyun/spark-all
Spark core sql streaming mllib
Language: Scala - Size: 526 KB - Last synced at: 22 days ago - Pushed at: about 6 years ago - Stars: 4 - Forks: 0

JonathanLoscalzo/pyspark_mllib-bigdata-unlp
Pyspark & Mllib final exam of Big Data Course, UNLP
Language: Jupyter Notebook - Size: 173 KB - Last synced at: 12 months ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

naiborhujosua/Data-Scientist-learning-path-using-databricks
This is the summary of learning Data Science using Databricks
Size: 51.8 KB - Last synced at: 12 months ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

saikumarsuvanam/BigData
Hadoop,MachineLearningAlgos,Spark,Pig,Hive
Language: Java - Size: 4.37 MB - Last synced at: 12 months ago - Pushed at: about 7 years ago - Stars: 0 - Forks: 1

deepjyotiroy079/bike-sharing-demand
Service that combines historical usage patterns with weather data to forecast the bicycle rental demand in real time.
Language: Jupyter Notebook - Size: 3.2 MB - Last synced at: 12 months ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

sudarshan-koirala/spark-practice
Learning spark the right way
Language: Jupyter Notebook - Size: 46.9 KB - Last synced at: 12 months ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

mkbehbehani/spark-advanced-regression-kaggle
Prediction system for Kaggle's advanced regression competition using Scala + Spark
Language: Scala - Size: 16.8 MB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 1

Gatmatz/Survey-Distributed-Machine-Learning
A bibliographic survey of distributed machine learning. It covers algorithmic choices and architectures that facilitate distributed learning of ML models, and includes a section presenting MLlib, Apache Spark's library for distributed ML.
Size: 556 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

amy-panda/NY_Taxi_Data_Analysis_and_Modelling
Analysing the taxi trips in New York City and predicting total fare amount of taxi trips
Language: Jupyter Notebook - Size: 1.84 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

geektoni/twitter_user_profiling
Clustering of tweets in order to provide users profiles using Spark MLlib.
Language: Python - Size: 945 KB - Last synced at: about 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

zydusss/Spark
Data Analytics using Spark
Language: Jupyter Notebook - Size: 33.2 KB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 2 - Forks: 0

mohigup/spark-ml-prediction
CS6240 Data Mining Project
Language: Scala - Size: 8.79 KB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

Lakshmiaddepalli/BigDataProject
CSCI-GA.3033-005 - Big Data Application Development
Language: Python - Size: 41.4 MB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 1 - Forks: 0

jubins/Spark-And-MLlib-Projects
This repository contains Spark, MLlib, PySpark and Dataframes projects
Language: Jupyter Notebook - Size: 101 KB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 39 - Forks: 97

ognis1205/spark-tda
SparkTDA is a package for Apache Spark providing Topological Data Analysis Functionalities.
Language: Scala - Size: 31.4 MB - Last synced at: about 1 year ago - Pushed at: almost 7 years ago - Stars: 47 - Forks: 5

iamchetanks/Handwritten-Digit-Recognition-using-Spark-Scala
Google DataProc Spark Scala Job for MNIST Handwritten Digit Recognition using Decision Trees (Spark MLlib)
Language: Perl 6 - Size: 24.8 MB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

mjaglan/bigdata-example-project
Clustering with Spark on OpenStack Cloud. [OpenStack] [Hadoop] [Spark] [Ansible] [YAML] [Scala] [SBT] [Shell]
Language: Scala - Size: 30.2 MB - Last synced at: over 1 year ago - Pushed at: almost 9 years ago - Stars: 2 - Forks: 3

annagracia12/MassiveDataProcessing
Projects for the course Massive Data Processing Engineering at Universidad Internacional de La Rioja.
Language: Jupyter Notebook - Size: 3.73 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

NikosVlachakis/Information-Systems-Analysis-and-Design
Semester project for the course "Information Systems Analysis and Design" at ECE-NTUA in 2022.
Language: TeX - Size: 730 KB - Last synced at: 12 months ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

vardan10/Technothon
Compares execution time for MapReduce, Hive, and Spark, and performs sentiment analysis on any YouTube video.
Language: Python - Size: 14.1 MB - Last synced at: 11 months ago - Pushed at: over 7 years ago - Stars: 3 - Forks: 0

Heisenberghj7/Retail-Store-BigData
📊 📑 This project provides a step-by-step walkthrough of big data analytics applied to the retail industry, using a variety of big data technologies such as HDFS, Hive, and Spark.
Language: Python - Size: 2.11 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

xavierguihot/mllib_decision_tree_reducer
Small facility which reduces naive decision tree models as produced by mllib
Language: Scala - Size: 291 KB - Last synced at: over 1 year ago - Pushed at: about 7 years ago - Stars: 0 - Forks: 0

Lewuathe/dllib
dllib is a distributed deep learning library running on Apache Spark
Language: CSS - Size: 3.69 MB - Last synced at: 19 days ago - Pushed at: over 7 years ago - Stars: 32 - Forks: 5

30lm32/ml-random-forest-pyspark
Random forest binary classification applied to sample data with PySpark in a Jupyter Notebook
Language: Jupyter Notebook - Size: 44.9 KB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 3 - Forks: 3

djgarcia/PCARD
PCARD Ensemble classifier for Big Data
Language: Scala - Size: 19.5 KB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 4 - Forks: 2

AdautoDCJunior/spark-processamento-linguagem-natural
Repository for the Alura course "Spark: natural language processing" ("Spark: processamento de linguagem natural").
Language: Jupyter Notebook - Size: 536 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

AdautoDCJunior/spark-criando-modelos-classificacao
Repository for the Alura course "Spark: creating classification models" ("Spark: criando modelos de classificação").
Language: Jupyter Notebook - Size: 160 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

lgallen/pyspark_dataframes
Tutorial on how to use the Python API for Spark dataframes.
Language: HTML - Size: 81.1 KB - Last synced at: over 1 year ago - Pushed at: about 7 years ago - Stars: 1 - Forks: 2

Denis-Mukhanov/Chicago_taxi_trips_BigData
Language: Jupyter Notebook - Size: 1 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

rdempsey/data-analytics-machine-learning-big-data
Slides, code and more for my class: Data Analytics and Machine Learning on Big Data
Language: Jupyter Notebook - Size: 127 MB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 8 - Forks: 21

emrekutlug/getting-started-with-pyspark
In this tutorial, I explain SparkContext using the map and filter methods with Python lambda functions; create RDDs from objects and external files; cover transformations and actions on RDDs and pair RDDs; build PySpark DataFrames from RDDs and external files; run SQL queries on DataFrames with Spark SQL; and apply machine learning with PySpark MLlib.
Language: Jupyter Notebook - Size: 8.58 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 5 - Forks: 6

sadikovi/spark-hosvd
Spark High Order SVD
Language: Scala - Size: 61.5 KB - Last synced at: about 1 year ago - Pushed at: almost 4 years ago - Stars: 3 - Forks: 1

pausanchezv/Bases-de-dades-II-Big-Data
Advanced Databases course, Computer Engineering (Universitat de Barcelona)
Language: Java - Size: 106 MB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

AntonioLunardi/NLP_Spark_sentiment_analisys
A bag-of-words analysis of IMDB movie opinions with PySpark
Language: Jupyter Notebook - Size: 338 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

vsmolyakov/pyspark
spark (scala and python)
Language: Python - Size: 2.4 MB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 17 - Forks: 6

cjr227/cf_spark_mllib
CF based Music Recommendation using Spark/MLLib
Language: Jupyter Notebook - Size: 7.81 KB - Last synced at: over 1 year ago - Pushed at: almost 8 years ago - Stars: 0 - Forks: 0

tankkyo/AuraAnalysis
Language: Scala - Size: 1.27 MB - Last synced at: over 1 year ago - Pushed at: about 8 years ago - Stars: 0 - Forks: 1

javiaspiroz/ml-spark-anime
Recommender system using a dataset similar to MovieLens but built from anime CSVs, running the ALS algorithm on Apache Spark.
Language: Python - Size: 7.31 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

kuptservol/jml
Yet Another Java ML Lib
Language: Java - Size: 2.44 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 2 - Forks: 0

Dheeraj2444/spark
Learning PySpark
Language: Jupyter Notebook - Size: 18.6 KB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 1

SalmaOuardi/Classification-with-PySpark
Discover classification through PySpark with MLlib + Feature Extraction
Language: Jupyter Notebook - Size: 7.81 KB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

imgoodman/kaggle-spark-ml
kaggle machine learning with spark
Language: Python - Size: 113 KB - Last synced at: about 1 year ago - Pushed at: almost 8 years ago - Stars: 7 - Forks: 2

Nanobelka/california-housing
PySpark pipeline for median house value prediction
Language: Jupyter Notebook - Size: 2.12 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 0

notPlancha/pbd
Coursework for PBD
Language: Jupyter Notebook - Size: 8.22 MB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

ironavt/spark-california-housing
Predicting median house value in California using big data tools
Language: Jupyter Notebook - Size: 15.6 KB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

krzysztofhewelt/Smart-applications-Python
Smart applications written in Python, using Pyspark and MLlib.
Language: Jupyter Notebook - Size: 1.5 MB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

Anas436/Fundamentals-of-Scalable-Data-Science
Language: Jupyter Notebook - Size: 3.13 MB - Last synced at: 30 days ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

knoldus/Play-Spark-Scala
Language: Scala - Size: 236 KB - Last synced at: about 2 years ago - Pushed at: over 9 years ago - Stars: 52 - Forks: 40

knoldus/Movie-Recommendation-Engine
Recommends movies based on your interests
Language: Scala - Size: 6.84 KB - Last synced at: about 2 years ago - Pushed at: almost 9 years ago - Stars: 1 - Forks: 0

animenon/pyspark_mllib
Example from Spark MLLib (in python)
Language: Python - Size: 23.4 KB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 9 - Forks: 6

anandsharma-i/Parkinsons_Disease_Detection
This is a project about detection of Parkinson's disease based on the particular features/traits shown by patients.
Language: Jupyter Notebook - Size: 6.84 KB - Last synced at: almost 2 years ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 0

sramirez/spark-RELIEFFC-fselection
Distributed version of RELIEF-F algorithm for Apache Spark.
Language: Scala - Size: 15.2 MB - Last synced at: about 2 years ago - Pushed at: about 7 years ago - Stars: 3 - Forks: 2

Anas436/Getting-Started-with-PySpark
Language: Jupyter Notebook - Size: 1.8 MB - Last synced at: 30 days ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

pierrenodet/spark-smile 📦
Integrating SMILE and Spark
Language: Scala - Size: 341 KB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 5 - Forks: 0

data-integrations/logistic-regression-analytics 📦
Logistic Regression Predictor and Classifier Plugins
Language: Java - Size: 34.2 KB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 1 - Forks: 2

data-integrations/ngram-analytics 📦
NGram Analytics Transform Plugin: Transforms input features into n-grams
Language: Java - Size: 50.8 KB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 2

data-integrations/hashing-tf-feature-generator 📦
Hashing TF Feature Generator Plugin
Language: Java - Size: 31.3 KB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 2

anujdutt9/BigData-and-Machine-Learning
Basics of Big Data and Machine Learning using Apache Spark and Scala
Language: Scala - Size: 834 KB - Last synced at: 18 days ago - Pushed at: about 8 years ago - Stars: 7 - Forks: 9

alessandroiori/community-detection-lastfm
Python3, NetworkX, Java, MLlib, Spark, Cassandra, Neo4j 3.0, Gephi, Docker
Language: Roff - Size: 60.6 MB - Last synced at: about 2 years ago - Pushed at: almost 8 years ago - Stars: 11 - Forks: 2

data-integrations/tokenizer-analytics 📦
Tokenizer Analytics Plugin: A transform plugin to split data based on a pattern
Language: Java - Size: 37.1 KB - Last synced at: about 1 year ago - Pushed at: about 6 years ago - Stars: 1 - Forks: 2

data-integrations/skipgram-analytics 📦
SkipGram Feature Generator: Generates text-based features using Spark's Word2Vec.
Language: Java - Size: 37.1 KB - Last synced at: about 1 year ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 1

data-integrations/decision-tree-analytics 📦
CDAP plugins for training a model using decision tree and for predicting outcomes based on the trained model.
Language: Java - Size: 67.4 KB - Last synced at: about 1 year ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 1

tanersekmen/pyspark-mllib
mllib with pyspark
Language: Jupyter Notebook - Size: 24.4 KB - Last synced at: almost 2 years ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 0

colbyford/sparkitecture
A collection of “cookbook-style” scripts for simplifying data engineering and machine learning in Apache Spark.
Size: 665 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 11 - Forks: 4

shaharpit809/Latent-Dirichlet-allocation-LDA-on-YELP-dataset-using-Apache-Spark
This repository compares two LDA algorithms (EM and Online) in Apache Spark's 'mllib' library and searches for the best hyperparameters on the YELP dataset.
Language: Java - Size: 6.43 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 3 - Forks: 1

matheusmmmp/MLlib-graphTracking
Tracking project with machine learning using pyspark mllib
Language: Jupyter Notebook - Size: 11.7 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

BigDataProcSystems/Practice
Language: Jupyter Notebook - Size: 10.1 MB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 3 - Forks: 2
