An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: mllib

hendhamdi/Sentiment-Analysis-Spark-NLP

This project uses a Spark pipeline (PySpark) to analyze the sentiment of user reviews.

Language: Python - Size: 62.5 KB - Last synced at: about 11 hours ago - Pushed at: about 12 hours ago - Stars: 0 - Forks: 0

ego-creator/hepmassClassification

PySpark pipeline for particle classification in high-energy physics (HEPMASS dataset). Includes distributed preprocessing, model training (logistic regression, decision trees), evaluation, and key visualizations. Optimized for Hadoop/Spark.

Size: 1.95 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

hexnn/Stark

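An easy-to-use, ultra-high-performance big data governance engine built on Spark, Spark MLlib, and Debezium. It is suited to unified batch and stream data integration and data analysis, and supports machine learning models, real-time CDC data capture, data quality validation, data modeling, algorithm modeling, and OLAP data analysis.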

Language: Scala - Size: 229 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 27 - Forks: 1

kriss024/Spark

Spark for Data Science and ETL process.

Language: Jupyter Notebook - Size: 78 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 2 - Forks: 0

databricks/LearningSparkV2

This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]

Language: Scala - Size: 75.2 MB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 1,279 - Forks: 764

Dirkster99/AvalonDock

Our own development branch of the well known WPF document docking library

Language: C# - Size: 3.08 MB - Last synced at: 16 days ago - Pushed at: 10 months ago - Stars: 1,496 - Forks: 328

jadianes/spark-py-notebooks

Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

Language: Jupyter Notebook - Size: 2.2 MB - Last synced at: 12 days ago - Pushed at: about 1 year ago - Stars: 1,646 - Forks: 917

nthaihoc/rfm-segmentation-ml

An automatic, machine-learning-based customer segmentation model with RFM analysis (ICTA conference 2024)

Language: Python - Size: 221 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

Java-Edge/Spark-MLlib-Tutorial

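A comprehensive walkthrough of the basic algorithms in Spark MLlib, the machine learning library of the Spark big data framework, with a complete set of test files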

Language: Scala - Size: 3.58 MB - Last synced at: 22 days ago - Pushed at: about 1 year ago - Stars: 39 - Forks: 31

abeermohamed1/Recommender-System

Implementation of the model from the paper "Inferring Networks of Substitutable and Complementary Products"

Language: Python - Size: 1.54 MB - Last synced at: 2 months ago - Pushed at: about 6 years ago - Stars: 15 - Forks: 3

yChaaby/Real-Time-CourseCompass

A real-time course recommendation system powered by Apache Spark and Kafka for scalable big data processing. It uses content-based filtering and AI-generated keywords to deliver personalized learning suggestions, all orchestrated with Docker for seamless deployment.

Language: Jupyter Notebook - Size: 6.66 MB - Last synced at: 17 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

SnehaDharne/BigDataAnalytics-MVCollisions

Leveraging NYC Open Data, this repository contains Databricks notebooks for analyzing motor vehicle collisions. We perform EDA, spatial clustering, and predictive modeling on collision, vehicle, and person datasets to understand accident trends and predict potential risks.

Language: Jupyter Notebook - Size: 7.64 MB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

chouaib-629/hepmassClassification

PySpark pipeline for particle classification in high-energy physics (HEPMASS dataset). Includes distributed preprocessing, model training (logistic regression, decision trees), evaluation, and key visualizations. Optimized for Hadoop/Spark.

Language: Shell - Size: 160 KB - Last synced at: 1 day ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

agoda-com/spark-hpopt

Bayesian hyperparameter tuning for Spark MLlib

Language: Jupyter Notebook - Size: 711 KB - Last synced at: 18 days ago - Pushed at: about 4 years ago - Stars: 11 - Forks: 1

tomaztk/Azure-Databricks

Azure Databricks - Advent of 2020 Blogposts

Language: Jupyter Notebook - Size: 44.9 MB - Last synced at: 22 days ago - Pushed at: over 2 years ago - Stars: 60 - Forks: 49

lupusruber/RNMP_homework2

A recommendation system project that trains and evaluates Spark MLlib's ALS model on the MovieLens dataset. Includes Dockerized setup, hyperparameter tuning, and evaluation metrics (RMSE, Precision@K, Recall@K, NDCG) for performance insights.

Language: Python - Size: 66.4 KB - Last synced at: 22 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

Parag000/NYC-Taxi-Fare-Prediction-Using-Pyspark

Processed 13.2M records, conducted EDA and feature engineering, and built a linear regression model for fare prediction. Tackled big data challenges with efficient preprocessing and visualizations.

Language: Jupyter Notebook - Size: 184 KB - Last synced at: 27 days ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

pathak-ashutosh/spark-movie-recommendation

A movie recommendation system on MovieLens 25M dataset using Python and Apache Spark

Language: Python - Size: 19.5 KB - Last synced at: about 2 months ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0

aiwithqasim/pyspark_bigdata

Getting started with PySpark for Big data analysis

Language: Jupyter Notebook - Size: 835 KB - Last synced at: 23 days ago - Pushed at: over 2 years ago - Stars: 9 - Forks: 12

amirhosseinazami1373/Book-Genre-Classifier

Have you ever tried to guess the genre of a book by reading its title? In this project, I try to do that using a massive database of books (their titles and genres), Spark MLlib, and three different ML models: (1) Support Vector Machine (SVM), (2) Logistic Regression, and (3) Neural Networks.

Language: HTML - Size: 44.9 KB - Last synced at: about 2 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

AishwaryaHastak/IPL_Analysis

Analysis of IPL dataset using PySpark

Language: Jupyter Notebook - Size: 2.78 MB - Last synced at: 2 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

Minishlink/MLlib

[2009] Code apps and games easily on Nintendo Wii!

Language: C - Size: 311 KB - Last synced at: 18 days ago - Pushed at: over 7 years ago - Stars: 11 - Forks: 0

flipkart-incubator/spark-transformers

Spark-Transformers: Library for exporting Apache Spark MLLIB models to use them in any Java application with no other dependencies.

Language: Java - Size: 609 KB - Last synced at: 17 days ago - Pushed at: over 7 years ago - Stars: 40 - Forks: 29

ckongala/SparkPythonBigData

Big-Data with Apache Spark and Python.

Language: Python - Size: 169 KB - Last synced at: 2 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

Ssharma91/Churn-Analysis-

Statistical Modelling / Tree based algorithms : Churn Analysis in retail industry using Python and PySpark

Language: Jupyter Notebook - Size: 8.79 KB - Last synced at: 10 months ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

LucaSpadoni/Scala-Spark-Stellar-Classification

Classification of astronomical objects using Scala-Spark and its ML library "spark.ml", based on the Stellar Classification Dataset (SDSS17).

Language: Scala - Size: 7.77 MB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

MelinaMoraiti/Spark-Text-Analyzer

An Apache Spark application to analyze word frequencies and compute TF-IDF weights across multiple text file sets using Spark's MLlib library.

Language: Scala - Size: 17.6 KB - Last synced at: about 2 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

santiagxf/portable-sparkml

This repository shows how to create containerized versions of models trained with spark MLLib

Language: Jupyter Notebook - Size: 5.86 KB - Last synced at: 2 days ago - Pushed at: about 5 years ago - Stars: 2 - Forks: 2

sprcoder/WineQualityModel

Using MLlib in Spark to train an ML model for wine quality prediction.

Language: Python - Size: 181 KB - Last synced at: 11 months ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

nikitap492/PokemonSparkML

Is a Pokémon legendary? This project tries to find out with Spark MLlib.

Language: Scala - Size: 18.6 KB - Last synced at: 11 months ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

muhammad-ahsan/spark-toolbox

Spark based applications to perform big data analytics

Language: Python - Size: 40 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

jabhij/CrimeRate_Classification

Developing a system that could classify crime descriptions into different categories which would help the authorities to assign officers to crimes based on the report.

Language: Jupyter Notebook - Size: 30.2 MB - Last synced at: 18 days ago - Pushed at: about 2 years ago - Stars: 2 - Forks: 2

jabhij/Apple_DataAnalyis_ApacheSpark

Analyzed Apple's dataset to check how many people bought AirPods after buying a Mac or iPhone. Thereafter, used ML and predictive analytics to forecast future outcomes.

Size: 0 Bytes - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

aikuyun/spark-all

Spark Core, SQL, Streaming, MLlib

Language: Scala - Size: 526 KB - Last synced at: 22 days ago - Pushed at: about 6 years ago - Stars: 4 - Forks: 0

JonathanLoscalzo/pyspark_mllib-bigdata-unlp

PySpark & MLlib final exam for the Big Data course, UNLP

Language: Jupyter Notebook - Size: 173 KB - Last synced at: 12 months ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

naiborhujosua/Data-Scientist-learning-path-using-databricks

This is the summary of learning Data Science using Databricks

Size: 51.8 KB - Last synced at: 12 months ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

saikumarsuvanam/BigData

Hadoop, machine learning algorithms, Spark, Pig, Hive

Language: Java - Size: 4.37 MB - Last synced at: 12 months ago - Pushed at: about 7 years ago - Stars: 0 - Forks: 1

deepjyotiroy079/bike-sharing-demand

Service that combines historical usage patterns with weather data to forecast the bicycle rental demand in real time.

Language: Jupyter Notebook - Size: 3.2 MB - Last synced at: 12 months ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

sudarshan-koirala/spark-practice

Learning spark the right way

Language: Jupyter Notebook - Size: 46.9 KB - Last synced at: 12 months ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

mkbehbehani/spark-advanced-regression-kaggle

Prediction system for Kaggle's advanced regression competition using Scala + Spark

Language: Scala - Size: 16.8 MB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 1

Gatmatz/Survey-Distributed-Machine-Learning

A bibliographic survey of distributed machine learning. It covers algorithmic choices and architectures that facilitate distributed learning of ML models, and includes a part presenting MLlib, Apache Spark's ML library for distributed ML implementations.

Size: 556 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

amy-panda/NY_Taxi_Data_Analysis_and_Modelling

Analysing taxi trips in New York City and predicting the total fare amount of each trip

Language: Jupyter Notebook - Size: 1.84 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

geektoni/twitter_user_profiling

Clustering of tweets to build user profiles using Spark MLlib.

Language: Python - Size: 945 KB - Last synced at: about 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

zydusss/Spark

Data Analytics using Spark

Language: Jupyter Notebook - Size: 33.2 KB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 2 - Forks: 0

mohigup/spark-ml-prediction

CS6240 Data Mining Project

Language: Scala - Size: 8.79 KB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

Lakshmiaddepalli/BigDataProject

CSCI-GA.3033-005 - Big Data Application Development

Language: Python - Size: 41.4 MB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 1 - Forks: 0

jubins/Spark-And-MLlib-Projects

This repository contains Spark, MLlib, PySpark and Dataframes projects

Language: Jupyter Notebook - Size: 101 KB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 39 - Forks: 97

ognis1205/spark-tda

SparkTDA is a package for Apache Spark providing Topological Data Analysis Functionalities.

Language: Scala - Size: 31.4 MB - Last synced at: about 1 year ago - Pushed at: almost 7 years ago - Stars: 47 - Forks: 5

iamchetanks/Handwritten-Digit-Recognition-using-Spark-Scala

Google DataProc Spark Scala Job for MNIST Handwritten Digit Recognition using Decision Trees (Spark MLlib)

Language: Perl 6 - Size: 24.8 MB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

mjaglan/bigdata-example-project

Clustering with Spark on OpenStack Cloud. [OpenStack] [Hadoop] [Spark] [Ansible] [YAML] [Scala] [SBT] [Shell]

Language: Scala - Size: 30.2 MB - Last synced at: over 1 year ago - Pushed at: almost 9 years ago - Stars: 2 - Forks: 3

annagracia12/MassiveDataProcessing

Projects for the Massive Data Processing Engineering course at Universidad Internacional de La Rioja.

Language: Jupyter Notebook - Size: 3.73 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

NikosVlachakis/Information-Systems-Analysis-and-Design

Semester project for the course "Information Systems Analysis and Design" at ECE-NTUA in 2022.

Language: TeX - Size: 730 KB - Last synced at: 12 months ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

vardan10/Technothon

Compares execution time for MapReduce, Hive, and Spark, and performs sentiment analysis on any YouTube video

Language: Python - Size: 14.1 MB - Last synced at: 11 months ago - Pushed at: over 7 years ago - Stars: 3 - Forks: 0

Heisenberghj7/Retail-Store-BigData

📊 📑 This project provides step-by-step big data analytics applied to the retail industry, using a variety of big data technologies such as HDFS, Hive, and Spark.

Language: Python - Size: 2.11 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

xavierguihot/mllib_decision_tree_reducer

Small utility that reduces naive decision tree models produced by MLlib

Language: Scala - Size: 291 KB - Last synced at: over 1 year ago - Pushed at: about 7 years ago - Stars: 0 - Forks: 0

Lewuathe/dllib

dllib is a distributed deep learning library running on Apache Spark

Language: CSS - Size: 3.69 MB - Last synced at: 19 days ago - Pushed at: over 7 years ago - Stars: 32 - Forks: 5

30lm32/ml-random-forest-pyspark

Random forest binary classification applied to sample data with PySpark in a Jupyter Notebook

Language: Jupyter Notebook - Size: 44.9 KB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 3 - Forks: 3

djgarcia/PCARD

PCARD Ensemble classifier for Big Data

Language: Scala - Size: 19.5 KB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 4 - Forks: 2

AdautoDCJunior/spark-processamento-linguagem-natural

Repository for the Alura course "Spark: processamento de linguagem natural" (Spark: natural language processing).

Language: Jupyter Notebook - Size: 536 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

AdautoDCJunior/spark-criando-modelos-classificacao

Repository for the Alura course "Spark: criando modelos de classificação" (Spark: building classification models).

Language: Jupyter Notebook - Size: 160 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

lgallen/pyspark_dataframes

Tutorial on how to use the Python API for Spark dataframes.

Language: HTML - Size: 81.1 KB - Last synced at: over 1 year ago - Pushed at: about 7 years ago - Stars: 1 - Forks: 2

Denis-Mukhanov/Chicago_taxi_trips_BigData

Language: Jupyter Notebook - Size: 1 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

rdempsey/data-analytics-machine-learning-big-data

Slides, code and more for my class: Data Analytics and Machine Learning on Big Data

Language: Jupyter Notebook - Size: 127 MB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 8 - Forks: 21

emrekutlug/getting-started-with-pyspark

In this tutorial, I explain SparkContext using the map and filter methods with lambda functions in Python; create RDDs from objects and external files; cover transformations and actions on RDDs and pair RDDs; build PySpark DataFrames from RDDs and external files; run SQL queries on DataFrames with Spark SQL; and apply machine learning with PySpark MLlib.

Language: Jupyter Notebook - Size: 8.58 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 5 - Forks: 6

sadikovi/spark-hosvd

Spark High Order SVD

Language: Scala - Size: 61.5 KB - Last synced at: about 1 year ago - Pushed at: almost 4 years ago - Stars: 3 - Forks: 1

pausanchezv/Bases-de-dades-II-Big-Data

Advanced Databases course in the Computer Engineering degree (Universitat de Barcelona)

Language: Java - Size: 106 MB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

AntonioLunardi/NLP_Spark_sentiment_analisys

A bag-of-words analysis based on IMDb movie opinions with PySpark

Language: Jupyter Notebook - Size: 338 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

vsmolyakov/pyspark

Spark (Scala and Python)

Language: Python - Size: 2.4 MB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 17 - Forks: 6

cjr227/cf_spark_mllib

Collaborative filtering (CF) based music recommendation using Spark/MLlib

Language: Jupyter Notebook - Size: 7.81 KB - Last synced at: over 1 year ago - Pushed at: almost 8 years ago - Stars: 0 - Forks: 0

tankkyo/AuraAnalysis

Language: Scala - Size: 1.27 MB - Last synced at: over 1 year ago - Pushed at: about 8 years ago - Stars: 0 - Forks: 1

javiaspiroz/ml-spark-anime

Recommender system using a dataset similar to MovieLens but built from anime CSVs, with Apache Spark running the ALS algorithm.

Language: Python - Size: 7.31 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

kuptservol/jml

Yet Another Java ML Lib

Language: Java - Size: 2.44 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 2 - Forks: 0

Dheeraj2444/spark

Learning PySpark

Language: Jupyter Notebook - Size: 18.6 KB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 1

SalmaOuardi/Classification-with-PySpark

Discover classification through PySpark with MLlib + Feature Extraction

Language: Jupyter Notebook - Size: 7.81 KB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

imgoodman/kaggle-spark-ml

Kaggle machine learning with Spark

Language: Python - Size: 113 KB - Last synced at: about 1 year ago - Pushed at: almost 8 years ago - Stars: 7 - Forks: 2

Nanobelka/california-housing

PySpark pipeline for median house value prediction

Language: Jupyter Notebook - Size: 2.12 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 0

notPlancha/pbd

PBD coursework project

Language: Jupyter Notebook - Size: 8.22 MB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

ironavt/spark-california-housing

Predicting median house value in California using big data tools

Language: Jupyter Notebook - Size: 15.6 KB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

krzysztofhewelt/Smart-applications-Python

Smart applications written in Python, using PySpark and MLlib.

Language: Jupyter Notebook - Size: 1.5 MB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

Anas436/Fundamentals-of-Scalable-Data-Science

Language: Jupyter Notebook - Size: 3.13 MB - Last synced at: 30 days ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

knoldus/Play-Spark-Scala

Language: Scala - Size: 236 KB - Last synced at: about 2 years ago - Pushed at: over 9 years ago - Stars: 52 - Forks: 40

knoldus/Movie-Recommendation-Engine

Recommends movies based on your interests

Language: Scala - Size: 6.84 KB - Last synced at: about 2 years ago - Pushed at: almost 9 years ago - Stars: 1 - Forks: 0

animenon/pyspark_mllib

Example from Spark MLLib (in python)

Language: Python - Size: 23.4 KB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 9 - Forks: 6

anandsharma-i/Parkinsons_Disease_Detection

This is a project about detection of Parkinson's disease based on the particular features/traits shown by patients.

Language: Jupyter Notebook - Size: 6.84 KB - Last synced at: almost 2 years ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 0

sramirez/spark-RELIEFFC-fselection

Distributed version of RELIEF-F algorithm for Apache Spark.

Language: Scala - Size: 15.2 MB - Last synced at: about 2 years ago - Pushed at: about 7 years ago - Stars: 3 - Forks: 2

Anas436/Getting-Started-with-PySpark

Language: Jupyter Notebook - Size: 1.8 MB - Last synced at: 30 days ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

pierrenodet/spark-smile 📦

Integrating SMILE and Spark

Language: Scala - Size: 341 KB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 5 - Forks: 0

data-integrations/logistic-regression-analytics 📦

Logistic Regression Predictor and Classifier Plugins

Language: Java - Size: 34.2 KB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 1 - Forks: 2

data-integrations/ngram-analytics 📦

NGram Analytics Transform Plugin: Transforms input features into n-grams

Language: Java - Size: 50.8 KB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 2

data-integrations/hashing-tf-feature-generator 📦

Hashing TF Feature Generator Plugin

Language: Java - Size: 31.3 KB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 2

anujdutt9/BigData-and-Machine-Learning

Basics of Big Data and Machine Learning using Apache Spark and Scala

Language: Scala - Size: 834 KB - Last synced at: 18 days ago - Pushed at: about 8 years ago - Stars: 7 - Forks: 9

alessandroiori/community-detection-lastfm

Python3, NetworkX, Java, MLlib, Spark, Cassandra, Neo4j 3.0, Gephi, Docker

Language: Roff - Size: 60.6 MB - Last synced at: about 2 years ago - Pushed at: almost 8 years ago - Stars: 11 - Forks: 2

data-integrations/tokenizer-analytics 📦

Tokenizer Analytics Plugin: A transform plugin to split data based on a pattern

Language: Java - Size: 37.1 KB - Last synced at: about 1 year ago - Pushed at: about 6 years ago - Stars: 1 - Forks: 2

data-integrations/skipgram-analytics 📦

SkipGram Feature Generator: Generates text-based features using Spark's Word2Vec.

Language: Java - Size: 37.1 KB - Last synced at: about 1 year ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 1

data-integrations/decision-tree-analytics 📦

CDAP plugins for training a model using decision tree and for predicting outcomes based on the trained model.

Language: Java - Size: 67.4 KB - Last synced at: about 1 year ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 1

tanersekmen/pyspark-mllib

MLlib with PySpark

Language: Jupyter Notebook - Size: 24.4 KB - Last synced at: almost 2 years ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 0

colbyford/sparkitecture

A collection of “cookbook-style” scripts for simplifying data engineering and machine learning in Apache Spark.

Size: 665 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 11 - Forks: 4

shaharpit809/Latent-Dirichlet-allocation-LDA-on-YELP-dataset-using-Apache-Spark

This repository compares two LDA algorithms (EM and Online) in Apache Spark's 'mllib' library and searches for the best hyperparameters on the Yelp dataset.

Language: Java - Size: 6.43 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 3 - Forks: 1

matheusmmmp/MLlib-graphTracking

Tracking project with machine learning using PySpark MLlib

Language: Jupyter Notebook - Size: 11.7 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

BigDataProcSystems/Practice

Language: Jupyter Notebook - Size: 10.1 MB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 3 - Forks: 2