An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: spark-mllib

LuckyZXL2016/Movie_Recommend

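A Spark-based movie recommendation system, comprising a web crawler project, a website, an admin back-end system, and the Spark recommendation system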

Language: Java - Size: 55.1 MB - Last synced at: 5 days ago - Pushed at: about 6 years ago - Stars: 2,917 - Forks: 1,051

databricks/LearningSparkV2

This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]

Language: Scala - Size: 75.2 MB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 1,279 - Forks: 764

qubole/sparklens

Qubole Sparklens tool for performance tuning Apache Spark

Language: Scala - Size: 175 KB - Last synced at: 14 days ago - Pushed at: 10 months ago - Stars: 574 - Forks: 141

derrickburns/generalized-kmeans-clustering

Spark library for generalized K-Means clustering. Supports general Bregman divergences. Suitable for clustering probabilistic data, time series data, high dimensional data, and very large data.

Language: HTML - Size: 7.42 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 300 - Forks: 50

JaewonSon37/Mining_Big_Data2

Topic: Exploring the Relationship Between Weather and Taxi Demand in Chicago

Language: Jupyter Notebook - Size: 181 KB - Last synced at: 20 days ago - Pushed at: 28 days ago - Stars: 0 - Forks: 0

jaceklaskowski/spark-workshop

Apache Spark™ and Scala Workshops

Language: HTML - Size: 57 MB - Last synced at: 21 days ago - Pushed at: 9 months ago - Stars: 264 - Forks: 148

JoseRuiz01/AirlineOn-TimePerformanceAnalysis

Airline on-time performance analysis using Spark Machine Learning libraries

Size: 0 Bytes - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

OrvilleX/MachineLearning

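This project is application-oriented, covering basic machine learning and deep learning through object detection and the latest large models. It uses mature third-party libraries, open-source pretrained models, and the latest techniques from related papers. The goal is to record the learning process and share it so that more people can use it directly.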

Language: Jupyter Notebook - Size: 11.4 MB - Last synced at: 17 days ago - Pushed at: about 2 months ago - Stars: 66 - Forks: 22

Mohitsai/epidemic-engine

Streaming ETL data pipeline for health event monitoring and predictive analytics using Kafka, Airflow, Docker, Hadoop and Spark ML.

Language: Python - Size: 6.6 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

wzhe06/SparkCTR

CTR prediction model based on Spark (LR, GBDT, DNN)

Language: Scala - Size: 35 MB - Last synced at: 14 days ago - Pushed at: about 5 years ago - Stars: 912 - Forks: 260

omerbsezer/SparkMlpDow30 📦

A new stock trading and prediction model based on a MLP neural network utilizing technical analysis indicator values as features (using Apache Spark MLlib)

Language: Java - Size: 201 MB - Last synced at: about 1 month ago - Pushed at: about 7 years ago - Stars: 36 - Forks: 17

wikistat/AI-Frameworks

Data Science Season 5: Technologies for machine / statistical learning on massive data and Artificial Intelligence

Language: Jupyter Notebook - Size: 646 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 44 - Forks: 42

TsungTseTu122/CloudComputing--MovieLens-Big-Data-Analytics-on-the-Cloud

This project analyzes the MovieLens dataset using PySpark, Hadoop HDFS, and Docker to perform clustering, classification, and association rule mining on user-movie interactions. The system runs in a containerized cloud environment with Spark clusters, enabling scalable big data processing.

Size: 10.7 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

LuisFalva/ophelia

Ophelian On Mars! More than a simple framework.

Language: Python - Size: 2.16 MB - Last synced at: 7 days ago - Pushed at: about 2 months ago - Stars: 2 - Forks: 5

berksudan/PySpark-Auto-Clustering

Implemented an auto-clustering tool with seed and number of clusters finder. Optimizing algorithms: Silhouette, Elbow. Clustering algorithms: k-Means, Bisecting k-Means, Gaussian Mixture. Module includes micro-macro pivoting, and dashboards displaying radius, centroids, and inertia of clusters. Used: Python, Pyspark, Matplotlib, Spark MLlib.

Language: Python - Size: 64.5 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

madhurimarawat/Big-Data-Analytics

This repository demonstrates big data processing, visualization, and machine learning using tools such as Hadoop, Spark, Kafka, and Python.

Language: Jupyter Notebook - Size: 10.7 MB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 1

hazecodeio/spark-sandbox

Language: Scala - Size: 13.1 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

abouslimi/spark-ml-product-recommendation

Real-time product recommendation system built using Apache Spark, Kafka, and Python.

Language: Python - Size: 419 KB - Last synced at: 18 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

SayamAlt/Amazon-Products-API-ETL-and-ML-pipeline

In this project, I've created an end-to-end ETL pipeline and subsequently developed a machine learning model to predict the price of Amazon products based on several product-related features.

Language: Python - Size: 2.95 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

madhans476/Spark_NLP_Spark_ML_lib

This project combines the power of Spark NLP for natural language processing and Spark MLlib for machine learning, allowing users to efficiently classify text into multiple categories in a distributed computing environment.

Language: Jupyter Notebook - Size: 3.62 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

DebanjanSarkar/pyspark-maestro

This repo contains implementations of PySpark for real-world use cases for batch data processing, streaming data processing sourced from Kafka, sockets, etc., spark optimizations, business specific bigdata processing scenario solutions, and machine learning use cases.

Language: Jupyter Notebook - Size: 66.1 MB - Last synced at: 18 days ago - Pushed at: 9 months ago - Stars: 2 - Forks: 1

zikzakjack/spark-demos

Apache Spark Demos

Language: Jupyter Notebook - Size: 103 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

IBM/db2-event-store-iot-analytics 📦

IoT sensor temperature analysis and prediction with IBM Db2 Event Store

Language: Jupyter Notebook - Size: 36.6 MB - Last synced at: 4 months ago - Pushed at: over 4 years ago - Stars: 4 - Forks: 22

AttitudeAdjuster/Accident-Severity-Prediction

IBM Coursera Capstone Project - Predict Accident Severity Given Weather, Road and Lighting Conditions

Language: Jupyter Notebook - Size: 161 MB - Last synced at: 6 months ago - Pushed at: over 6 years ago - Stars: 5 - Forks: 9

pathak-ashutosh/sentiment-analysis-yelp-reviews

Perform sentiment analysis on Yelp dataset with Apache Spark

Language: Python - Size: 133 KB - Last synced at: about 2 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

SayamAlt/PySpark-for-Big-Data-and-Machine-Learning

This is the material for Jose Portilla's Spark and Python for Big Data and ML course.

Language: Jupyter Notebook - Size: 3.79 MB - Last synced at: about 2 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

cbozan/graduation-project

A graduation project that categorizes popular search phrases using Python and Spark and presents them on a website to inspire creators.

Language: Python - Size: 402 KB - Last synced at: 18 days ago - Pushed at: about 2 years ago - Stars: 5 - Forks: 0

Wadaboa/production-line-performance

Scala/Spark project, for Languages and Algorithms for Artificial Intelligence class at UNIBO

Language: Scala - Size: 31 MB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

arturogonzalezm/energy_price_and_demand_forecast

AEMO Aggregated price and demand data

Language: Python - Size: 14.1 MB - Last synced at: 2 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

polaternez/Introduction-to-Big-Data

Big Data projects for beginners

Language: Java - Size: 4.63 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

michael-pupulin/Scala_Spark_and_SQL

I do some basic statistics and machine learning work on a dataset of tornado events across the United States. The dataset is nowhere near big enough to warrant using Spark over something like R, but I was looking for practice. I do some basic SQL to find out which years and states saw the most tornadoes and the most F5 tornadoes. Then I use Spark's MLlib to do linear regression of time and tornado counts.

Language: Scala - Size: 30.3 KB - Last synced at: 11 months ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

mnassrib/pyspark-examples

This tutorial presents some examples in order to give a quick overview of the Spark APIs.

Language: Jupyter Notebook - Size: 8.48 MB - Last synced at: 11 months ago - Pushed at: over 4 years ago - Stars: 3 - Forks: 0

NikitaVispute/Big-Data-Projects

Language: Scala - Size: 5.2 MB - Last synced at: 11 months ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

Coursal/Text-Sentiment-Analysis-In-Hadoop-And-Spark

The source code developed and used for the purposes of my thesis with the same title under the guidance of my supervisor professor Vasilis Mamalis for the Department of Informatics and Computer Engineering of the University of West Attica.

Language: Java - Size: 66.5 MB - Last synced at: 11 months ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 1

alivcor/SMORK

Implementation of SMOTE - Synthetic Minority Over-sampling Technique in SparkML / MLLib

Language: Scala - Size: 165 KB - Last synced at: 11 months ago - Pushed at: over 4 years ago - Stars: 5 - Forks: 2

abulbasar/zeppelin-notebooks

Size: 3.91 KB - Last synced at: 12 months ago - Pushed at: over 7 years ago - Stars: 1 - Forks: 1

aabdel-kader/Apache-Spark

A repository for my practices and projects using pyspark

Language: Jupyter Notebook - Size: 11.6 MB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

wengbenjue/spark_recomend

A core recommendation engine that uses Spark MLlib and HBase for the model and Hive for data cleaning; tested on Spark on YARN

Language: C - Size: 41.2 MB - Last synced at: about 1 year ago - Pushed at: about 8 years ago - Stars: 29 - Forks: 17

yennanliu/NYC_Taxi_Trip_Duration

Develop ML models to predict taxi trip duration in NYC. Ranked : Top 6% | RMSLE : 0.377 (Kaggle) | #DS

Language: Jupyter Notebook - Size: 43.2 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 15 - Forks: 8

P7h/p7hb-docker-mllib-twitter-sentiment

🚢 Docker image for Twitter Sentiment analysis with Spark MLlib

Language: Shell - Size: 138 KB - Last synced at: about 1 year ago - Pushed at: almost 8 years ago - Stars: 7 - Forks: 3

vaslnk/Spotify-Song-Recommendation-ML

UC Berkeley team's submission for RecSys Challenge 2018

Language: Jupyter Notebook - Size: 34.2 MB - Last synced at: 11 months ago - Pushed at: almost 7 years ago - Stars: 86 - Forks: 22

P7h/Spark-MLlib-Twitter-Sentiment-Analysis

🌟 ✨ Analyze and visualize Twitter Sentiment on a world map using Spark MLlib

Language: Scala - Size: 19.7 MB - Last synced at: about 1 year ago - Pushed at: almost 4 years ago - Stars: 135 - Forks: 69

aliabbasi2000/Spark

Solving Big Data Problems using Spark framework in Java. Running the Project on HDFS clusters (BigData@Polito) to get the results.

Language: Java - Size: 143 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

felidsche/movie-recommender

A movie recommendation system built using Apache Spark’s ML library

Language: Python - Size: 829 KB - Last synced at: about 1 year ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

josemarialuna/ExternalValidity

This package contains the code for calculating external clustering validity indices in Spark. The package includes Chi Index among others.

Language: Scala - Size: 146 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 9 - Forks: 1

IBM/icp4d-customer-churn-classifier

Infuse AI into your application. Create and deploy a customer churn prediction model with IBM Cloud Private for Data, Db2 Warehouse, Spark MLlib, and Jupyter notebooks.

Language: Jupyter Notebook - Size: 28.1 MB - Last synced at: 4 months ago - Pushed at: almost 2 years ago - Stars: 17 - Forks: 22

bobxwang/predict-stock-in-spark

Using Spark to predict stocks; the data comes from Sina

Language: Scala - Size: 143 KB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

CaioBrainer/Hadoop_Projects

Small projects using tools from the Apache Hadoop ecosystem

Language: Python - Size: 18.6 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

Jayant1234/Malware-classification Fork of dsp-uga/sabayon-p1

Language: Jupyter Notebook - Size: 11.4 MB - Last synced at: about 1 year ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 0

marcocolangelo/Big-Data-processing-and-Analytics

The current repository contains all the code developed during the Big Data processing and Analytics laboratories. Data are processed and analyzed using Hadoop and Spark

Language: Java - Size: 6.1 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

FlorentF9/sparkml-som

✨ Spark ML implementation of SOM algorithm (Kohonen self-organizing map)

Language: Scala - Size: 29.3 KB - Last synced at: 27 days ago - Pushed at: about 3 years ago - Stars: 17 - Forks: 6

mikerly131/serveUpRecos

Build a mock EMR app and integrate an AI/ML prediction into an encounter workflow

Language: CSS - Size: 54.9 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

polaroidz/sales_prediction

A Production Machine Learning Pipeline for Predicting Future Sales with Spark

Language: Jupyter Notebook - Size: 90.8 KB - Last synced at: over 1 year ago - Pushed at: about 6 years ago - Stars: 3 - Forks: 0

venkateshavula/Evaluate-Spark-MLlib-using-PySpark

A UDF to evaluate Spark-MLlib classification model using PySpark

Language: Python - Size: 4.88 KB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

miguelangel43/Prediction-Flight-Arrivals-Delays-Spark

Application that trains a classifier and predicts flight arrival delays based on past information. Uses the libraries pyspark.ml and pyspark.sql, performs feature engineering, cross-validation and tests various ML algorithms.

Language: Python - Size: 41 KB - Last synced at: 12 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

amitkumarusc/recommendation-system

A movie recommendation system trained on the MovieLens 20 Million dataset. This system makes use of Collaborative filtering methods to come up with recommendations for a particular user.

Language: Jupyter Notebook - Size: 21.8 MB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 13 - Forks: 3

harishpuvvada/BitCoin-Value-Predictor 📦

[NOT MAINTAINED] Predicting Bitcoin price using time series analysis and sentiment analysis of tweets about Bitcoin

Language: Jupyter Notebook - Size: 3.07 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 113 - Forks: 29

ShubhamJagtap2000/Spark-Python

🐍💥Python and Spark for Big Data

Language: Jupyter Notebook - Size: 73.2 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

omerbsezer/SparkDeepMlpGADow30 📦

A Deep Neural-Network based (Deep MLP) Stock Trading System based on Evolutionary (Genetic Algorithm) Optimized Technical Analysis Parameters (using Apache Spark MLlib)

Language: Java - Size: 213 MB - Last synced at: over 1 year ago - Pushed at: about 7 years ago - Stars: 58 - Forks: 46

alessandrolulli/reforest

Random Forests in Apache Spark

Language: Scala - Size: 71 MB - Last synced at: over 1 year ago - Pushed at: almost 6 years ago - Stars: 72 - Forks: 11

MNoorFawi/linear-regression-with-spark

Creating an SBT-based Spark Application to predict Online News Popularity using Linear Regression Algorithm ...

Language: Scala - Size: 18.6 KB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

MNoorFawi/kmeans-clustering-with-spark

Creating an sbt Apache Spark application to perform customer segmentation using Spark MLlib KMeans ...

Language: Scala - Size: 15.6 KB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 1

jingpeicomp/product-category-predict

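Product category prediction. Using the Spring Boot development framework and the Spark MLlib machine learning framework, a product category prediction model is trained with TF-IDF and the Bayes algorithm. The model automatically predicts a product's category from its name. The project exposes a RESTful interface.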

Language: Java - Size: 41.2 MB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 119 - Forks: 60

lookuut/raif-competition

Spark application for predicting a customer's home and work coordinates from payment transactions

Language: Scala - Size: 27.7 MB - Last synced at: over 1 year ago - Pushed at: about 7 years ago - Stars: 1 - Forks: 0

akarsh3007/Recommendation-Systems

Simple Content-based and Collaborative Filtering Algorithms implementation

Language: Python - Size: 1.36 MB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

jr2ngb2/yelp_recommender

Language: Jupyter Notebook - Size: 45.9 KB - Last synced at: over 1 year ago - Pushed at: almost 6 years ago - Stars: 1 - Forks: 0

xtutran/spark-tutor

Using spark-sql & spark-mllib to tackle Titanic & Movie Recommendation

Language: Scala - Size: 112 KB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

trendyol-data-eng-summer-intern-2019/recom-engine-streaming

Streaming component of the project, which is written with Spark Streaming.

Language: Scala - Size: 15.6 KB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 1

agrimrules/brewery

A spark job that processes data scraped from the web

Language: Scala - Size: 17.6 KB - Last synced at: over 1 year ago - Pushed at: almost 7 years ago - Stars: 0 - Forks: 0

NashTech-Labs/Sparkathon

A library having Java and Scala examples for Spark 2.x

Language: Java - Size: 113 MB - Last synced at: 21 days ago - Pushed at: over 8 years ago - Stars: 7 - Forks: 9

gavalle94/Songs-Recommender

Recommendation System written in Python, using the PySpark framework and other Data Science libraries

Language: HTML - Size: 5.23 MB - Last synced at: over 1 year ago - Pushed at: almost 7 years ago - Stars: 1 - Forks: 1

hoangviet148/Foody

Language: Python - Size: 17.6 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 1

avcaliani/spark-ml-app

🤖

Language: Jupyter Notebook - Size: 118 KB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

Lucass97/FlightAnalysis

This project implemented a lambda architecture for analyzing domestic flight data in the US from 2009 to 2020. It used Apache Spark for batch processing, Spark Streaming for real-time analysis, and SVM models to predict flight cancellations and delays, with Docker for cluster management and Grafana for real-time visualization.

Language: Jupyter Notebook - Size: 5.66 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 1

shalakasaraogi/apache-spark-pig-hive-work

This repository contains Apache Spark, Apache Hive, Apache Pig work

Language: PigLatin - Size: 813 KB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

jacopocav/spark-ifs

Iterative filter-based feature selection on large datasets with Apache Spark

Language: Scala - Size: 130 KB - Last synced at: almost 2 years ago - Pushed at: almost 7 years ago - Stars: 3 - Forks: 0

xghan99/bigdata-assignments

This repository consists of code I wrote for CS4225 - Big Data Systems for Data Science

Language: Jupyter Notebook - Size: 3.64 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

shubham-deb/Neural-Circuit-Tracer

This repository contains the source codes & scripts of my project for Master's level course - CS6240 Parallel Data Processing in Map-Reduce course at College of Computer & Information Science, Northeastern University, Boston MA.

Language: Scala - Size: 663 KB - Last synced at: about 1 year ago - Pushed at: about 7 years ago - Stars: 1 - Forks: 0

shubham-deb/Spark_Scala_Programs

This repository contains all the Spark Scala programs that I have implemented during my Master's level course - CS6240 Parallel Data Processing in Map-Reduce course at College of Computer & Information Science, Northeastern University, Boston MA.

Language: Makefile - Size: 4 MB - Last synced at: about 1 year ago - Pushed at: about 7 years ago - Stars: 1 - Forks: 0

esap120/spark-twitter-streaming

Streaming Twitter Sentiment Analysis with Apache Spark

Language: Scala - Size: 8.79 KB - Last synced at: almost 2 years ago - Pushed at: over 6 years ago - Stars: 5 - Forks: 1

Siddharth1989/ProspectiveTopUpCustomerPrediction

Developed a model/Spark ML pipeline stream to identify potential customers that may purchase top up services in the future.

Language: Jupyter Notebook - Size: 6.17 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

huzaifakhan04/amazon-product-recommendation-system-web-application-flask-using-mongodb-pyspark-and-apache-kafka

This repository includes a web application connected to a product recommendation system developed with the comprehensive Amazon Review Data (2018) dataset, consisting of nearly 233.1 million records and occupying approximately 128 gigabytes (GB) of data storage, using MongoDB, PySpark, and Apache Kafka.

Language: Jupyter Notebook - Size: 91 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

ging/fiware-ml-supermarket

Demo: Predicting purchase volume in a supermarket using FIWARE

Language: JavaScript - Size: 290 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 4 - Forks: 1

surajsrivathsa/Supervised_Link_Prediction_Using_Spark_and_Neo4j

A project that analyzes authorship graph data from the Microsoft Academic Graph. We calculate graph features using temporal parameters of the authors and try different classifiers. The final aim is to predict the possibility of a link (co-authorship) between two authors based on topological graph features, and to assess the feasibility of performing this task on Neo4j and Spark.

Language: Scala - Size: 15.8 MB - Last synced at: over 1 year ago - Pushed at: almost 5 years ago - Stars: 5 - Forks: 4

MHassaanButt/Flight-Delays-Prediction

In this project, I used decision tree learning as the main algorithm to build the model. Because of the large amount of flight data, the project is implemented using MRJob, PySpark, and Spark's MLlib, and the performance and accuracy of those implementations are compared.

Language: Jupyter Notebook - Size: 17.5 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 8 - Forks: 0

avaibh/Twitter-Bot-Detection

Big Data Stack: Spark, Kafka, Elasticsearch and NoSQL

Language: Jupyter Notebook - Size: 304 KB - Last synced at: almost 2 years ago - Pushed at: almost 5 years ago - Stars: 4 - Forks: 0

anmolmore/Enzyme-Classifier-Using-ML

Classify enzymes with genomic sequence using spark-ML

Language: Jupyter Notebook - Size: 719 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

svngoku/Pyspark-pour-les-datas-engineers

A hands-on introduction to PySpark for data engineers

Language: Jupyter Notebook - Size: 784 KB - Last synced at: 12 months ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

bassrehab/zerofish-imaging

Using the Thunder Library for Image Processing with Spark ML Lib

Language: Python - Size: 1.83 MB - Last synced at: almost 2 years ago - Pushed at: about 8 years ago - Stars: 1 - Forks: 0

grishenkovp/apache_spark

Learning Apache Spark. The PySpark library

Language: Jupyter Notebook - Size: 135 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

satyajeetmaharana/floodprediction

The goal of this project is to identify flood-prone areas, with county-level flood probabilities for a future date, using Spark MLlib.

Language: Scala - Size: 3.46 MB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 3 - Forks: 1

forons/BigDataExamples

Code repository for the MSc course "Big Data and Social Networks" of the University of Trento

Language: Jupyter Notebook - Size: 229 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 13

amanjeetsahu/Apache-Spark-Tutorials

This repo contains my learnings and practice notebooks on Spark using PySpark (Python Language API on Spark). All the notebooks in the repo can be used as template code for most of the ML algorithms and can be built upon for more complex problems.

Language: Jupyter Notebook - Size: 20.9 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 4 - Forks: 10

Rohini2505/Lending-Club-Loan-Analysis

Exploratory Data Analysis and ML model building using Apache Spark and PySpark

Language: HTML - Size: 6.26 MB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 10 - Forks: 12

ABigdataer/MovieRecommendSystem

A Spark-based movie recommendation system

Language: HTML - Size: 64.6 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 4 - Forks: 4

alefbt/SparkML-spring-scoring-poc 📦

POC of a scoring REST service for Spark ML Pipelines

Language: Java - Size: 234 KB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 0 - Forks: 0

corvaglia-alessio/big-data-labs 📦

Labs for the course "Big Data: architectures and data analytics" @ Politecnico di Torino a.y. 2021/22

Language: Java - Size: 53.4 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

chaithrakc/credit_card_default_prediction

Analyzing the likelihood of credit card delinquency without using credit scores or credit history

Language: Jupyter Notebook - Size: 4.16 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

huangyueranbbc/Spark_ALS

Recommendation algorithm implementations based on spark-ml, spark-mllib, and spark-streaming

Language: Java - Size: 97.7 KB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 89 - Forks: 46

kocharshaivi19/Stock-Analysis-and-Prediction

Financial Forecasting and its correlation with Human Sentiments using Distributed Computing on Spark Framework

Language: Scala - Size: 811 KB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 4 - Forks: 4

Related Keywords
spark-mllib 173 spark 111 spark-sql 55 spark-streaming 50 machine-learning 42 scala 40 pyspark 33 spark-ml 32 apache-spark 27 python 23 big-data 20 python3 16 hadoop 11 kafka 8 data-science 7 java 7 mongodb 7 hadoop-mapreduce 7 recommender-system 7 sparksql 7 sparkjava 7 big-data-analytics 6 bigdata 6 docker 6 sentiment-analysis 6 collaborative-filtering 6 jupyter-notebook 6 ml 5 nlp 5 linear-regression 5 spark-structured-streaming 5 clustering 5 data-analysis 5 kmeans-clustering 5 hdfs 4 feature-engineering 4 random-forest 4 natural-language-processing 4 data-visualization 4 hadoop-hdfs 4 databricks 3 structured-streaming 3 elasticsearch 3 apache-kafka 3 logistic-regression 3 kafka-streams 3 spark-nlp 3 docker-compose 3 delta-lake 3 rdd 3 twitter 3 pyspark-python 3 recommendation-system 3 aws-s3 3 mapreduce 3 apache-hadoop 3 alternating-least-squares 3 spring-boot 3 twitter-sentiment-analysis 3 visualization 3 supervised-learning 3 prediction 3 decision-trees 3 distributed-computing 2 neo4j 2 lda 2 pandas 2 product-recommender-system 2 product-recommendation 2 kafka-producer 2 kafka-consumer 2 cassandra 2 hadoop-framework 2 sbt 2 als 2 spark-rdd 2 spark-mllib-library 2 r 2 stock-price-prediction 2 graphx 2 topic-modeling 2 classification-algorithm 2 spark-streaming-kafka 2 spark-core 2 java-8 2 twitter-api 2 aws 2 naive-bayes 2 naive-bayes-classification 2 recommendation 2 decision-tree-classifier 2 classification 2 text-mining 2 matplotlib 2 pyspark-machine-learning 2 pyspark-api 2 spark-dataframes 2 kaggle 2 clustering-evaluation 2 price-prediction 2