GitHub topics: spark-mllib
LuckyZXL2016/Movie_Recommend
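A Spark-based movie recommendation system, including a web crawler project, a website, a back-end admin system, and the Spark recommendation engine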
Language: Java - Size: 55.1 MB - Last synced at: 5 days ago - Pushed at: about 6 years ago - Stars: 2,917 - Forks: 1,051

databricks/LearningSparkV2
This is the github repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
Language: Scala - Size: 75.2 MB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 1,279 - Forks: 764

qubole/sparklens
Qubole Sparklens tool for performance tuning Apache Spark
Language: Scala - Size: 175 KB - Last synced at: 14 days ago - Pushed at: 10 months ago - Stars: 574 - Forks: 141

derrickburns/generalized-kmeans-clustering
Spark library for generalized K-Means clustering. Supports general Bregman divergences. Suitable for clustering probabilistic data, time series data, high dimensional data, and very large data.
Language: HTML - Size: 7.42 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 300 - Forks: 50

JaewonSon37/Mining_Big_Data2
Topic: Exploring the Relationship Between Weather and Taxi Demand in Chicago
Language: Jupyter Notebook - Size: 181 KB - Last synced at: 20 days ago - Pushed at: 28 days ago - Stars: 0 - Forks: 0

jaceklaskowski/spark-workshop
Apache Spark™ and Scala Workshops
Language: HTML - Size: 57 MB - Last synced at: 21 days ago - Pushed at: 9 months ago - Stars: 264 - Forks: 148

JoseRuiz01/AirlineOn-TimePerformanceAnalysis
Airline on-time performance analysis using Spark Machine Learning libraries
Size: 0 Bytes - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

OrvilleX/MachineLearning
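This project is application-oriented. It covers everything from basic machine learning and deep learning to object detection and the latest large models, using mature third-party libraries, open-source pretrained models, and the latest techniques from related papers. The goal is to document the learning process and share it so that more people can use it directly.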
Language: Jupyter Notebook - Size: 11.4 MB - Last synced at: 17 days ago - Pushed at: about 2 months ago - Stars: 66 - Forks: 22

Mohitsai/epidemic-engine
Streaming ETL data pipeline for health event monitoring and predictive analytics using Kafka, Airflow, Docker, Hadoop and Spark ML.
Language: Python - Size: 6.6 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

wzhe06/SparkCTR
CTR prediction models based on Spark (LR, GBDT, DNN)
Language: Scala - Size: 35 MB - Last synced at: 14 days ago - Pushed at: about 5 years ago - Stars: 912 - Forks: 260

omerbsezer/SparkMlpDow30 📦
A new stock trading and prediction model based on an MLP neural network, using technical analysis indicator values as features (using Apache Spark MLlib)
Language: Java - Size: 201 MB - Last synced at: about 1 month ago - Pushed at: about 7 years ago - Stars: 36 - Forks: 17

wikistat/AI-Frameworks
Data Science, Season 5: Technologies for machine/statistical learning on massive data and for Artificial Intelligence
Language: Jupyter Notebook - Size: 646 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 44 - Forks: 42

TsungTseTu122/CloudComputing--MovieLens-Big-Data-Analytics-on-the-Cloud
This project analyzes the MovieLens dataset using PySpark, Hadoop HDFS, and Docker to perform clustering, classification, and association rule mining on user-movie interactions. The system runs in a containerized cloud environment with Spark clusters, enabling scalable big data processing.
Size: 10.7 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

LuisFalva/ophelia
Ophelian On Mars! More than a simple framework.
Language: Python - Size: 2.16 MB - Last synced at: 7 days ago - Pushed at: about 2 months ago - Stars: 2 - Forks: 5

berksudan/PySpark-Auto-Clustering
Implemented an auto-clustering tool with seed and number of clusters finder. Optimizing algorithms: Silhouette, Elbow. Clustering algorithms: k-Means, Bisecting k-Means, Gaussian Mixture. Module includes micro-macro pivoting, and dashboards displaying radius, centroids, and inertia of clusters. Used: Python, Pyspark, Matplotlib, Spark MLlib.
Language: Python - Size: 64.5 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

madhurimarawat/Big-Data-Analytics
This repository demonstrates big data processing, visualization, and machine learning using tools such as Hadoop, Spark, Kafka, and Python.
Language: Jupyter Notebook - Size: 10.7 MB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 1

hazecodeio/spark-sandbox
Language: Scala - Size: 13.1 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

abouslimi/spark-ml-product-recommendation
Real-time product recommendation system built using Apache Spark, Kafka, and Python.
Language: Python - Size: 419 KB - Last synced at: 18 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

SayamAlt/Amazon-Products-API-ETL-and-ML-pipeline
In this project, I've created an end-to-end ETL pipeline and subsequently developed a machine learning model to predict the price of Amazon products based on several product-related features.
Language: Python - Size: 2.95 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

madhans476/Spark_NLP_Spark_ML_lib
This project combines the power of Spark NLP for natural language processing and Spark MLlib for machine learning, allowing users to efficiently classify text into multiple categories in a distributed computing environment.
Language: Jupyter Notebook - Size: 3.62 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

DebanjanSarkar/pyspark-maestro
This repo contains implementations of PySpark for real-world use cases for batch data processing, streaming data processing sourced from Kafka, sockets, etc., spark optimizations, business specific bigdata processing scenario solutions, and machine learning use cases.
Language: Jupyter Notebook - Size: 66.1 MB - Last synced at: 18 days ago - Pushed at: 9 months ago - Stars: 2 - Forks: 1

zikzakjack/spark-demos
Apache Spark Demos
Language: Jupyter Notebook - Size: 103 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

IBM/db2-event-store-iot-analytics 📦
IoT sensor temperature analysis and prediction with IBM Db2 Event Store
Language: Jupyter Notebook - Size: 36.6 MB - Last synced at: 4 months ago - Pushed at: over 4 years ago - Stars: 4 - Forks: 22

AttitudeAdjuster/Accident-Severity-Prediction
IBM Coursera Capstone Project - Predict Accident Severity Given Weather, Road and Lighting Conditions
Language: Jupyter Notebook - Size: 161 MB - Last synced at: 6 months ago - Pushed at: over 6 years ago - Stars: 5 - Forks: 9

pathak-ashutosh/sentiment-analysis-yelp-reviews
Perform sentiment analysis on Yelp dataset with Apache Spark
Language: Python - Size: 133 KB - Last synced at: about 2 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

SayamAlt/PySpark-for-Big-Data-and-Machine-Learning
This is the material for Jose Portilla's Spark and Python for Big Data and ML course.
Language: Jupyter Notebook - Size: 3.79 MB - Last synced at: about 2 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

cbozan/graduation-project
A graduation project that categorizes popular search phrases using Python and Spark and presents them on a website to inspire creators.
Language: Python - Size: 402 KB - Last synced at: 18 days ago - Pushed at: about 2 years ago - Stars: 5 - Forks: 0

Wadaboa/production-line-performance
Scala/Spark project, for Languages and Algorithms for Artificial Intelligence class at UNIBO
Language: Scala - Size: 31 MB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

arturogonzalezm/energy_price_and_demand_forecast
AEMO Aggregated price and demand data
Language: Python - Size: 14.1 MB - Last synced at: 2 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

polaternez/Introduction-to-Big-Data
Big Data projects for beginners
Language: Java - Size: 4.63 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

michael-pupulin/Scala_Spark_and_SQL
I do some basic statistics and machine learning work on a dataset of tornado events across the United States. The dataset is nowhere near big enough to warrant using Spark over something like R, but I was looking for practice. I do some basic SQL to find out which years and states saw the most tornadoes and the most F5 tornadoes. Then I use Spark's MLlib to do linear regression of time and tornado counts.
Language: Scala - Size: 30.3 KB - Last synced at: 11 months ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

mnassrib/pyspark-examples
This tutorial presents some examples in order to give a quick overview of the Spark APIs.
Language: Jupyter Notebook - Size: 8.48 MB - Last synced at: 11 months ago - Pushed at: over 4 years ago - Stars: 3 - Forks: 0

NikitaVispute/Big-Data-Projects
Language: Scala - Size: 5.2 MB - Last synced at: 11 months ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

Coursal/Text-Sentiment-Analysis-In-Hadoop-And-Spark
The source code developed and used for the purposes of my thesis with the same title under the guidance of my supervisor professor Vasilis Mamalis for the Department of Informatics and Computer Engineering of the University of West Attica.
Language: Java - Size: 66.5 MB - Last synced at: 11 months ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 1

alivcor/SMORK
Implementation of SMOTE - Synthetic Minority Over-sampling Technique in SparkML / MLLib
Language: Scala - Size: 165 KB - Last synced at: 11 months ago - Pushed at: over 4 years ago - Stars: 5 - Forks: 2

abulbasar/zeppelin-notebooks
Size: 3.91 KB - Last synced at: 12 months ago - Pushed at: over 7 years ago - Stars: 1 - Forks: 1

aabdel-kader/Apache-Spark
A repository for my practices and projects using pyspark
Language: Jupyter Notebook - Size: 11.6 MB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

wengbenjue/spark_recomend
A core recommendation engine that uses Spark MLlib and HBase for the model and Hive for data cleaning; tested successfully on Spark on YARN
Language: C - Size: 41.2 MB - Last synced at: about 1 year ago - Pushed at: about 8 years ago - Stars: 29 - Forks: 17

yennanliu/NYC_Taxi_Trip_Duration
Developing ML models to predict taxi trip duration in NYC. Ranked: Top 6% | RMSLE: 0.377 (Kaggle) | #DS
Language: Jupyter Notebook - Size: 43.2 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 15 - Forks: 8

P7h/p7hb-docker-mllib-twitter-sentiment
🚢 Docker image for Twitter sentiment analysis with Spark MLlib
Language: Shell - Size: 138 KB - Last synced at: about 1 year ago - Pushed at: almost 8 years ago - Stars: 7 - Forks: 3

vaslnk/Spotify-Song-Recommendation-ML
UC Berkeley team's submission for RecSys Challenge 2018
Language: Jupyter Notebook - Size: 34.2 MB - Last synced at: 11 months ago - Pushed at: almost 7 years ago - Stars: 86 - Forks: 22

P7h/Spark-MLlib-Twitter-Sentiment-Analysis
🌟 ✨ Analyze and visualize Twitter sentiment on a world map using Spark MLlib
Language: Scala - Size: 19.7 MB - Last synced at: about 1 year ago - Pushed at: almost 4 years ago - Stars: 135 - Forks: 69

aliabbasi2000/Spark
Solving Big Data Problems using Spark framework in Java. Running the Project on HDFS clusters (BigData@Polito) to get the results.
Language: Java - Size: 143 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

felidsche/movie-recommender
A movie recommendation system built using Apache Spark’s ML library
Language: Python - Size: 829 KB - Last synced at: about 1 year ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

josemarialuna/ExternalValidity
This package contains the code for calculating external clustering validity indices in Spark. The package includes Chi Index among others.
Language: Scala - Size: 146 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 9 - Forks: 1

IBM/icp4d-customer-churn-classifier
Infuse AI into your application. Create and deploy a customer churn prediction model with IBM Cloud Private for Data, Db2 Warehouse, Spark MLlib, and Jupyter notebooks.
Language: Jupyter Notebook - Size: 28.1 MB - Last synced at: 4 months ago - Pushed at: almost 2 years ago - Stars: 17 - Forks: 22

bobxwang/predict-stock-in-spark
Using Spark to predict stocks; the data come from Sina
Language: Scala - Size: 143 KB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

CaioBrainer/Hadoop_Projects
Small projects using tools from the Apache Hadoop ecosystem
Language: Python - Size: 18.6 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

Jayant1234/Malware-classification Fork of dsp-uga/sabayon-p1
Language: Jupyter Notebook - Size: 11.4 MB - Last synced at: about 1 year ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 0

marcocolangelo/Big-Data-processing-and-Analytics
The current repository contains all the code developed during the Big Data processing and Analytics laboratories. Data are processed and analyzed using Hadoop and Spark
Language: Java - Size: 6.1 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

FlorentF9/sparkml-som
✨ Spark ML implementation of the SOM algorithm (Kohonen self-organizing map)
Language: Scala - Size: 29.3 KB - Last synced at: 27 days ago - Pushed at: about 3 years ago - Stars: 17 - Forks: 6

mikerly131/serveUpRecos
Build a mock EMR app and integrate an AI/ML prediction into an encounter workflow
Language: CSS - Size: 54.9 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

polaroidz/sales_prediction
A Production Machine Learning Pipeline for Predicting Future Sales with Spark
Language: Jupyter Notebook - Size: 90.8 KB - Last synced at: over 1 year ago - Pushed at: about 6 years ago - Stars: 3 - Forks: 0

venkateshavula/Evaluate-Spark-MLlib-using-PySpark
A UDF to evaluate Spark-MLlib classification model using PySpark
Language: Python - Size: 4.88 KB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

miguelangel43/Prediction-Flight-Arrivals-Delays-Spark
Application that trains a classifier and predicts flight arrival delays based on past information. Uses the libraries pyspark.ml and pyspark.sql, performs feature engineering, cross-validation and tests various ML algorithms.
Language: Python - Size: 41 KB - Last synced at: 12 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

amitkumarusc/recommendation-system
A movie recommendation system trained on the MovieLens 20 Million dataset. This system makes use of Collaborative filtering methods to come up with recommendations for a particular user.
Language: Jupyter Notebook - Size: 21.8 MB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 13 - Forks: 3

harishpuvvada/BitCoin-Value-Predictor 📦
[NOT MAINTAINED] Predicting the Bitcoin price using time series analysis and sentiment analysis of tweets about Bitcoin
Language: Jupyter Notebook - Size: 3.07 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 113 - Forks: 29

ShubhamJagtap2000/Spark-Python
🐍💥Python and Spark for Big Data
Language: Jupyter Notebook - Size: 73.2 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

omerbsezer/SparkDeepMlpGADow30 📦
A Deep Neural-Network based (Deep MLP) Stock Trading System based on Evolutionary (Genetic Algorithm) Optimized Technical Analysis Parameters (using Apache Spark MLlib)
Language: Java - Size: 213 MB - Last synced at: over 1 year ago - Pushed at: about 7 years ago - Stars: 58 - Forks: 46

alessandrolulli/reforest
Random Forests in Apache Spark
Language: Scala - Size: 71 MB - Last synced at: over 1 year ago - Pushed at: almost 6 years ago - Stars: 72 - Forks: 11

MNoorFawi/linear-regression-with-spark
Creating an SBT-based Spark Application to predict Online News Popularity using Linear Regression Algorithm ...
Language: Scala - Size: 18.6 KB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

MNoorFawi/kmeans-clustering-with-spark
Creating an sbt Apache Spark application to perform customer segmentation using Spark MLlib KMeans ...
Language: Scala - Size: 15.6 KB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 1

jingpeicomp/product-category-predict
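Product category prediction. Using the Spring Boot framework and the Spark MLlib machine learning framework, it trains a product category prediction model with TF-IDF and a Bayes algorithm. The model automatically predicts a product's category from its name. The project exposes a RESTful API.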
Language: Java - Size: 41.2 MB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 119 - Forks: 60

lookuut/raif-competition
Spark application for predicting a customer's home and work coordinates from payment transactions
Language: Scala - Size: 27.7 MB - Last synced at: over 1 year ago - Pushed at: about 7 years ago - Stars: 1 - Forks: 0

akarsh3007/Recommendation-Systems
Simple content-based and collaborative filtering algorithm implementations
Language: Python - Size: 1.36 MB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

jr2ngb2/yelp_recommender
Language: Jupyter Notebook - Size: 45.9 KB - Last synced at: over 1 year ago - Pushed at: almost 6 years ago - Stars: 1 - Forks: 0

xtutran/spark-tutor
Using spark-sql & spark-mllib to tackle Titanic & Movie Recommendation
Language: Scala - Size: 112 KB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

trendyol-data-eng-summer-intern-2019/recom-engine-streaming
Streaming component of the project, which is written with Spark Streaming.
Language: Scala - Size: 15.6 KB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 1

agrimrules/brewery
A spark job that processes data scraped from the web
Language: Scala - Size: 17.6 KB - Last synced at: over 1 year ago - Pushed at: almost 7 years ago - Stars: 0 - Forks: 0

NashTech-Labs/Sparkathon
A library having Java and Scala examples for Spark 2.x
Language: Java - Size: 113 MB - Last synced at: 21 days ago - Pushed at: over 8 years ago - Stars: 7 - Forks: 9

gavalle94/Songs-Recommender
Recommendation System written in Python, using the pySpark framework and other Data Science libraries
Language: HTML - Size: 5.23 MB - Last synced at: over 1 year ago - Pushed at: almost 7 years ago - Stars: 1 - Forks: 1

hoangviet148/Foody
Language: Python - Size: 17.6 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 1

avcaliani/spark-ml-app
🤖
Language: Jupyter Notebook - Size: 118 KB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

Lucass97/FlightAnalysis
This project implemented a lambda architecture for analyzing domestic flight data in the US from 2009 to 2020. It used Apache Spark for batch processing, Spark Streaming for real-time analysis, and SVM models to predict flight cancellations and delays, with Docker for cluster management and Grafana for real-time visualization.
Language: Jupyter Notebook - Size: 5.66 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 1

shalakasaraogi/apache-spark-pig-hive-work
This repository contains work with Apache Spark, Apache Hive, and Apache Pig
Language: PigLatin - Size: 813 KB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

jacopocav/spark-ifs
Iterative filter-based feature selection on large datasets with Apache Spark
Language: Scala - Size: 130 KB - Last synced at: almost 2 years ago - Pushed at: almost 7 years ago - Stars: 3 - Forks: 0

xghan99/bigdata-assignments
This repository consists of code I wrote for CS4225 - Big Data Systems for Data Science
Language: Jupyter Notebook - Size: 3.64 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

shubham-deb/Neural-Circuit-Tracer
This repository contains the source code and scripts of my project for the Master's level course CS6240 Parallel Data Processing in Map-Reduce at the College of Computer & Information Science, Northeastern University, Boston MA.
Language: Scala - Size: 663 KB - Last synced at: about 1 year ago - Pushed at: about 7 years ago - Stars: 1 - Forks: 0

shubham-deb/Spark_Scala_Programs
This repository contains all the Spark Scala programs that I have implemented during my Master's level course - CS6240 Parallel Data Processing in Map-Reduce course at College of Computer & Information Science, Northeastern University, Boston MA.
Language: Makefile - Size: 4 MB - Last synced at: about 1 year ago - Pushed at: about 7 years ago - Stars: 1 - Forks: 0

esap120/spark-twitter-streaming
Streaming Twitter Sentiment Analysis with Apache Spark
Language: Scala - Size: 8.79 KB - Last synced at: almost 2 years ago - Pushed at: over 6 years ago - Stars: 5 - Forks: 1

Siddharth1989/ProspectiveTopUpCustomerPrediction
Developed a model/Spark ML pipeline stream to identify potential customers that may purchase top up services in the future.
Language: Jupyter Notebook - Size: 6.17 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

huzaifakhan04/amazon-product-recommendation-system-web-application-flask-using-mongodb-pyspark-and-apache-kafka
This repository includes a web application connected to a product recommendation system developed with the comprehensive Amazon Review Data (2018) dataset, consisting of nearly 233.1 million records and occupying approximately 128 gigabytes (GB) of data storage, using MongoDB, PySpark, and Apache Kafka.
Language: Jupyter Notebook - Size: 91 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

ging/fiware-ml-supermarket
Demo: Predicting purchase volume in a supermarket using FIWARE
Language: JavaScript - Size: 290 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 4 - Forks: 1

surajsrivathsa/Supervised_Link_Prediction_Using_Spark_and_Neo4j
A project analyzing authorship graph data from the Microsoft Academic Graph. We compute various graph features using the authors' temporal parameters and try different classifiers. The final aim is to predict the possibility of a link (coauthorship) between two authors based on topological graph features, and to assess the feasibility of performing this task on Neo4j and Spark.
Language: Scala - Size: 15.8 MB - Last synced at: over 1 year ago - Pushed at: almost 5 years ago - Stars: 5 - Forks: 4

MHassaanButt/Flight-Delays-Prediction
In this project, I used a decision tree learning model as the main algorithm. Due to the large volume of flight data, we implemented the project using MRJob, PySpark, and Spark's MLlib, and then compared the performance and accuracy of those implementations.
Language: Jupyter Notebook - Size: 17.5 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 8 - Forks: 0

avaibh/Twitter-Bot-Detection
Big Data Stack: Spark, Kafka, Elasticsearch and NoSQL
Language: Jupyter Notebook - Size: 304 KB - Last synced at: almost 2 years ago - Pushed at: almost 5 years ago - Stars: 4 - Forks: 0

anmolmore/Enzyme-Classifier-Using-ML
Classify enzymes from genomic sequences using Spark ML
Language: Jupyter Notebook - Size: 719 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

svngoku/Pyspark-pour-les-datas-engineers
A hands-on introduction to PySpark for data engineers
Language: Jupyter Notebook - Size: 784 KB - Last synced at: 12 months ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

bassrehab/zerofish-imaging
Using the Thunder Library for Image Processing with Spark ML Lib
Language: Python - Size: 1.83 MB - Last synced at: almost 2 years ago - Pushed at: about 8 years ago - Stars: 1 - Forks: 0

grishenkovp/apache_spark
Learning Apache Spark. The PySpark library
Language: Jupyter Notebook - Size: 135 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

satyajeetmaharana/floodprediction
The goal of this project is to identify the flood-prone areas with probabilities of flood in counties in a future date, using Spark MLLib.
Language: Scala - Size: 3.46 MB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 3 - Forks: 1

forons/BigDataExamples
Code repository for the MSc course "Big Data and Social Networks" of the University of Trento
Language: Jupyter Notebook - Size: 229 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 13

amanjeetsahu/Apache-Spark-Tutorials
This repo contains my learnings and practice notebooks on Spark using PySpark (Python Language API on Spark). All the notebooks in the repo can be used as template code for most of the ML algorithms and can be built upon it for more complex problems.
Language: Jupyter Notebook - Size: 20.9 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 4 - Forks: 10

Rohini2505/Lending-Club-Loan-Analysis
Exploratory data analysis and ML model building using Apache Spark and PySpark
Language: HTML - Size: 6.26 MB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 10 - Forks: 12

ABigdataer/MovieRecommendSystem
A Spark-based movie recommendation system
Language: HTML - Size: 64.6 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 4 - Forks: 4

alefbt/SparkML-spring-scoring-poc 📦
POC of a scoring REST service for Spark ML Pipelines
Language: Java - Size: 234 KB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 0 - Forks: 0

corvaglia-alessio/big-data-labs 📦
Labs for the course "Big Data: architectures and data analytics" @ Politecnico di Torino a.y. 2021/22
Language: Java - Size: 53.4 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

chaithrakc/credit_card_default_prediction
Analyzing the likelihood of credit card delinquency without using credit scores or credit history
Language: Jupyter Notebook - Size: 4.16 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

huangyueranbbc/Spark_ALS
Recommendation algorithm implementation based on spark-ml, spark-mllib, and spark-streaming
Language: Java - Size: 97.7 KB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 89 - Forks: 46

kocharshaivi19/Stock-Analysis-and-Prediction
Financial Forecasting and its correlation with Human Sentiments using Distributed Computing on Spark Framework
Language: Scala - Size: 811 KB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 4 - Forks: 4
