An open API service providing repository metadata for many open source software ecosystems.

Topic: "spark-mllib"

LuckyZXL2016/Movie_Recommend

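A Spark-based movie recommendation system, comprising a web crawler project, a website, a back-office management system, and the Spark recommender system.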

Language: Java - Size: 55.1 MB - Last synced at: about 12 hours ago - Pushed at: about 6 years ago - Stars: 2,936 - Forks: 1,049

databricks/LearningSparkV2

This is the github repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]

Language: Scala - Size: 75.2 MB - Last synced at: 1 day ago - Pushed at: 4 months ago - Stars: 1,297 - Forks: 770

wzhe06/SparkCTR

CTR prediction model based on Spark (LR, GBDT, DNN)

Language: Scala - Size: 35 MB - Last synced at: 6 days ago - Pushed at: about 5 years ago - Stars: 914 - Forks: 260

qubole/sparklens

Qubole Sparklens tool for performance tuning Apache Spark

Language: Scala - Size: 175 KB - Last synced at: 6 days ago - Pushed at: 11 months ago - Stars: 574 - Forks: 141

derrickburns/generalized-kmeans-clustering

Spark library for generalized K-Means clustering. Supports general Bregman divergences. Suitable for clustering probabilistic data, time series data, high dimensional data, and very large data.

Language: HTML - Size: 7.42 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 300 - Forks: 50
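
The description above mentions Bregman divergences. The following is a minimal plain-Python sketch of that idea. It is not the library's Scala API, and all function names are hypothetical. A Bregman divergence is generated by a strictly convex function φ as D_φ(x, y) = φ(x) - φ(y) - ⟨∇φ(y), x - y⟩. Choosing φ(x) = ‖x‖² recovers squared Euclidean distance. Choosing the negative entropy Σ xᵢ log xᵢ gives the generalized KL divergence, which suits probabilistic data.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def bregman(phi: Callable[[Vector], float],
            grad_phi: Callable[[Vector], List[float]],
            x: Vector, y: Vector) -> float:
    """D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>."""
    g = grad_phi(y)
    inner = sum(gi * (xi - yi) for gi, xi, yi in zip(g, x, y))
    return phi(x) - phi(y) - inner


# phi(x) = ||x||^2 generates the squared Euclidean distance.
def sq_norm(x: Vector) -> float:
    return sum(v * v for v in x)


def grad_sq_norm(x: Vector) -> List[float]:
    return [2.0 * v for v in x]


# phi(x) = sum x_i log x_i (requires x_i > 0) generates the
# generalized KL divergence: sum x_i log(x_i / y_i) - x_i + y_i.
def neg_entropy(x: Vector) -> float:
    return sum(v * math.log(v) for v in x)


def grad_neg_entropy(x: Vector) -> List[float]:
    return [math.log(v) + 1.0 for v in x]


def assign(points, centers, phi, grad_phi):
    """k-means assignment step: index of the center minimizing D_phi(point, center)."""
    return [min(range(len(centers)),
                key=lambda j: bregman(phi, grad_phi, p, centers[j]))
            for p in points]
```

For any Bregman divergence, the arithmetic mean of a cluster's points minimizes the total divergence D_φ(x, c) from those points to the center c. This is why the k-means update step can remain a plain average while only the assignment step changes.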

jaceklaskowski/spark-workshop

Apache Spark™ and Scala Workshops

Language: HTML - Size: 57 MB - Last synced at: 5 days ago - Pushed at: 10 months ago - Stars: 264 - Forks: 148

P7h/Spark-MLlib-Twitter-Sentiment-Analysis

:star2: :sparkles: Analyze and visualize Twitter Sentiment on a world map using Spark MLlib

Language: Scala - Size: 19.7 MB - Last synced at: about 1 year ago - Pushed at: about 4 years ago - Stars: 135 - Forks: 69

jingpeicomp/product-category-predict

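Product category prediction. The project uses the Spring Boot development framework and the Spark MLlib machine learning framework to train a product category prediction model with TF-IDF and the Bayes algorithm. The model automatically predicts a product's category from its name. The project exposes a RESTful interface.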

Language: Java - Size: 41.2 MB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 119 - Forks: 60
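
The following is a minimal plain-Python sketch of the TF-IDF plus Naive Bayes approach described above. It makes the following assumptions:

- It uses multinomial Naive Bayes with additive smoothing.
- It uses simple whitespace tokenization. Real Chinese product names would need a word segmenter.
- The IDF uses the smoothed form log((N + 1) / (df + 1)), which is the formula Spark ML's `IDF` uses.

The project itself is written in Java with Spark MLlib, and the function names here are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(name):
    # Assumption: whitespace tokenization; real product names may need a segmenter.
    return name.lower().split()


def tfidf(tokens, idf):
    """Term frequency times IDF; tokens unseen in training are dropped."""
    tf = Counter(tokens)
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


def train(names, labels, alpha=1.0):
    """Fit smoothed IDF weights, then multinomial Naive Bayes on the TF-IDF vectors."""
    docs = [tokenize(n) for n in names]
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log((n + 1) / (c + 1)) for t, c in df.items()}
    vocab = sorted(idf)
    class_counts = Counter(labels)
    weight = defaultdict(lambda: defaultdict(float))  # class -> term -> summed TF-IDF
    for d, y in zip(docs, labels):
        for t, w in tfidf(d, idf).items():
            weight[y][t] += w
    model = {}
    for c in class_counts:
        # Additive (Laplace) smoothing over the whole vocabulary.
        total = sum(weight[c].values()) + alpha * len(vocab)
        log_theta = {t: math.log((weight[c][t] + alpha) / total) for t in vocab}
        model[c] = (math.log(class_counts[c] / n), log_theta)
    return idf, model


def predict(name, idf, model):
    """Class maximizing log prior + sum of feature value * log theta."""
    x = tfidf(tokenize(name), idf)
    return max(model, key=lambda c: model[c][0]
               + sum(w * model[c][1][t] for t, w in x.items()))
```

Spark MLlib's `NaiveBayes` also defaults to the multinomial model with a smoothing value of 1.0. This sketch therefore follows roughly the same math on a single machine, with the vocabulary size standing in for the feature-vector length.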

harishpuvvada/BitCoin-Value-Predictor 📦

[NOT MAINTAINED] Predicting Bitcoin price using time series analysis and sentiment analysis of tweets about Bitcoin

Language: Jupyter Notebook - Size: 3.07 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 113 - Forks: 29

huangyueranbbc/Spark_ALS

Recommendation algorithm implementations based on spark-ml, spark-mllib, and spark-streaming

Language: Java - Size: 97.7 KB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 89 - Forks: 46
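
Spark MLlib provides ALS (alternating least squares) for collaborative filtering. The following is a minimal plain-Python sketch of the alternating idea only. It uses rank 1 for brevity and a constant regularization weight λ, and the function names are hypothetical. Each pass fixes one side's factors and solves the other side's regularized least-squares problem in closed form: u = Σ r·v / (Σ v² + λ).

```python
import random
from collections import defaultdict


def als_rank1(ratings, iterations=20, lam=0.1, seed=0):
    """Rank-1 ALS over (user, item, rating) triples; returns user and item factors."""
    rng = random.Random(seed)
    by_user, by_item = defaultdict(list), defaultdict(list)
    for u, i, r in ratings:
        by_user[u].append((i, r))
        by_item[i].append((u, r))
    user = {u: rng.random() for u in by_user}
    item = {i: rng.random() for i in by_item}
    for _ in range(iterations):
        # Fix item factors: each user factor has a closed-form least-squares solution.
        for u, rs in by_user.items():
            user[u] = sum(r * item[i] for i, r in rs) / (sum(item[i] ** 2 for i, _ in rs) + lam)
        # Fix user factors: solve for each item factor the same way.
        for i, rs in by_item.items():
            item[i] = sum(r * user[u] for u, r in rs) / (sum(user[u] ** 2 for u, _ in rs) + lam)
    return user, item


def rmse(ratings, user, item):
    """Root-mean-square error of the predictions user[u] * item[i]."""
    return (sum((r - user[u] * item[i]) ** 2 for u, i, r in ratings) / len(ratings)) ** 0.5
```

Spark's documentation notes that its ALS scales `regParam` by the number of ratings each user or item has. This sketch keeps λ constant instead.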

vaslnk/Spotify-Song-Recommendation-ML

UC Berkeley team's submission for RecSys Challenge 2018

Language: Jupyter Notebook - Size: 34.2 MB - Last synced at: about 1 year ago - Pushed at: about 7 years ago - Stars: 86 - Forks: 22

alessandrolulli/reforest

Random Forests in Apache Spark

Language: Scala - Size: 71 MB - Last synced at: over 1 year ago - Pushed at: almost 6 years ago - Stars: 72 - Forks: 11

OrvilleX/MachineLearning

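This application-oriented project spans basic machine learning, deep learning, object detection, and today's latest large models. It builds on mature third-party libraries, open-source pretrained models, and recent techniques from the related papers. Its aim is to document the learning process and share it so that others can use it directly.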

Language: Jupyter Notebook - Size: 11.4 MB - Last synced at: 3 days ago - Pushed at: 3 months ago - Stars: 67 - Forks: 22

omerbsezer/SparkDeepMlpGADow30 📦

A Deep Neural-Network based (Deep MLP) Stock Trading System based on Evolutionary (Genetic Algorithm) Optimized Technical Analysis Parameters (using Apache Spark MLlib)

Language: Java - Size: 213 MB - Last synced at: over 1 year ago - Pushed at: about 7 years ago - Stars: 58 - Forks: 46

uosdmlab/spark-nkp

Natural Korean Processor for Apache Spark

Language: Scala - Size: 53.7 KB - Last synced at: 6 months ago - Pushed at: about 7 years ago - Stars: 53 - Forks: 16

wikistat/AI-Frameworks

Data Science, Season 5: Technologies for machine/statistical learning on big data and for Artificial Intelligence

Language: Jupyter Notebook - Size: 646 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 44 - Forks: 42

omerbsezer/SparkMlpDow30 📦

A new stock trading and prediction model based on an MLP neural network utilizing technical analysis indicator values as features (using Apache Spark MLlib)

Language: Java - Size: 201 MB - Last synced at: 24 days ago - Pushed at: about 7 years ago - Stars: 36 - Forks: 17

wengbenjue/spark_recomend

A core recommendation engine using Spark MLlib, with HBase for the model and Hive for data cleaning; tested successfully on Spark on YARN

Language: C - Size: 41.2 MB - Last synced at: about 1 year ago - Pushed at: about 8 years ago - Stars: 29 - Forks: 17

jingpeicomp/product-relation-mining

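Product association mining. The project uses the Spring Boot development framework and the Spark MLlib machine learning framework. It applies the FP-Growth algorithm to users' shopping-cart data to mine association relationships between products. The project exposes a RESTful interface.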

Language: Java - Size: 68.4 KB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 24 - Forks: 16
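
FP-Growth finds frequent itemsets without enumerating candidates by compressing transactions into an FP-tree. The following plain-Python sketch deliberately uses brute-force enumeration instead. That is fine for tiny baskets but exponential in basket size; it only makes the two quantities concrete:

- Support is the fraction of baskets that contain an itemset.
- Confidence is support(X ∪ {y}) / support(X).

The function names, the `max_size` cap, and the single-item consequents are assumptions for illustration, not the project's Java code.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(baskets, min_support, max_size=3):
    """Brute force (not FP-Growth): itemsets with support >= min_support."""
    n = len(baskets)
    counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


def association_rules(freq, min_confidence):
    """Rules X -> y (single-item consequent) with confidence >= min_confidence."""
    rules = []
    for itemset, supp in freq.items():
        if len(itemset) < 2:
            continue
        for y in itemset:
            antecedent = itemset - {y}
            # Every subset of a frequent itemset is frequent, so the lookup succeeds.
            conf = supp / freq[antecedent]
            if conf >= min_confidence:
                rules.append((antecedent, y, conf))
    return rules
```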

FlorentF9/sparkml-som

:sparkles: Spark ML implementation of SOM algorithm (Kohonen self-organizing map)

Language: Scala - Size: 29.3 KB - Last synced at: 16 days ago - Pushed at: over 3 years ago - Stars: 18 - Forks: 6

IBM/icp4d-customer-churn-classifier

Infuse AI into your application. Create and deploy a customer churn prediction model with IBM Cloud Private for Data, Db2 Warehouse, Spark MLlib, and Jupyter notebooks.

Language: Jupyter Notebook - Size: 28.1 MB - Last synced at: 16 days ago - Pushed at: about 2 years ago - Stars: 17 - Forks: 22

yennanliu/NYC_Taxi_Trip_Duration

Develops ML models to predict taxi trip duration in NYC. Ranked: top 6% | RMSLE: 0.377 (Kaggle) | #DS

Language: Jupyter Notebook - Size: 43.2 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 15 - Forks: 8

akashsethi24/Machine-Learning

Examples of all machine learning algorithms in Apache Spark

Language: Scala - Size: 3.64 MB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 15 - Forks: 10

amitkumarusc/recommendation-system

A movie recommendation system trained on the MovieLens 20 Million dataset. This system makes use of Collaborative filtering methods to come up with recommendations for a particular user.

Language: Jupyter Notebook - Size: 21.8 MB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 13 - Forks: 3

lp-dataninja/SparkML

Detailed notes and code to learn machine learning with Apache Spark.

Language: Jupyter Notebook - Size: 4.06 MB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 12 - Forks: 17

giuseppegambino/Italian-Sentiment-Analysis-with-Spark

Application of sentiment analysis to Italian tweets with Python and Spark

Language: Python - Size: 6.84 KB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 10 - Forks: 0

Rohini2505/Lending-Club-Loan-Analysis

Exploratory data analysis and ML model building using Apache Spark and PySpark

Language: HTML - Size: 6.26 MB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 10 - Forks: 12

josemarialuna/ClusterIndices

This package contains the code for executing clustering validity indices in Spark. The package includes BD-Silhouette, BD-Dunn, Davies-Bouldin and WSSSE indices.

Language: Scala - Size: 588 KB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 10 - Forks: 3
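
Two of the indices listed above have short classic definitions:

- WSSSE (within-set sum of squared errors) sums each point's squared distance to its cluster centroid.
- Davies-Bouldin averages, over clusters, the worst ratio (Sᵢ + Sⱼ) / d(cᵢ, cⱼ), where Sᵢ is the mean distance of cluster i's points to its centroid. Lower values are better.

The following is a minimal single-machine Python sketch of those textbook definitions, assuming Euclidean distance and at least two clusters. It does not reproduce the package's distributed "BD-" variants, and the function names are hypothetical.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return [sum(c) / len(points) for c in zip(*points)]


def wssse(clusters):
    """Within-set sum of squared errors over all clusters."""
    total = 0.0
    for pts in clusters:
        c = centroid(pts)
        total += sum(math.dist(p, c) ** 2 for p in pts)
    return total


def davies_bouldin(clusters):
    """Mean over clusters of max_j (S_i + S_j) / d(c_i, c_j); needs >= 2 clusters."""
    cents = [centroid(pts) for pts in clusters]
    scatter = [sum(math.dist(p, c) for p in pts) / len(pts)
               for pts, c in zip(clusters, cents)]
    k = len(clusters)
    worst = [max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                 for j in range(k) if j != i)
             for i in range(k)]
    return sum(worst) / k
```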

josemarialuna/ExternalValidity

This package contains the code for calculating external clustering validity indices in Spark. The package includes Chi Index among others.

Language: Scala - Size: 146 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 9 - Forks: 1

MHassaanButt/Flight-Delays-Prediction

In this project, I used a decision tree learning model as the main algorithm to build the model. Due to the large amount of flight data, we implemented the project using MRJob, PySpark, and Spark's MLlib, then compared the performance and accuracy of those implementations.

Language: Jupyter Notebook - Size: 17.5 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 8 - Forks: 0

DavideNardone/TwitterSentimentAnalysis

A Spark Streaming implementation for Online Twitter Sentiment Analysis.

Language: Python - Size: 1.78 MB - Last synced at: about 2 years ago - Pushed at: about 7 years ago - Stars: 8 - Forks: 3

TrainingByPackt/Big-Data-Processing-with-Apache-Spark-eLearning

Efficiently tackle large datasets and perform big data analysis with Spark and Python

Language: Python - Size: 36.1 KB - Last synced at: about 1 month ago - Pushed at: over 6 years ago - Stars: 7 - Forks: 6

P7h/p7hb-docker-mllib-twitter-sentiment

:ship: Docker image for Twitter Sentiment analysis with Spark MLlib

Language: Shell - Size: 138 KB - Last synced at: about 1 year ago - Pushed at: almost 8 years ago - Stars: 7 - Forks: 3

NashTech-Labs/Sparkathon

A library having Java and Scala examples for Spark 2.x

Language: Java - Size: 113 MB - Last synced at: about 2 months ago - Pushed at: over 8 years ago - Stars: 7 - Forks: 9

cbozan/graduation-project

A graduation project that categorizes popular search phrases using Python and Spark and presents them on a website to inspire creators.

Language: Python - Size: 402 KB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 0

ging/fiware-ml-supermarket

Demo: Predicting purchase volume in a supermarket using FIWARE

Language: JavaScript - Size: 290 KB - Last synced at: 15 days ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 1

alivcor/SMORK

Implementation of SMOTE - Synthetic Minority Over-sampling Technique in SparkML / MLLib

Language: Scala - Size: 165 KB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 5 - Forks: 2
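
SMOTE creates synthetic minority-class samples by interpolating between a minority sample and one of its k nearest minority-class neighbors. The following is a minimal plain-Python sketch, assuming Euclidean distance and a randomly chosen base sample for each synthetic point; the original algorithm instead iterates over every minority sample. It is not the SMORK Scala API, and the function name is hypothetical.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=None):
    """Return n_synthetic points, each on a segment between a minority sample
    and one of its k nearest minority neighbors (needs >= 2 samples)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [minority[j] for j in range(len(minority)) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbors)
        gap = rng.random()  # uniform in [0, 1): position along the segment x -> nn
        synthetic.append(tuple(xi + gap * (ni - xi) for xi, ni in zip(x, nn)))
    return synthetic
```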

surajsrivathsa/Supervised_Link_Prediction_Using_Spark_and_Neo4j

A project that analyzes authorship graph data from the Microsoft Academic Graph. We calculate different graph features using temporal parameters of the authors and try different classifiers. The final aim is to predict the link, or co-authorship possibility, between two authors based on topological graph features, and also to assess the feasibility of performing this task on Neo4j and Spark.

Language: Scala - Size: 15.8 MB - Last synced at: over 1 year ago - Pushed at: almost 5 years ago - Stars: 5 - Forks: 4

tkachuksergiy/aws-spark-nlp

Work related to a recent project on using Apache Spark and the AWS cloud for NLP tasks.

Language: Jupyter Notebook - Size: 2.76 MB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 5 - Forks: 0

AttitudeAdjuster/Accident-Severity-Prediction

IBM Coursera Capstone Project - Predict Accident Severity Given Weather, Road and Lighting Conditions

Language: Jupyter Notebook - Size: 161 MB - Last synced at: 7 months ago - Pushed at: over 6 years ago - Stars: 5 - Forks: 9

esap120/spark-twitter-streaming

Streaming Twitter Sentiment Analysis with Apache Spark

Language: Scala - Size: 8.79 KB - Last synced at: almost 2 years ago - Pushed at: over 6 years ago - Stars: 5 - Forks: 1

giovannigarifo/bigdata

Code samples, summaries, cheatsheets and other study material for Hadoop MapReduce and Apache Spark

Language: Java - Size: 69.1 MB - Last synced at: about 2 years ago - Pushed at: almost 7 years ago - Stars: 5 - Forks: 2

spoddutur/spark-ml-dashboard

A Spark ML dashboard for plugging in and tweaking model parameters to verify classification results on sample test data in real time

Language: Scala - Size: 20.5 MB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 5 - Forks: 2

ABigdataer/MovieRecommendSystem

A Spark-based movie recommendation system

Language: HTML - Size: 64.6 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 4 - Forks: 4

MHassaanButt/Crime-Spark-ML

In this project, I stream data and perform crime classification using Spark. The dataset contains incidents derived from the SFPD Crime Incident Reporting system, ranging from 1/1/2003 to 5/13/2015. I also analyze crime scenes in different areas and with respect to other parameters.

Language: Python - Size: 5.86 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 4 - Forks: 0

amanjeetsahu/Apache-Spark-Tutorials

This repo contains my learnings and practice notebooks on Spark using PySpark (the Python API for Spark). All the notebooks in the repo can serve as template code for most ML algorithms and can be built upon for more complex problems.

Language: Jupyter Notebook - Size: 20.9 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 4 - Forks: 10

IBM/db2-event-store-iot-analytics 📦

IoT sensor temperature analysis and prediction with IBM Db2 Event Store

Language: Jupyter Notebook - Size: 36.6 MB - Last synced at: 16 days ago - Pushed at: over 4 years ago - Stars: 4 - Forks: 22

avaibh/Twitter-Bot-Detection

Big Data Stack: Spark, Kafka, Elasticsearch and NoSQL

Language: Jupyter Notebook - Size: 304 KB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 4 - Forks: 0

desaiankitb/spark-mllib

Apache Spark is one of the most widely used and supported open-source tools for machine learning and big data. In this repo, discover how to work with this powerful platform for machine learning. This repo discusses MLlib, the Spark machine learning library. MLlib provides tools for data scientists and analysts who would rather find solutions to business problems than code, test, and maintain their own machine learning libraries. The repo shows how to use DataFrames to organize data structures. It also covers data preparation and the most commonly used types of machine learning algorithms: clustering, classification, regression, and recommendations. You will gain experience loading data into Spark, preprocessing data as needed to apply MLlib algorithms, and applying those algorithms to a variety of machine learning problems.

Language: Python - Size: 150 KB - Last synced at: about 2 years ago - Pushed at: about 7 years ago - Stars: 4 - Forks: 5

kocharshaivi19/Stock-Analysis-and-Prediction

Financial Forecasting and its correlation with Human Sentiments using Distributed Computing on Spark Framework

Language: Scala - Size: 811 KB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 4 - Forks: 4

merrillm1/Olist_Recommender_System

A recommendation engine with a 0.97 AUC, achieved using clustering techniques to create user features. The data represents Olist marketplace transactions and was retrieved from kaggle.com.

Language: Jupyter Notebook - Size: 77.4 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 6

SainathDutkar/Fraud_Transaction_Monitor

Detects fraudulent credit card transactions in real time

Language: Scala - Size: 1000 KB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 3 - Forks: 4

mnassrib/pyspark-examples

This tutorial presents some examples in order to give a quick overview of the Spark APIs.

Language: Jupyter Notebook - Size: 8.48 MB - Last synced at: 12 months ago - Pushed at: over 4 years ago - Stars: 3 - Forks: 0

satyajeetmaharana/floodprediction

The goal of this project is to identify flood-prone counties and estimate their probability of flooding at a future date, using Spark MLlib.

Language: Scala - Size: 3.46 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 3 - Forks: 1

DerrickBu/Movie_Recommendation_Application

This is a web-based movie recommendation application written in Scala using Apache Spark and Livy.

Language: Scala - Size: 17.6 KB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 3 - Forks: 2

polaroidz/sales_prediction

A Production Machine Learning Pipeline for Predicting Future Sales with Spark

Language: Jupyter Notebook - Size: 90.8 KB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 3 - Forks: 0

jacopocav/spark-ifs

Iterative filter-based feature selection on large datasets with Apache Spark

Language: Scala - Size: 130 KB - Last synced at: almost 2 years ago - Pushed at: almost 7 years ago - Stars: 3 - Forks: 0

suyash248/recommender_system

Recommendation system using Graph DB(Neo4j), Apache Spark & Machine learning.

Language: Jupyter Notebook - Size: 17.4 MB - Last synced at: 4 months ago - Pushed at: over 7 years ago - Stars: 3 - Forks: 1

LuisFalva/ophelia

Ophelian On Mars! More than a simple framework.

Language: Python - Size: 2.16 MB - Last synced at: about 1 month ago - Pushed at: 2 months ago - Stars: 2 - Forks: 5

DebanjanSarkar/pyspark-maestro

This repo contains implementations of PySpark for real-world use cases for batch data processing, streaming data processing sourced from Kafka, sockets, etc., spark optimizations, business specific bigdata processing scenario solutions, and machine learning use cases.

Language: Jupyter Notebook - Size: 66.1 MB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 2 - Forks: 1

grishenkovp/apache_spark

Learning Apache Spark: the PySpark library

Language: Jupyter Notebook - Size: 135 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

sunujh6/spark_practice

Language: Jupyter Notebook - Size: 1.62 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

forons/BigDataExamples

Code repository for the MSc course "Big Data and Social Networks" of the University of Trento

Language: Jupyter Notebook - Size: 229 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 13

kapilthakre/Bicycle-Sharing-Demand-Forecasting-Using-Spark-Scala

In this project, we build a bicycle-sharing demand prediction service using Apache Spark and Scala. I created two Spark applications: one for model generation and another for demand prediction with the model.

Language: Scala - Size: 295 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 0

tweichle/Spark-for-Big-Data

Spark: Work with Big Data and Build Machine Learning Models at Scale

Language: Jupyter Notebook - Size: 63.5 KB - Last synced at: almost 2 years ago - Pushed at: almost 5 years ago - Stars: 2 - Forks: 1

askmrsinh/spark-stocksim

Monte Carlo stock simulation using Apache Spark.

Language: Scala - Size: 1.81 MB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 2 - Forks: 2
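
A common way to run a Monte Carlo stock simulation is to model prices as geometric Brownian motion (GBM) and sample many independent paths. The repository may use a different price model. The following plain-Python sketch assumes GBM with annual drift `mu` and volatility `sigma`, and 252 trading days per year; the function names are hypothetical. Because paths are independent, the outer loop is embarrassingly parallel, which is the kind of work that maps naturally onto Spark.

```python
import math
import random
import statistics


def simulate_final_prices(s0, mu, sigma, days, n_paths, seed=None):
    """Simulate GBM price paths with daily log-return steps; return final prices."""
    rng = random.Random(seed)
    dt = 1.0 / 252  # assumption: 252 trading days per year
    drift = (mu - 0.5 * sigma ** 2) * dt  # Ito-corrected drift of the log price
    vol = sigma * math.sqrt(dt)
    finals = []
    for _ in range(n_paths):
        log_price = math.log(s0)
        for _ in range(days):
            log_price += drift + vol * rng.gauss(0.0, 1.0)
        finals.append(math.exp(log_price))
    return finals


def percentile(values, q):
    """q-th percentile (1..99) of the simulated values."""
    return statistics.quantiles(values, n=100)[q - 1]
```

Under GBM, the expected final price after T years is s0·e^(mu·T), which gives a quick sanity check on the simulated mean.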

Aveek-Saha/Cricket-score-predictor

A Big data application to predict the outcome of a T20 cricket match.

Language: Jupyter Notebook - Size: 2.17 MB - Last synced at: about 1 month ago - Pushed at: about 6 years ago - Stars: 2 - Forks: 0

joyoyoyoyoyo/emojipasta-topic-modeling

😅 A topic model of reddit.com/r/EmojiPasta trained with Spark and an LDA model (NSFW). Trigger warning: the r/emojipasta subreddit posts controversial content. Everything I have crawled is meant to provide visibility into topic modeling of some of this controversial content. Unfortunately, there is also discriminatory speech, which must be called out!

Language: Scala - Size: 700 KB - Last synced at: about 2 years ago - Pushed at: about 7 years ago - Stars: 2 - Forks: 0

crazyalin92/movie_recomendation_system

Spark MLLIB: Collaborative Filtering Movie Recommendation System

Language: Scala - Size: 5.6 MB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 2 - Forks: 1

SayamAlt/Amazon-Products-API-ETL-and-ML-pipeline

In this project, I've created an end-to-end ETL pipeline and subsequently developed a machine learning model to predict the price of Amazon products based on several product-related features.

Language: Python - Size: 2.95 MB - Last synced at: 2 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

zikzakjack/spark-demos

Apache Spark Demos

Language: Jupyter Notebook - Size: 103 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

aliabbasi2000/Spark

Solving big data problems using the Spark framework in Java, running the project on HDFS clusters (BigData@Polito) to get the results.

Language: Java - Size: 143 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

simbafl/spark-branch-2.4

Source-code analysis of Spark 2.4

Language: Scala - Size: 17.8 MB - Last synced at: 29 days ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 1

svngoku/Pyspark-pour-les-datas-engineers

A hands-on introduction to PySpark for data engineers

Language: Jupyter Notebook - Size: 784 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

MostafaToema/Stroke-Prediction-using-Pyspark

Data preparation, visualization, feature engineering, and classification of people who have had a stroke, using PySpark libraries

Language: Jupyter Notebook - Size: 79.1 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

aabdel-kader/Apache-Spark

A repository for my practices and projects using pyspark

Language: Jupyter Notebook - Size: 11.6 MB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

szaher/spark

Playing with Spark using Java

Language: Java - Size: 424 KB - Last synced at: about 1 year ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 0

Coursal/Text-Sentiment-Analysis-In-Hadoop-And-Spark

The source code developed and used for the purposes of my thesis with the same title under the guidance of my supervisor professor Vasilis Mamalis for the Department of Informatics and Computer Engineering of the University of West Attica.

Language: Java - Size: 66.5 MB - Last synced at: about 1 year ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 1

angeligareta/machine-learning-spark

Assignment for Scalable Machine Learning which aims to study the basics of regression and classification in Spark.

Language: Scala - Size: 1.42 MB - Last synced at: 2 months ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

angeligareta/spark-hadoop-hbase-overview

First lab for Data-Intensive Computing course at KTH where we are introduced to Apache Spark MLlib and Spark SQL, Hadoop, and HBase.

Language: Jupyter Notebook - Size: 22.4 MB - Last synced at: 2 months ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

anmolmore/Enzyme-Classifier-Using-ML

Classifies enzymes by genomic sequence using Spark ML

Language: Jupyter Notebook - Size: 719 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

shantanu-93/scalable-matrix-multiply

Fast and Scalable Matrix Multiply using spark, breeze and BLAS libraries

Language: JavaScript - Size: 127 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

jr2ngb2/yelp_recommender

Language: Jupyter Notebook - Size: 45.9 KB - Last synced at: almost 2 years ago - Pushed at: about 6 years ago - Stars: 1 - Forks: 0

corneliouzbett/Master-Apache-Spark

Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming

Language: Python - Size: 889 KB - Last synced at: about 2 years ago - Pushed at: about 6 years ago - Stars: 1 - Forks: 0

multivacplatform/multivac-nlp

Testing and benchmarking some of the existing NLP libraries in Apache Spark

Language: Scala - Size: 12 MB - Last synced at: 3 months ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 1

nahidalam/Spark

Spark, Python, AWS EMR, MLLib, Spark Streaming, Spark - SQL

Language: Jupyter Notebook - Size: 2.93 KB - Last synced at: almost 2 years ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 1

AswathKiruba/Stock_Price_Prediction

This is the CSYE7200 Big Data Systems Engineering Using Scala Final Project for Team 9 Fall 2018

Language: Scala - Size: 3.48 MB - Last synced at: about 1 year ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 1

tertiarycourses/ApacheSparkTraining

Exercise files for Apache Spark Essential Training

Language: Jupyter Notebook - Size: 4.05 MB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 1

AndrewKuzmin/spark-ml-pipelines-with-structured-streaming-examples

Examples of using Apache Spark MLlib Pipelines and Structured Streaming on version 2.4.0

Language: Shell - Size: 1020 KB - Last synced at: almost 2 years ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 0

gavalle94/Songs-Recommender

Recommendation System written in Python, using the pySpark framework and other Data Science libraries

Language: HTML - Size: 5.23 MB - Last synced at: almost 2 years ago - Pushed at: about 7 years ago - Stars: 1 - Forks: 1

lookuut/raif-competition

A Spark application for predicting a customer's home and work coordinates from payment transactions

Language: Scala - Size: 27.7 MB - Last synced at: over 1 year ago - Pushed at: about 7 years ago - Stars: 1 - Forks: 0

shubham-deb/Neural-Circuit-Tracer

This repository contains the source code and scripts of my project for the Master's-level course CS6240 Parallel Data Processing in Map-Reduce at the College of Computer & Information Science, Northeastern University, Boston, MA.

Language: Scala - Size: 663 KB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 1 - Forks: 0

shubham-deb/Spark_Scala_Programs

This repository contains all the Spark Scala programs that I implemented during the Master's-level course CS6240 Parallel Data Processing in Map-Reduce at the College of Computer & Information Science, Northeastern University, Boston, MA.

Language: Makefile - Size: 4 MB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 1 - Forks: 0

samanta-anupam/similar-water-regions

In this project, we look at the Global Surface Water Explorer and find patches of area that are similar to each other across the entire world, using the European Commission's Global Surface Water satellite dataset.

Language: Jupyter Notebook - Size: 346 KB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 1 - Forks: 0

abulbasar/zeppelin-notebooks

Size: 3.91 KB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 1 - Forks: 1

robzoros/TFP-Utad-InferenciaEtiquetas

Final project for the Big Data Expert Program: processing the MIRFLICKR image set in Spark with TensorFlow Inception and training a model for tag inference. Classification of images uploaded to Twitter with Storm.

Language: Java - Size: 51.8 KB - Last synced at: 8 days ago - Pushed at: over 7 years ago - Stars: 1 - Forks: 0

bassrehab/zerofish-imaging

Using the Thunder library for image processing with Spark MLlib

Language: Python - Size: 1.83 MB - Last synced at: almost 2 years ago - Pushed at: about 8 years ago - Stars: 1 - Forks: 0

dorianbg/cs110x-big-data-analysis-with-spark-labs

Graded lab exercises from the CS110x Big Data Analysis with Apache Spark online course on edx

Language: Jupyter Notebook - Size: 74.2 KB - Last synced at: about 2 years ago - Pushed at: over 8 years ago - Stars: 1 - Forks: 2

keanteng/wqd7007-project

Big Data Pipeline for NYC Taxi Trips

Language: Jupyter Notebook - Size: 8.47 MB - Last synced at: about 9 hours ago - Pushed at: about 10 hours ago - Stars: 0 - Forks: 0

lukilme/general-machine-learning-studies

Repository for storing practice work and studies in the area of machine learning

Language: Jupyter Notebook - Size: 7.81 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 0 - Forks: 0

Related Topics
spark (114), spark-sql (55), spark-streaming (51), machine-learning (44), scala (42), spark-ml (33), pyspark (33), apache-spark (27), python (25), big-data (21), python3 (16), hadoop (12), kafka (9), hadoop-mapreduce (7), recommender-system (7), mongodb (7), sparkjava (7), java (7), data-science (7), sparksql (7), docker (7), sentiment-analysis (6), jupyter-notebook (6), big-data-analytics (6), bigdata (6), collaborative-filtering (6), spark-structured-streaming (5), kmeans-clustering (5), data-analysis (5), linear-regression (5), clustering (5), ml (5), hdfs (5), nlp (5), random-forest (4), hadoop-hdfs (4), natural-language-processing (4), data-visualization (4), feature-engineering (4), visualization (4), decision-trees (3), docker-compose (3), twitter (3), tensorflow (3), spark-nlp (3), kafka-streams (3), mapreduce (3), hive (3), supervised-learning (3), sbt (3), recommendation-system (3), spring-boot (3), elasticsearch (3), delta-lake (3), logistic-regression (3), apache-hadoop (3), pyspark-python (3), alternating-least-squares (3), structured-streaming (3), apache-kafka (3), aws-s3 (3), databricks (3), kafka-consumer (3), twitter-sentiment-analysis (3), prediction (3), rdd (3), recommendation (2), kaggle (2), machine-learning-algorithms (2), classification (2), clustering-evaluation (2), data-pipeline (2), distributed-computing (2), matplotlib (2), numpy (2), pandas (2), java-8 (2), jupyter (2), hadoop-framework (2), mlflow (2), twitter-api (2), stock-price-prediction (2), spark-dataframes (2), spark-core (2), classification-algorithm (2), spark-mllib-library (2), mllib (2), cassandra (2), gradient-boosted-trees (2), r (2), lda (2), spark-rdd (2), neo4j (2), decision-tree-classifier (2), als (2), spark-streaming-kafka (2), aws (2), graphx (2), naive-bayes-classification (2), topic-modeling (2)