An open API service providing repository metadata for many open source software ecosystems.

Topic: "spark-rdd"

mahmoudparsian/pyspark-tutorial

PySpark-Tutorial provides basic algorithms using PySpark

Language: Jupyter Notebook - Size: 8.96 MB - Last synced at: 7 days ago - Pushed at: 4 months ago - Stars: 1,219 - Forks: 476

mahmoudparsian/big-data-mapreduce-course

Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University

Language: HTML - Size: 601 MB - Last synced at: 5 days ago - Pushed at: 6 months ago - Stars: 158 - Forks: 143

Thomas-George-T/Movies-Analytics-in-Spark-and-Scala

Data cleaning, pre-processing, and Analytics on a million movies using Spark and Scala.

Language: Scala - Size: 11.3 MB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 63 - Forks: 46

yennanliu/spark-etl-pipeline

Various data stream and batch processing demos with Apache Spark in Scala 🚀

Language: Scala - Size: 5.06 MB - Last synced at: about 1 year ago - Pushed at: about 5 years ago - Stars: 11 - Forks: 8

Ren294/Log-Analysis-Project

This project builds a scalable log analytics pipeline using a Lambda architecture for real-time and batch processing of NASA server logs.

Language: Python - Size: 2.88 MB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 5 - Forks: 1

MaxineXiong/Item-based-collaborative-filtering

This project uses PySpark DataFrames and PySpark RDDs to implement item-based collaborative filtering. By calculating cosine similarity scores or identifying movies with the highest number of shared viewers, the system recommends 10 movies similar to a given target movie that align with users’ preferences.

Language: Jupyter Notebook - Size: 8.44 MB - Last synced at: 2 months ago - Pushed at: 11 months ago - Stars: 3 - Forks: 0

MaxineXiong/Degrees-of-Separation-with-Breadth-first-Search

This project uses PySpark RDDs and the Breadth-first Search (BFS) algorithm to find the shortest path and degrees of separation between two given Marvel superheroes, based on their appearances together in the same comic books, so users can discover connections between their favourite superheroes in the Marvel universe.

Language: Jupyter Notebook - Size: 775 KB - Last synced at: 3 months ago - Pushed at: 11 months ago - Stars: 2 - Forks: 0

adityajn105/Apache-Spark-Tutorials

Apache Spark is a big data analysis framework.

Language: Jupyter Notebook - Size: 789 KB - Last synced at: about 1 month ago - Pushed at: about 6 years ago - Stars: 2 - Forks: 5

mohammad-safari/spark-hadoop-exercise

spark hadoop exercise of cloud computing course - aut 1402-1403 fall

Language: Jupyter Notebook - Size: 33.2 MB - Last synced at: 6 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

on2e/ntua-atdb

Advanced Topics in Databases course project - NTUA ECE - 2022-23

Language: Python - Size: 24.4 KB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

ShreeshaN/SparkBigDataTutorials

Demonstration of basic data transformations using Spark RDD and Spark DataFrame in Scala

Language: Scala - Size: 237 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

manojpawar94/Spark-Scala-Examples

I have implemented sample programs using Apache Spark. The programs are built on the concepts of Spark RDD and Spark SQL DataFrame.

Language: Scala - Size: 11.7 KB - Last synced at: 3 months ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

nikhilkumawat03/Extracting-Relevant-Document

Contains projects based on Big Data

Language: Java - Size: 18.3 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 0

madhurimarawat/Big-Data-Analytics

This repository demonstrates big data processing, visualization, and machine learning using tools such as Hadoop, Spark, Kafka, and Python.

Language: Jupyter Notebook - Size: 10.7 MB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 1

RiccardoRevalor/Spark

Spark exercises

Language: Jupyter Notebook - Size: 302 KB - Last synced at: about 2 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

demanejar/spark-rdd

Spark RDD basic

Language: Java - Size: 1.95 KB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

vaibhav50596/DeerfootTrailAnalysis

The goal is to train a linear regression model to predict Deerfoot commute times given weather and accident conditions using Spark RDD and MLlib

Language: Jupyter Notebook - Size: 82 KB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 1

contactsunny/spring-spark-s3-file-read

A POC written in Java using the Spring framework, which uses Apache Spark to read a file from Amazon S3 FS and counts the number of lines in the file.

Language: Java - Size: 6.84 KB - Last synced at: about 1 year ago - Pushed at: almost 7 years ago - Stars: 0 - Forks: 0