Ecosyste.ms: Repos
An open API service providing repository metadata for many open source software ecosystems.
GitHub topics: spark-dataframes
mahmoudparsian/big-data-mapreduce-course
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
Language: HTML - Size: 549 MB - Last synced: 18 days ago - Pushed: 18 days ago - Stars: 146 - Forks: 142
26hzhang/StockPrediction
Plain Stock Close-Price Prediction via Graves LSTM RNNs
Language: Java - Size: 41.3 MB - Last synced: 2 days ago - Pushed: over 3 years ago - Stars: 190 - Forks: 115
milesgranger/pontem
Treat Spark like pandas.
Language: Python - Size: 33.2 KB - Last synced: about 1 month ago - Pushed: almost 7 years ago - Stars: 0 - Forks: 0
mahmoudparsian/pyspark-tutorial
PySpark-Tutorial provides basic algorithms using PySpark
Language: Jupyter Notebook - Size: 8.97 MB - Last synced: 3 months ago - Pushed: over 1 year ago - Stars: 1,092 - Forks: 449
jubins/Spark-And-MLlib-Projects
This repository contains Spark, MLlib, PySpark and Dataframes projects
Language: Jupyter Notebook - Size: 101 KB - Last synced: 3 months ago - Pushed: over 6 years ago - Stars: 39 - Forks: 97
WazirRohiman/Apache_Spark_Basics
This series explores the basics of Apache Spark with the application of some practical elements of Spark, PySpark & SparkSQL
Language: Jupyter Notebook - Size: 29.3 KB - Last synced: 6 months ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0
anshul1004/MutualFriends
Implementation of Hadoop and Spark
Language: Java - Size: 23 MB - Last synced: 6 months ago - Pushed: about 4 years ago - Stars: 1 - Forks: 0
yennanliu/spark-etl-pipeline
Various data stream/batch processing demos with Apache Spark in Scala
Language: Scala - Size: 5.06 MB - Last synced: about 2 months ago - Pushed: over 4 years ago - Stars: 11 - Forks: 8
on2e/ntua-atdb
Advanced Topics in Databases course project - NTUA ECE - 2022-23
Language: Python - Size: 24.4 KB - Last synced: 8 months ago - Pushed: about 1 year ago - Stars: 1 - Forks: 0
the-timoye/spark-examples
Language: Python - Size: 14.6 KB - Last synced: 9 months ago - Pushed: 9 months ago - Stars: 0 - Forks: 2
LucasDLee/CMPT-353-Final-Project
This is our final project for SFU's CMPT 353 taught by Greg Baker during Summer 2023
Language: Python - Size: 289 KB - Last synced: 10 months ago - Pushed: 10 months ago - Stars: 0 - Forks: 0
NashTech-Labs/Sparkathon
A library of Java and Scala examples for Spark 2.x
Language: Java - Size: 113 MB - Last synced: 8 months ago - Pushed: over 7 years ago - Stars: 7 - Forks: 9
teamclairvoyant/data-scalaxy-reader-text
Library to read text in various string formats (CSV/JSON/XML) and parse it into a Spark DataFrame.
Language: Scala - Size: 101 KB - Last synced: 10 months ago - Pushed: 10 months ago - Stars: 0 - Forks: 0
MaxineXiong/Item-based-collaborative-filtering
This project uses PySpark DataFrames and PySpark RDDs to implement item-based collaborative filtering. By calculating cosine similarity scores or identifying movies with the highest number of shared viewers, the system recommends 10 similar movies for a given target movie that align with users' preferences.
Language: Jupyter Notebook - Size: 8.41 MB - Last synced: 10 months ago - Pushed: 10 months ago - Stars: 0 - Forks: 0
Thomas-George-T/Movies-Analytics-in-Spark-and-Scala
Data cleaning, pre-processing, and Analytics on a million movies using Spark and Scala.
Language: Scala - Size: 11.3 MB - Last synced: over 1 year ago - Pushed: about 3 years ago - Stars: 63 - Forks: 46
Vivek-Murali/CarCrashAnalysis
BCG GAMMA CASE STUDY
Language: Jupyter Notebook - Size: 11.9 MB - Last synced: over 1 year ago - Pushed: over 1 year ago - Stars: 2 - Forks: 0
thenickben/SplitCSV-Spark
Big Data - Split a large CSV file into N smaller ones and save them to the local disk
Language: Scala - Size: 2.93 KB - Last synced: over 1 year ago - Pushed: over 5 years ago - Stars: 2 - Forks: 0
rajeshsantha/MonitoredStructuredStreaming
Repository for Spark structured streaming use case implementations.
Language: Scala - Size: 65.4 KB - Last synced: about 1 year ago - Pushed: about 4 years ago - Stars: 1 - Forks: 1
zaha2020/Big_Data
This repository contains the implementation of a wide variety of BigData Projects in different applications of NoSQL databases, Spark, Data Pipelines, and map-reduce. These projects include university projects and projects implemented due to interest in BigData.
Language: Jupyter Notebook - Size: 34.4 MB - Last synced: 8 months ago - Pushed: 8 months ago - Stars: 0 - Forks: 0
spider-123-eng/Spark
Apache Spark is a fast, in-memory data processing engine with elegant and expressive development APIs that allow data workers to efficiently execute streaming, machine learning, or SQL workloads that require fast iterative access to datasets. This project contains sample programs for Spark in Scala.
Language: Scala - Size: 6.59 MB - Last synced: over 1 year ago - Pushed: over 1 year ago - Stars: 55 - Forks: 42
Bcromas/pyspark_projects
A collection of small projects exploring PySpark features and functionality including packages and modules, algorithms, and general data science techniques.
Language: Jupyter Notebook - Size: 80.1 KB - Last synced: about 1 year ago - Pushed: over 3 years ago - Stars: 0 - Forks: 0
andrefnb/big_data_processing_natural_disasters
Map reduce / Spark / Dataframes queries for natural disaster dataset.
Language: Jupyter Notebook - Size: 138 KB - Last synced: about 1 year ago - Pushed: over 3 years ago - Stars: 0 - Forks: 0
knoldus/spark-dataframes-meetup
Language: Scala - Size: 19.5 KB - Last synced: about 1 year ago - Pushed: about 8 years ago - Stars: 2 - Forks: 0
spu-bigdataanalytics-193/assignment5
Spark Even More! (Bonus)
Size: 6.84 KB - Last synced: over 1 year ago - Pushed: over 4 years ago - Stars: 0 - Forks: 0
SevakAvet/spark-session-enricher
Calculate user sessions and stats on top of them for an imaginary e-commerce site using Spark SQL and aggregations
Language: Scala - Size: 10.7 KB - Last synced: about 1 year ago - Pushed: over 4 years ago - Stars: 1 - Forks: 0
mayankrawat/CSVJoin
Use this project to join data from multiple CSV files. It currently supports one-to-one and one-to-many joins. It also shows how to use a Kafka producer efficiently with Spark.
Language: Java - Size: 10.7 KB - Last synced: over 1 year ago - Pushed: almost 2 years ago - Stars: 2 - Forks: 0