An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: spark-dataframes

mahmoudparsian/pyspark-tutorial

PySpark-Tutorial provides basic algorithms using PySpark

Language: Jupyter Notebook - Size: 8.96 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 1,223 - Forks: 478

mahmoudparsian/big-data-mapreduce-course

Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University

Language: HTML - Size: 601 MB - Last synced at: 9 days ago - Pushed at: 6 months ago - Stars: 158 - Forks: 143

26hzhang/StockPrediction

Plain Stock Close-Price Prediction via Graves LSTM RNNs

Language: Java - Size: 41.3 MB - Last synced at: about 2 months ago - Pushed at: over 4 years ago - Stars: 197 - Forks: 113

prajakta-3-patil/e-commerce-analysis

This project explores and analyses e-commerce data, primarily using the Apache Spark DataFrame API, joins, functions, and aggregations to generate summarized results.

Language: Python - Size: 24.4 KB - Last synced at: 3 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

aravind2060/spark-sql-on-flight-data (fork of Cloud-Computing-Fall2024/assignment-4-advanced-spark-sql-on-flight-data)

Work with a flight dataset and use Spark SQL to analyze flight delays, airport traffic, and other key metrics

Language: Python - Size: 309 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

MaxineXiong/Item-based-collaborative-filtering

This project uses PySpark DataFrames and PySpark RDDs to implement item-based collaborative filtering. By calculating cosine similarity scores or identifying the movies with the highest number of shared viewers, the system recommends 10 movies similar to a given target movie that align with users' preferences.

Language: Jupyter Notebook - Size: 8.44 MB - Last synced at: 3 months ago - Pushed at: 11 months ago - Stars: 3 - Forks: 0
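
The cosine-similarity approach named in the description above can be sketched in plain Python. This is a minimal illustration only, not the repository's code (which uses PySpark DataFrames and RDDs): the toy ratings, the function names `item_similarities` and `recommend`, and the tie-breaking rule are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Hypothetical toy ratings: (user_id, movie_id, rating). Not real data.
RATINGS = [
    ("u1", "A", 5.0), ("u1", "B", 4.0), ("u1", "C", 1.0),
    ("u2", "A", 4.0), ("u2", "B", 5.0),
    ("u3", "A", 1.0), ("u3", "C", 5.0),
    ("u4", "B", 4.0), ("u4", "C", 2.0),
]


def item_similarities(ratings):
    """Return {(movie_a, movie_b): (cosine_score, shared_viewers)}, movie_a < movie_b."""
    by_user = defaultdict(dict)
    for user, movie, rating in ratings:
        by_user[user][movie] = rating

    # Per movie pair: dot product, two squared norms, and the number of
    # users who rated both movies (co-rated vectors only).
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0])
    for movies in by_user.values():
        for a, b in combinations(sorted(movies), 2):
            acc = sums[(a, b)]
            acc[0] += movies[a] * movies[b]
            acc[1] += movies[a] ** 2
            acc[2] += movies[b] ** 2
            acc[3] += 1

    result = {}
    for pair, (xy, xx, yy, shared) in sums.items():
        denom = sqrt(xx) * sqrt(yy)
        # Guard against all-zero rating vectors.
        result[pair] = (xy / denom if denom else 0.0, shared)
    return result


def recommend(target, sims, top_n=10):
    """Rank the other movies by cosine score against `target`, highest first."""
    scored = []
    for (a, b), (score, shared) in sims.items():
        if target in (a, b):
            other = b if a == target else a
            scored.append((other, score, shared))
    # Assumed tie-break: more shared viewers first, then movie id.
    scored.sort(key=lambda row: (-row[1], -row[2], row[0]))
    return scored[:top_n]


if __name__ == "__main__":
    for movie, score, shared in recommend("A", item_similarities(RATINGS)):
        print(f"{movie}: score={score:.3f}, shared viewers={shared}")
```

Ranking by the `shared` count instead of the cosine score would give the "highest number of shared viewers" variant that the description also mentions.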

jubins/Spark-And-MLlib-Projects

This repository contains Spark, MLlib, PySpark and Dataframes projects

Language: Jupyter Notebook - Size: 101 KB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 39 - Forks: 97

mohammad-safari/spark-hadoop-exercise

Spark and Hadoop exercise for a cloud computing course - AUT, fall 1402-1403

Language: Jupyter Notebook - Size: 33.2 MB - Last synced at: 7 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

WazirRohiman/Apache_Spark_Basics

This series explores the basics of Apache Spark with the application of some practical elements of Spark, PySpark & SparkSQL

Language: Jupyter Notebook - Size: 29.3 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

anshul1004/MutualFriends

Implementation of Hadoop and Spark

Language: Java - Size: 23 MB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 0

yennanliu/spark-etl-pipeline

Various stream and batch processing demos with Apache Spark in Scala 🚀

Language: Scala - Size: 5.06 MB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 11 - Forks: 8

on2e/ntua-atdb

Advanced Topics in Databases course project - NTUA ECE - 2022-23

Language: Python - Size: 24.4 KB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

the-timoye/spark-examples

Language: Python - Size: 14.6 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 2

LucasDLee/CMPT-353-Final-Project

This is our final project for SFU's CMPT 353 taught by Greg Baker during Summer 2023

Language: Python - Size: 289 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

NashTech-Labs/Sparkathon

A library having Java and Scala examples for Spark 2.x

Language: Java - Size: 113 MB - Last synced at: 2 months ago - Pushed at: over 8 years ago - Stars: 7 - Forks: 9

teamclairvoyant/data-scalaxy-reader-text

Library to read text in various string formats (CSV/JSON/XML) and parse it into a Spark DataFrame.

Language: Scala - Size: 101 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

Thomas-George-T/Movies-Analytics-in-Spark-and-Scala

Data cleaning, pre-processing, and Analytics on a million movies using Spark and Scala.

Language: Scala - Size: 11.3 MB - Last synced at: over 2 years ago - Pushed at: about 4 years ago - Stars: 63 - Forks: 46

Vivek-Murali/CarCrashAnalysis

BCG GAMMA CASE STUDY

Language: Jupyter Notebook - Size: 11.9 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

thenickben/SplitCSV-Spark

Big Data - Split a large CSV file into N smaller ones and save them to the local disk

Language: Scala - Size: 2.93 KB - Last synced at: 5 days ago - Pushed at: over 6 years ago - Stars: 2 - Forks: 0

rajeshsantha/MonitoredStructuredStreaming

Repository for Spark structured streaming use case implementations.

Language: Scala - Size: 65.4 KB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 1

zaha2020/Big_Data

This repository contains the implementation of a wide variety of BigData Projects in different applications of NoSQL databases, Spark, Data Pipelines, and map-reduce. These projects include university projects and projects implemented due to interest in BigData.

Language: Jupyter Notebook - Size: 34.4 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

spider-123-eng/Spark

Apache Spark is a fast, in-memory data processing engine with elegant and expressive development APIs that allow data workers to efficiently execute streaming, machine learning, or SQL workloads that require fast iterative access to datasets. This project contains sample Spark programs written in Scala.

Language: Scala - Size: 6.59 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 55 - Forks: 42

Bcromas/pyspark_projects

A collection of small projects exploring PySpark features and functionality including packages and modules, algorithms, and general data science techniques.

Language: Jupyter Notebook - Size: 80.1 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

andrefnb/big_data_processing_natural_disasters

MapReduce, Spark, and DataFrame queries for a natural disaster dataset.

Language: Jupyter Notebook - Size: 138 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

knoldus/spark-dataframes-meetup

Language: Scala - Size: 19.5 KB - Last synced at: about 2 years ago - Pushed at: about 9 years ago - Stars: 2 - Forks: 0

spu-bigdataanalytics-193/assignment5

Spark Even More! (Bonus)

Size: 6.84 KB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

SevakAvet/spark-session-enricher

Calculate user sessions and statistics on top of them for an imaginary e-commerce site using Spark SQL and aggregations

Language: Scala - Size: 10.7 KB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 0

mayankrawat/CSVJoin

Use this project to join data from multiple CSV files. The project currently supports one-to-one and one-to-many joins. It also shows how to use a Kafka producer efficiently with Spark.

Language: Java - Size: 10.7 KB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 0

milesgranger/pontem

Treat Spark like pandas.

Language: Python - Size: 33.2 KB - Last synced at: 21 days ago - Pushed at: almost 8 years ago - Stars: 0 - Forks: 0

Related Keywords
spark-dataframes (29), spark (15), spark-sql (14), pyspark (11), apache-spark (8), big-data (7), spark-rdd (7), scala (7), python (5), spark-streaming (5), mapreduce (4), kafka (3), hadoop (3), big-data-analytics (3), dataframes (2), spark-structured-streaming (2), spark-mllib (2), spark-dataset (2), rdd (2), knoldus (2), sbt (2), kafka-producer (2), spark-ml (2), spark-kafka-integration (2), data-algorithms (2), java (2), mapreduce-python (2), apache-hadoop (2), data-transformation (2), data-engineering (2), nosql-database (1), spark-graphx (1), sql (1), stock-market (1), superset (1), cassandra-installation (1), consumer (1), algorithms (1), parquet (1), spark-aggregations-using-dataframe (1), spark-catalog-api (1), spark-datadog (1), spark-hive-context (1), neo4j (1), mongodb (1), map-reduce (1), kibana (1), kafka-streams (1), hive (1), elasticsearch (1), cypher (1), clickhouse (1), cassandra (1), spark-streaming-kafka (1), etl (1), spark-scala (1), spark-programs (1), dockerfile (1), pandas (1), distributed-dataframe (1), dataframe-api (1), spark-kafka (1), spark-java (1), spark-csv (1), one-to-one-join (1), one-to-many-joins-spark (1), one-to-many-join (1), one-to-many (1), kafka-with-spark (1), kafka-spark (1), kafka-producer-spark (1), join-apache-spark (1), integrate-kafka-spark (1), apachespark (1), sessionize (1), scala-spark (1), petproject (1), pet-project (1), ecommerce (1), eda (1), meetup (1), streaming (1), spark-with-mangodb (1), spark-use-cases (1), spark-transformations (1), spark-to-cassandra-connection (1), spark-streaming-data (1), spark-mangodb (1), spark-joins (1), spark-jdbc-connection (1), docker (1), yelp-dataset (1), social-media-mining (1), social-media-analysis (1), social-media (1), pyspark-python (1), pyspark-dataframe-format (1), mutual-friends (1), mapreduce-java (1), hadoop-mapreduce (1)