An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: sparkcontext

neha-dev-dot/Pyspark-Tutorial

This repository is part of my journey to learn **PySpark**, the Python API for Apache Spark. I explored the fundamentals of distributed data processing using Spark and practiced with real-world data transformation and querying use cases.

Language: Jupyter Notebook - Size: 230 KB - Last synced at: 5 days ago - Pushed at: 19 days ago - Stars: 1 - Forks: 0

xavierguihot/spark_helper

A bunch of low-level basic methods for data processing and monitoring with Scala Spark

Language: Scala - Size: 793 KB - Last synced at: over 1 year ago - Pushed at: about 7 years ago - Stars: 8 - Forks: 5

NashTech-Labs/spark-streaming-gnip

An Apache Spark utility for pulling Tweets from Gnip's PowerTrack in real time

Language: Scala - Size: 130 KB - Last synced at: 3 months ago - Pushed at: almost 10 years ago - Stars: 6 - Forks: 2

jpradas1/Big-Data_processing_analysis

This repository contains the topics taught in the Coursera course Big Data: procesamiento y análisis. The course focuses on machine learning methods applied with Spark, and it uses Hadoop as the tool to create the database.

Language: Jupyter Notebook - Size: 49.8 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

curusarn/spark-context-with

Python guard/wrapper for SparkContext from pyspark - allows you to use the Python `with` statement with SparkContext

Language: Python - Size: 1.95 KB - Last synced at: 3 months ago - Pushed at: over 8 years ago - Stars: 0 - Forks: 0
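
As a rough illustration of the kind of guard the description above refers to, here is a minimal sketch of a context manager that stops a context object when the `with` block exits. It is not taken from the repository: the names `stopping` and `FakeContext` are hypothetical, and `FakeContext` is only a stand-in so the example runs without Spark installed. The one assumption about the real object is that it exposes a `stop()` method, as pyspark's `SparkContext` does.

```python
from contextlib import contextmanager


@contextmanager
def stopping(context):
    """Yield `context`, then call its stop() method even if the body raises.

    `stopping` is a hypothetical helper name, not the repository's API.
    """
    try:
        yield context
    finally:
        context.stop()


class FakeContext:
    """Stand-in for pyspark's SparkContext so the sketch runs without Spark."""

    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True


# Normal exit: the context is live inside the block and stopped afterwards.
with stopping(FakeContext()) as sc:
    stopped_inside = sc.stopped

stopped_after = sc.stopped

# Exit via an exception: stop() still runs before the error propagates.
failing = FakeContext()
try:
    with stopping(failing):
        raise RuntimeError("job failed")
except RuntimeError:
    pass

stopped_after_error = failing.stopped
```

With pyspark installed, the same helper could in principle be given a real `SparkContext` in place of `FakeContext()`, so the context is stopped whether the job finishes or fails.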