An open API service providing repository metadata for many open source software ecosystems.

Topic: "pyspark-notebook"

Shashi42/Azure-End-to-End-Sales-Data-Analytics-Pipeline

This project builds an End-to-End Azure Data Engineering Pipeline, performing ETL and Analytics Reporting on the AdventureWorks2022LT Database.

Language: Jupyter Notebook - Size: 501 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

jashshah-dev/Automating-EMR-Cluster-using-AWS-Lambda

Automate Amazon EMR clusters using Lambda for streamlined and scalable data processing workflows. Unlock the full potential of your data pipeline with LambdaEMR Automator.

Language: Python - Size: 8.79 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Non-NeutralZero/pyspark-jupyter-env

Language: Shell - Size: 5.86 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

RosarioB/spark-streaming-kafka

Exploring Spark Structured Streaming features using Jupyter notebooks and PySpark, interacting with a Kafka cluster.

Size: 130 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

heischichou/Sample-CDM-Tagger

A simple tool to compare new data to historical records. It will tag rows accordingly as duplicate or NULL. The team of interns I was in designed this tool using PySpark and Jupyter Notebook in Microsoft Fabric as a practice exercise within Lexmark Research and Development Corporation's Digital Transformation program.

Language: Python - Size: 4.88 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

RiccardoRobb/BigData_project

Tweet sentiment analysis

Language: Jupyter Notebook - Size: 92.6 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

norbertolimonjr/KMeans-Clustering-Segmentation-Analysis

Online Retail Classification for Marketing Segmentation Project using KMeans Clustering, with the Elbow Method and Silhouette Method for Validation

Language: Jupyter Notebook - Size: 53.4 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

RosarioB/spark

Exercises on Apache Spark

Size: 88.9 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

burakai/pyspark-tutorial

This tutorial is based on the "Pyspark with Python" YouTube playlist by Krish Naik (youtube.com/@krishnaik06). The series is also published on freeCodeCamp's YouTube channel (youtube.com/@freecodecamp). Thank them all!

Language: Jupyter Notebook - Size: 13.7 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

simao-af/Microsoft-Malware-Prediction

Predict the probability of a Windows device being infected by malware based on different properties of that device.

Language: Jupyter Notebook - Size: 17.8 MB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

srinathsai/Docker-Application

This project aims to demonstrate the importance of Docker in enabling faster software delivery cycles by packaging Ubuntu with PySpark built in and allowing the user to run a built-in word count program using PySpark.

Language: Jupyter Notebook - Size: 203 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

atullal/Exploring-the-Home-Mortgage-Market

Our goal with this dataset is to explore the Home Mortgage market within the US to identify patterns in the data on the basis of gender, race, income, property type, loan type, amount and location.

Language: Jupyter Notebook - Size: 789 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

Jayveersinh-Raj/trip_duration_big_data

Taxi trip duration forecasting using big data and Spark ML

Language: Jupyter Notebook - Size: 203 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

AbhimanyuW/BigData-EthereumAnalysis

Coursework on Ethereum analysis using PySpark, as part of the curriculum at Queen Mary University of London.

Language: Jupyter Notebook - Size: 493 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

kristin-kim/gcp-dataproc_serverless-running-notebooks

Orchestrator to run Notebooks on Dataproc SERVERLESS via Cloud Composer

Language: Jupyter Notebook - Size: 204 KB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 1

RickLeite/learning-batch-processing

Learning batch processing with Pyspark Interface for Apache Spark

Language: Jupyter Notebook - Size: 33.2 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

saitzaw/data-pipeline-for-ds

Data pipeline development for data science

Language: Jupyter Notebook - Size: 1.62 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

dlleonardo/spark-de-ml-assignments

Spark DE&ML assignments from the "Data Engineering and Machine Learning with Spark" course (offered by IBM Skills Network)

Language: Jupyter Notebook - Size: 56.6 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

caiocmb7/python-rep

Studies on Python, including the basics and OOP

Language: Jupyter Notebook - Size: 204 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

samuelesimone/Pyspark-fundamentals

Pyspark fundamentals

Language: Jupyter Notebook - Size: 1.95 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

behnamy2010/Pyspark-Malware-Detection-Using-Assembly-Code-and-Byte-Codes

Pyspark-Malware Detection Using Assembly Code and Byte Codes in Big 2015 Dataset

Language: Jupyter Notebook - Size: 121 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

CelineWW/Amazon_Vine_Bias_Pyspark

Performed analysis on Amazon office product reviews using PySpark, an Amazon RDS database, and an S3 bucket. The percentage of 5-star reviews was calculated to check whether Vine reviews show a positivity bias compared with non-Vine reviews.

Language: Jupyter Notebook - Size: 30.5 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

dlleonardo/spark-assignments

Spark assignments from "Introduction to Big Data" course (offered by IBM Skills Network)

Language: Jupyter Notebook - Size: 28.3 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

sanogotech/pyspark-examples Fork of spark-examples/pyspark-examples

Pyspark RDD, DataFrame and Dataset Examples in Python language

Language: Python - Size: 729 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

behnamy2010/PySpark-Clustering

PySpark - Clustering with Kmeans++ and Bisecting K-means

Language: Jupyter Notebook - Size: 613 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

behnamy2010/PySpark-Word-Count

PySpark Word Count

Language: Jupyter Notebook - Size: 1.85 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

Tarequzzaman/pyspark-Learning

This repo is maintained for learning PySpark

Language: Jupyter Notebook - Size: 4.88 KB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

Mr-Hari-vignesh/dsplayground Fork of diggibyte/dsplayground

Language: Jupyter Notebook - Size: 12.5 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

abhinit21/data-analysis-pyspark

Analyze the data set of world championship chess games using PySpark

Language: Jupyter Notebook - Size: 606 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

Sanjay-dev-ds/DWBI_Sales_Prediction

Created a data warehouse (DW) for a sales data source, with visualizations for the relevant requirements. Sales prediction (time series) is done using the DW.

Language: Jupyter Notebook - Size: 1.24 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

dimdasci/yp11-pyspark-training

Training project with Spark DataFrame and MLlib

Language: Jupyter Notebook - Size: 765 KB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

luuisotorres/Kaggle-Titanic-Machine-Learning-Competition-with-PySpark

This notebook is my first attempt at using PySpark for EDA and Machine Learning models.

Language: Jupyter Notebook - Size: 25.4 KB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

zulfiqarAlibalti/PyTorch

This repo contains PyTorch projects from basic to advanced

Language: Jupyter Notebook - Size: 8.79 KB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

HassanRehman11/FuzzyMatchingPhonemes

Developed using PySpark

Language: Jupyter Notebook - Size: 3.09 MB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

CAG9/PySpark

Language: Jupyter Notebook - Size: 28.3 KB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

mhaseebtariq/pyspark-helpers

Useful helper functions for PySpark dataframe operations

Language: Jupyter Notebook - Size: 94.7 KB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 1

quadrantofsola/PySpark_Dataframes

Analysis of Clinical Trial Dataset using Dataframes on PySpark

Size: 2.93 KB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

luispsalazar/Amazon_Vine_Analysis

This performs ELT on Amazon's "Musical Instruments" reviews, checking for possible bias of the paid reviewers.

Language: Jupyter Notebook - Size: 3.7 MB - Last synced at: 5 months ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

joeliang0520/CryptoTweets

Text Classification and Data Analysis on Cryptocurrency-Related Tweets in a PySpark Environment

Language: Jupyter Notebook - Size: 8.95 MB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

mdbinger/Amazon_Vine_Analysis

Analyzed reviews of music products on Amazon written by members of the paid Amazon Vine program, looking for potential bias in the reviews. PySpark was used to extract and transform the review data, which was connected to an Amazon Web Services RDS instance and loaded into pgAdmin.

Language: Jupyter Notebook - Size: 512 KB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

jpacerqueira-zz/DeepLearning-MalwareDetection

Language: Jupyter Notebook - Size: 285 MB - Last synced at: 5 months ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 1

radityohanif/Pemberdayaan-Masyarakat-Kalimantan-Barat

Midterm exam for the Big Data Practicum course

Language: Jupyter Notebook - Size: 636 KB - Last synced at: almost 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

JakobLS/Spotify-streams

Interactive visualisation of Spotify Streams in Europe & North America during 2017-2021.

Language: HTML - Size: 94.7 KB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

prakHr/Multiclass-Category-Classification

Contains notebooks that do categorical classification of shop items using embeddings in CNNs and PySpark (Logistic Regression and MLlib)

Language: Jupyter Notebook - Size: 1.83 MB - Last synced at: 3 months ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

EilinLux/SparkCertification

Notes for PySpark certification, with notebooks

Language: Jupyter Notebook - Size: 301 MB - Last synced at: almost 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

manishghop/CS651-UW-Project

CS651 Final Project

Language: Jupyter Notebook - Size: 1.33 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

BrenoShelby/PySparkCoursePractice

Just a repository for my studies about PySpark.

Language: Jupyter Notebook - Size: 27.3 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

twseptian/apache-pyspark-programming

Big Data Python Programming using Apache Spark and Pyspark

Language: Jupyter Notebook - Size: 78.1 MB - Last synced at: 3 days ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 5

rantoncuadrado/udacity_capstone_project

Udacity Data Engineering Nanodegree. Capstone Project.

Language: Jupyter Notebook - Size: 17.7 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

easonlai/log_analytics_with_databricks

Azure Databricks notebook sample to connect Blob Storage of Azure Log Analytics

Language: HTML - Size: 48.8 KB - Last synced at: 3 months ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

itsayushthada/SVD-on-Spark

Language: Jupyter Notebook - Size: 1.72 MB - Last synced at: almost 2 years ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

naiborhujosua/Telco_Churn_Analysis

Implementing customer churn analysis in the telco industry to improve customer retention using PySpark in Databricks

Size: 856 KB - Last synced at: about 1 year ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

amalaj7/Pyspark-Notes

This repository contains the Notes for Pyspark

Language: Jupyter Notebook - Size: 1.87 MB - Last synced at: almost 2 years ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 2

choang94/yelp-reviews

Loading Yelp Reviews Data from Kaggle to a Spark Cluster provisioned on AWS EMR and doing analyses

Language: Jupyter Notebook - Size: 1.85 MB - Last synced at: 12 months ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

ghostcat404/pyspark_data_load

Language: Jupyter Notebook - Size: 387 KB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

lheuveline/steam-analysis

Steam dataset exploration and analysis

Language: Jupyter Notebook - Size: 20.5 MB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

carlossanchezvega/spark_works

Language: Jupyter Notebook - Size: 478 KB - Last synced at: 4 months ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

galib360/BigData_Project

Language: Jupyter Notebook - Size: 3.89 MB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

a-poor/cookiecutter-jupyter-pyspark

A cookiecutter template for a Docker/Jupyter/Data-Science/PySpark project

Language: Jupyter Notebook - Size: 3.91 KB - Last synced at: 1 day ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

airdipu/Covid19-Big-Data 📦

This is a project on COVID-19 infections in Australia and the prediction of possible infection rates using Spark.

Language: Jupyter Notebook - Size: 188 KB - Last synced at: over 1 year ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

kunalBhashkar/pySpark_examples

PySpark Tutorial

Language: Jupyter Notebook - Size: 802 KB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

panashematsaudza/Ecommerce-Simple-Linear-Regression-

PySpark Ecommerce Simple Linear Regression

Language: Jupyter Notebook - Size: 51.8 KB - Last synced at: over 1 year ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

sunidhit/NYCCityTaxiDataAnalysis

Language: Python - Size: 11.7 KB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

hadleyrose/Hamlet-PySpark

Language: Jupyter Notebook - Size: 6.84 KB - Last synced at: 12 months ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 0

d-vignesh/PySpark_FireServiceCallsAnalysis

An introductory notebook exploring the functionalities of Pyspark

Language: Jupyter Notebook - Size: 5.86 KB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 0

pmbrull/pmbrull-github-io-archive 📦

Language: HTML - Size: 48.5 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

ChiAmyC0987/big-data-challenge

Machine Learning, Big Data, ETL

Size: 216 KB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

solvimm/glue-comprehend

Scaling sentiment analysis with AWS Glue and Amazon Comprehend.

Language: Python - Size: 12.7 KB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

ansjin/docker-spark

docker spark standalone

Language: Dockerfile - Size: 5.86 KB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 0 - Forks: 1

rehman04/BigData_pyspark_AWS-EC2-

Language: Jupyter Notebook - Size: 2.93 KB - Last synced at: about 1 year ago - Pushed at: almost 6 years ago - Stars: 0 - Forks: 0

gaelblanchard/anime_recommendation_engine

An anime recommendation engine that allows us to recommend anime based on a given anime title or a given user using Pyspark

Language: Python - Size: 8.79 KB - Last synced at: about 2 years ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 0

unnitin/pyspark-jupyter-kernel

Installation instructions for pyspark and a kernel with jupyter

Language: Shell - Size: 18.6 KB - Last synced at: 12 months ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 1

Ipsedo/CriteoSpark

Project for the TC6 course unit on the Kaggle Criteo Display Advertising challenge

Language: Jupyter Notebook - Size: 85.9 KB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

prakass1/SparkProject

Usage of Apache Spark and GraphX

Language: Jupyter Notebook - Size: 1.78 MB - Last synced at: almost 2 years ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

vivek-bombatkar/Graph-Datastructure-for-Movielens-dataset

Language: Jupyter Notebook - Size: 726 KB - Last synced at: 3 months ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

irfanalidv/Applied_Machine_Learning_Apache_Spark

Apache® Spark™ for Machine Learning and Data Science

Language: Jupyter Notebook - Size: 261 KB - Last synced at: about 2 years ago - Pushed at: almost 7 years ago - Stars: 0 - Forks: 1

riyadparvez/pyspark-datascience

PySpark notebooks

Language: Jupyter Notebook - Size: 819 KB - Last synced at: 3 months ago - Pushed at: almost 7 years ago - Stars: 0 - Forks: 0

aashokvardhan/Analyzing-Neuroimaging-Data-with-PySpark-and-Thunder

Language: Jupyter Notebook - Size: 3.84 MB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

aashokvardhan/Predicting-Forest-Cover-with-Decision-Trees

Language: Jupyter Notebook - Size: 10.5 MB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

akanshu22/Triangle-Counting-Problem-in-Apache-Spark

Implementation of Triangle Counting Problem in Apache Spark

Language: Jupyter Notebook - Size: 519 KB - Last synced at: about 2 years ago - Pushed at: about 8 years ago - Stars: 0 - Forks: 1