GitHub topics: pyspark-notebook

Repositories

bitoollearner/leetcode-pyspark

This repository is dedicated to solutions for LeetCode SQL questions implemented in PySpark.

Language: Jupyter Notebook - Size: 754 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1 - Forks: 2

sung313/Car-Sales-End-to-End-Data-Engineering-Project-using-Azure-Databricks

Build a scalable car sales data pipeline using Azure and Databricks. Automate ingestion, ETL, and reporting for efficient data analysis. 🚗🛠️

Size: 1.26 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 0 - Forks: 0

jotstolu/Car-Sales-End-to-End-Data-Engineering-Project-using-Azure-Databricks

This project presents a scalable end-to-end data pipeline designed for processing and analysing car sales data using the Azure Cloud and Databricks ecosystem.

Size: 1.27 MB - Last synced at: 1 day ago - Pushed at: 23 days ago - Stars: 0 - Forks: 0

jotstolu/Retail-Orders-End-to-End-Data-Engineering-Project-using-Azure-Databricks

This project demonstrates the development of a scalable end-to-end data pipeline for processing and reporting retail order data using Azure Cloud services, Delta Lake architecture, and Databricks.

Size: 4.6 MB - Last synced at: 23 days ago - Pushed at: 23 days ago - Stars: 0 - Forks: 0

Soumyadipta2020/Mcdonald-vs-Burger-king

Language: Jupyter Notebook - Size: 803 KB - Last synced at: 12 days ago - Pushed at: 25 days ago - Stars: 0 - Forks: 0

gvatsal60/PySparkTemplate

A lightweight template for building PySpark applications efficiently inside a devcontainer

Language: Shell - Size: 44.9 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

gvatsal60/PySparkTutorial

Comprehensive guide to mastering `PySpark` through hands-on tutorials and examples.

Language: Shell - Size: 38.1 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

sush4nt/docker-containers

References for building custom IDEs

Language: Shell - Size: 48.8 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

alfredomarinus/kafka-data-pipeline

Language: Shell - Size: 62.5 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

conorheffron/ironoc-spark

Sample pyspark Notebook

Language: Jupyter Notebook - Size: 17.6 KB - Last synced at: about 10 hours ago - Pushed at: 6 months ago - Stars: 3 - Forks: 0

hjh17/dbloy

Continuous delivery tool for PySpark notebook-based jobs on Databricks

Language: Python - Size: 591 KB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 1

josephmachado/efficient_data_processing_spark

Code for "Efficient Data Processing in Spark" Course

Language: Python - Size: 23.9 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 304 - Forks: 63

FoggySmile/BigData_ITMO

Big Data: Spark Lab and ClickHouse Lab Solutions

Language: Jupyter Notebook - Size: 144 KB - Last synced at: 2 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

microsoft/Fabric-RTA-FlightStream

Microsoft Fabric Real-time Analytics flight streaming

Language: Jupyter Notebook - Size: 1.04 MB - Last synced at: 5 days ago - Pushed at: over 1 year ago - Stars: 21 - Forks: 3

EchoSingh/pySpark_movie_analysis

This project analyzes the MovieLens 20M dataset using PySpark, with interactive visualizations provided by Streamlit. Additionally, a Kaggle notebook offers more insights into the analysis.

Language: Python - Size: 638 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

prakashpandey16/olympic_data_engineering_project

A data analytics project analyzing Olympic data to extract insights, trends, and performance metrics using Python, PySpark, and visualization tools in Databricks.

Language: Jupyter Notebook - Size: 157 KB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

josephmachado/docker_for_data_engineers

Code for blog at: https://www.startdataengineering.com/post/docker-for-de/

Language: C - Size: 561 KB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 36 - Forks: 15

jplane/pyspark-devcontainer

A simple VS Code devcontainer setup for local PySpark development

Language: Jupyter Notebook - Size: 318 KB - Last synced at: 3 months ago - Pushed at: about 2 years ago - Stars: 50 - Forks: 28

vigneshSs-07/Pyspark-ACompleteGuide

This repo explains PySpark modules in Python, taking a practical, hands-on approach to working with big data.

Language: Jupyter Notebook - Size: 1.86 MB - Last synced at: 4 months ago - Pushed at: about 2 years ago - Stars: 5 - Forks: 3

easonlai/Samples_for_Azure_Databricks_Orientation

Samples for Azure Databricks Orientation

Language: HTML - Size: 6.78 MB - Last synced at: 3 months ago - Pushed at: over 4 years ago - Stars: 5 - Forks: 2

Saikesana31/Netflix

Azure Data engineering project

Language: Python - Size: 1.3 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

RaviSoni804426/Pyspark-With-Python (fork of krishnaik06/Pyspark-With-Python)

This repository contains tutorials and examples for working with PySpark, covering data processing, transformations, machine learning, and more.

Language: Jupyter Notebook - Size: 40 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

alisonpezzott/calendario_fabric_lakehouse

Calendar table for a Fabric lakehouse, generated from a Spark notebook

Language: Python - Size: 69.3 KB - Last synced at: 4 months ago - Pushed at: 8 months ago - Stars: 5 - Forks: 3

archivesunleashed/notebooks

Various examples of notebooks for working with web archives with the Archives Unleashed Toolkit, and derivatives generated by the Archives Unleashed Toolkit.

Language: Jupyter Notebook - Size: 49.1 MB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 26 - Forks: 4

VandanaBhumireddygari/Open-Table-Formats-with-Databricks-and-Delta-Lake

This project demonstrates the use of Open Table Formats with Databricks, PySpark, and Delta Lake. It covers data ingestion, transformation, querying, and storage management using Delta tables. The project includes code for loading data, writing it to Delta format, querying, and utilizing Delta Lake

Language: Jupyter Notebook - Size: 0 Bytes - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

imsanjoykb/PySpark-Bootcamp

My Practice and project on PySpark

Language: Jupyter Notebook - Size: 4.52 MB - Last synced at: 22 days ago - Pushed at: almost 4 years ago - Stars: 8 - Forks: 3

s1ddh-rth/fake-kafka

A simulated Kafka data pipeline that generates fake customer and order data, processes it through Kafka, and stores it in PostgreSQL for real-time analysis with PySpark. Includes Kafdrop UI for monitoring. 🚀

Language: Python - Size: 4.88 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

SnehaDharne/BigDataAnalytics-MVCollisions

Leveraging NYC Open Data, this repository contains Databricks notebooks for analyzing motor vehicle collisions. We perform EDA, spatial clustering, and predictive modeling on collision, vehicle, and person datasets to understand accident trends and predict potential risks.

Language: Jupyter Notebook - Size: 7.64 MB - Last synced at: 5 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

mananabbasi/Data-Science-Complete-Project-using-Big-Data-Tools-Techniques-

This repository contains Databricks projects utilizing RDDs, DataFrames, and SQL to process and analyze various real-world datasets. Data cleaning and analysis have been performed using PySpark functions to handle challenges such as inconsistent formats, missing values, and complex data structures. The project ensures efficient data transformation

Language: HTML - Size: 3.71 MB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

AnandaRauf/CekatanBiz

CekatanBiz is a software tool for data analysts, business analysts, and business intelligence, developed in Python.

Language: Jupyter Notebook - Size: 1.28 MB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 8 - Forks: 1

Bayunova28/BikeStore_DWH_Analytics

This repository contains a data analytics and data warehouse project for a bike store.

Language: Jupyter Notebook - Size: 1.31 MB - Last synced at: 4 months ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

TahirZia-1/EDA-Netflix-Dataset-using-PySpark-on-Docker

This project demonstrates how to perform Exploratory Data Analysis (EDA) on the Netflix dataset using PySpark in a Jupyter Notebook environment. It involves setting up Spark, loading a dataset, performing basic data cleaning, and visualizing the results. All of it runs in a Docker container.

Language: Jupyter Notebook - Size: 1.75 MB - Last synced at: 19 days ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

TravelXML/APACHE-SPARK-PYSPARK-DATABRICKS

APACHE SPARK: Data Analysis, Transformation, and Visualisation with PySpark, IPL Data Analysis

Language: Jupyter Notebook - Size: 2.25 MB - Last synced at: 3 months ago - Pushed at: 12 months ago - Stars: 1 - Forks: 1

miquido/DataScience

Useful scripts and notebooks for Data Science. The project was made by Miquido. https://www.miquido.com/

Language: Jupyter Notebook - Size: 130 KB - Last synced at: 3 months ago - Pushed at: about 2 years ago - Stars: 9 - Forks: 3

rlilojr/Detecting-Malicious-URL-Machine-Learning

Language: Jupyter Notebook - Size: 53.1 MB - Last synced at: about 11 hours ago - Pushed at: about 7 years ago - Stars: 55 - Forks: 21

Rifat392000/BigDataAnalytics

Language: Jupyter Notebook - Size: 18.4 MB - Last synced at: about 1 month ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

SAI-MOHAN-B/Spark-Structured-Streaming

This repo is for Structured Streaming and projects

Language: Python - Size: 13.7 KB - Last synced at: 4 days ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

citysiva180/databricks_practice_repo

This repo is built for learning and practicing Databricks and PySpark. It is the practice repo for the Databricks Data Engineering Associate Certification.

Language: Python - Size: 523 KB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

kyaiooiayk/pySpark-Notes

Notes, tutorials, code snippets and templates focused on PySpark for Machine Learning

Language: Jupyter Notebook - Size: 342 KB - Last synced at: 8 months ago - Pushed at: almost 2 years ago - Stars: 2 - Forks: 1

Non-NeutralZero/spark247-jupyter-dockerized

spark247-jupyter-dockerized

Language: Python - Size: 17.6 KB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

arturogonzalezm/energy_price_and_demand_forecast

AEMO Aggregated price and demand data

Language: Python - Size: 14.1 MB - Last synced at: 2 days ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

twseptian/Coastal-and-Offshore-Marine-Zones

Spatial Database Final Project - Coastal and Offshore Marine Zones with Geopandas and Pyspark

Language: Jupyter Notebook - Size: 45.2 MB - Last synced at: 3 months ago - Pushed at: about 7 years ago - Stars: 1 - Forks: 1

ganeshkavhar/DataFrame-Data-Generator-by-ganesh-kavhar

Small code exercises for generating good dummy DataFrames for PySpark practice

Language: Jupyter Notebook - Size: 5.86 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

koirand/spark-notebook-on-k8s-example

Sample to run PySpark on Kubernetes cluster.

Language: Jupyter Notebook - Size: 13.7 KB - Last synced at: 3 months ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

hadleyrose/Hamlet-PySpark

Language: Jupyter Notebook - Size: 6.84 KB - Last synced at: about 1 year ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 0

unnitin/pyspark-jupyter-kernel

Installation instructions for pyspark and a kernel with jupyter

Language: Shell - Size: 18.6 KB - Last synced at: about 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 1

choang94/yelp-reviews

Loading Yelp Reviews Data from Kaggle to a Spark Cluster provisioned on AWS EMR and doing analyses

Language: Jupyter Notebook - Size: 1.85 MB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

HenryBao91/PySpark-Learning-Tutorial

Hadoop + PySpark big data mining, processing, and analysis

Language: Jupyter Notebook - Size: 11.1 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

brennerh1/databricks-demos

Repository of notebooks and related collateral used in the Databricks Demo Hub, showing how to use Databricks, Delta Lake, MLflow, and more.

Language: Python - Size: 1.06 MB - Last synced at: 2 months ago - Pushed at: about 4 years ago - Stars: 25 - Forks: 52

kaladabrio2020/pyspark-ml-analysis-data

Data analysis and machine learning with PySpark

Language: Jupyter Notebook - Size: 1.81 MB - Last synced at: 4 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

FranzDiebold/advent-of-code-2021 📦

Solutions for Advent of Code 2021 in (Py)Spark

Language: Jupyter Notebook - Size: 22.5 KB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

naiborhujosua/Telco_Churn_Analysis

Customer churn analysis for the telco industry, aimed at improving customer retention, using PySpark in Databricks

Size: 856 KB - Last synced at: about 1 year ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

Betico1928/Talleres-ProcesamientoDeDatosAGranEscala

Explores the principles of large-scale data processing through Databricks and Spark workshops. Learn tools such as Pandas and PySpark for efficient analysis of large datasets. Taught by John Corredor at the Pontificia Universidad Javeriana.

Language: Jupyter Notebook - Size: 203 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

manoharpalanisamy/Distributed-Keras

Research And Development on Distributed Keras with Spark

Language: Jupyter Notebook - Size: 11.7 KB - Last synced at: over 1 year ago - Pushed at: about 7 years ago - Stars: 1 - Forks: 0

rsantos2032/Cardiovascular-Disease-Detection

Cardiovascular Disease Detection using PySpark

Language: Jupyter Notebook - Size: 1.09 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

polarbeargo/Data-Engineering-Capstone-Project

Language: Jupyter Notebook - Size: 834 KB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 0

aakinlalu/Crime-Classification-using-PySpark

Classify crimes into different categories using PySpark

Language: Jupyter Notebook - Size: 311 KB - Last synced at: over 1 year ago - Pushed at: about 6 years ago - Stars: 19 - Forks: 14

yennanliu/analysis

Repo for practical approaches to data science problems, including notebook demos and working scripts | #DS | #analysis

Language: Jupyter Notebook - Size: 170 MB - Last synced at: over 1 year ago - Pushed at: almost 5 years ago - Stars: 12 - Forks: 10

dimdasci/yp11-pyspark-training

Training project with Spark DataFrame and MLlib

Language: Jupyter Notebook - Size: 765 KB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

hyunjoonbok/PySpark

PySpark functions and utilities with examples. Assists ETL process of data modeling

Language: Jupyter Notebook - Size: 3.79 MB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 89 - Forks: 73

rehman04/BigData_pyspark_AWS-EC2-

Language: Jupyter Notebook - Size: 2.93 KB - Last synced at: over 1 year ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 0

matthieuvion/spark-cluster

Steps to deploy a local spark cluster w/ Docker. Bonus: a ready-to-use notebook for model prediction on Pyspark using spark.ml Pipeline() on a well known dataset

Language: Jupyter Notebook - Size: 628 KB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

lmriccardo/fraudolent-transaction-classification

Project for the Big Data Computing course in the Master in Computer Science at "La Sapienza" University, academic year 2021/2022

Language: Jupyter Notebook - Size: 15.8 MB - Last synced at: 3 months ago - Pushed at: about 3 years ago - Stars: 7 - Forks: 1

ganeshkavhar/PySpark-GroupBy

Learn GroupBy in PySpark

Language: Jupyter Notebook - Size: 4.88 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

j-i-l/ReviewedGrapes

ML models predicting wine varieties based on wine review texts

Language: Jupyter Notebook - Size: 2.92 MB - Last synced at: about 2 months ago - Pushed at: about 4 years ago - Stars: 2 - Forks: 0

Prajwal10031999/Song-Genre-Classification-in-PySparks-MLlib

A PySpark MLlib classification model to classify songs based on a number of characteristics into a set of 23 electronic genres.

Language: Jupyter Notebook - Size: 1.56 MB - Last synced at: about 2 months ago - Pushed at: over 4 years ago - Stars: 6 - Forks: 2

RaghulKrish1798/PySpark_Intro

Learning PySpark Fundamentals

Language: Jupyter Notebook - Size: 13.7 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

hyeonsangjeon/dataplatform

Hadoop3.2 single/cluster mode with web terminal gotty, spark, jupyter pyspark, hive, eco etc.

Language: Shell - Size: 549 KB - Last synced at: 3 months ago - Pushed at: over 5 years ago - Stars: 11 - Forks: 1

rantoncuadrado/udacity_capstone_project

Udacity Data Engineering Nanodegree. Capstone Project.

Language: Jupyter Notebook - Size: 17.7 MB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

panashematsaudza/Ecommerce-Simple-Linear-Regression-

PySpark Ecommerce Simple Linear Regression

Language: Jupyter Notebook - Size: 51.8 KB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 0

saikaryekar/PySpark-Plane-Dataset-Exploration

Explored a dataset of planes while learning PySpark commands.

Language: Jupyter Notebook - Size: 24.4 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Shashi42/Azure-End-to-End-Sales-Data-Analytics-Pipeline

This project builds an End-to-End Azure Data Engineering Pipeline, performing ETL and Analytics Reporting on the AdventureWorks2022LT Database.

Language: Jupyter Notebook - Size: 501 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

joeliang0520/CryptoTweets

Text Classification and Data Analysis on Cryptocurrency-Related Tweets in a PySpark Environment

Language: Jupyter Notebook - Size: 8.95 MB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

Akash8K/Stocks-Data-Analysis-In-DataBricks

Stocks Data Analysis In DataBricks - Using SQL and Pyspark

Language: HTML - Size: 1.84 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

jashshah-dev/Automating-EMR-Cluster-using-AWS-Lambda

Automate Amazon EMR clusters using Lambda for streamlined and scalable data processing workflows. Unlock the full potential of your data pipeline with LambdaEMR Automator.

Language: Python - Size: 8.79 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Non-NeutralZero/pyspark-jupyter-env

Language: Shell - Size: 5.86 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

gupta-aayushkr/F1-Racing

The project aims to process Formula 1 racing data, create an automated data pipeline, and make the data available for presentation and analysis purposes.

Language: Python - Size: 5.04 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

aashokvardhan/Analyzing-Neuroimaging-Data-with-PySpark-and-Thunder

Language: Jupyter Notebook - Size: 3.84 MB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

aashokvardhan/Predicting-Forest-Cover-with-Decision-Trees

Language: Jupyter Notebook - Size: 10.5 MB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

ianjeffries/car-accident-analysis

Analyzing car accidents in the United Kingdom using PySpark and Python for big data processing.

Language: Jupyter Notebook - Size: 11 MB - Last synced at: over 1 year ago - Pushed at: about 6 years ago - Stars: 2 - Forks: 3

manishghop/CS651-UW-Project

CS651 Final Project

Language: Jupyter Notebook - Size: 1.33 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

nguyenminhduc9988/eurecom_aml

Eurecom Advanced Machine Learning course work

Language: Jupyter Notebook - Size: 3.4 MB - Last synced at: over 1 year ago - Pushed at: about 8 years ago - Stars: 1 - Forks: 0

simao-af/Microsoft-Malware-Prediction

Predict the probability of a Windows device being infected by malware based on different properties of that device.

Language: Jupyter Notebook - Size: 17.8 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

quadrantofsola/PySpark_Dataframes

Analysis of Clinical Trial Dataset using Dataframes on PySpark

Size: 2.93 KB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

RosarioB/spark-streaming-kafka

Exploring Spark Structured Streaming features using Jupyter notebooks and PySpark, interacting with a Kafka cluster.

Size: 130 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

bademiya21/Identifying-Commuter-Travel-Patterns-In-Bus-Services

A project I did with the Land Transport Authority, a statutory board whose main role is to manage Singapore's transportation infrastructure, including public transport such as buses and trains. The agency was interested in understanding how bus services were being used by commuters during peak hours, and whether interventions could be introduced to further enhance the commuter experience on bus services, e.g. shorter waiting times or faster trips that skip bus stops. This required understanding archetypes of commuter travel patterns on bus services. This project is an extension of what was previously done here: https://blog.data.gov.sg/fingerprint-of-a-bus-route-73e5be53dcf0

Language: Jupyter Notebook - Size: 10.4 MB - Last synced at: almost 2 years ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 1

heischichou/Sample-CDM-Tagger

A simple tool to compare new data to historical records. It will tag rows accordingly as duplicate or NULL. The team of interns I was in designed this tool using PySpark and Jupyter Notebook in Microsoft Fabric as a practice exercise within Lexmark Research and Development Corporation's Digital Transformation program.

Language: Python - Size: 4.88 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

Jayveersinh-Raj/trip_duration_big_data

Taxi trip duration forecasting using Big data and spark ML

Language: Jupyter Notebook - Size: 203 MB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

luuisotorres/Kaggle-Titanic-Machine-Learning-Competition-with-PySpark

This notebook is my first attempt at using PySpark for EDA and Machine Learning models.

Language: Jupyter Notebook - Size: 25.4 KB - Last synced at: almost 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

RiccardoRobb/BigData_project

Tweet sentiment analysis

Language: Jupyter Notebook - Size: 92.6 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

benjbaron/GeoNames

GeoNames cities search service powered by Algolia

Language: Jupyter Notebook - Size: 16.6 MB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 5 - Forks: 5

norbertolimonjr/KMeans-Clustering-Segmentation-Analysis

Online Retail Classification for Marketing Segmentation Project using KMeans Clustering, Elbow Method and Silhouette Method for Validation

Language: Jupyter Notebook - Size: 53.4 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

ghanmi-hamza/Machine-learning-with-PySpark

This notebook uses PySpark to build machine learning classifiers (note that almost all ML algorithms supported by PySpark are used in this notebook)

Language: Jupyter Notebook - Size: 109 KB - Last synced at: almost 2 years ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 0

RosarioB/spark

Exercises on Apache Spark

Size: 88.9 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

jacobceles/intro-to-colab-pyspark-emr

A tutorial that helps Big Data Engineers ramp up faster by getting familiar with PySpark dataframes and functions. It also covers topics like EMR sizing, Google Colaboratory, fine-tuning PySpark jobs, and much more.

Language: Jupyter Notebook - Size: 438 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 13 - Forks: 7

PeterSchuld/Sparkify

Capstone Project in the Udacity Data Scientist Nanodegree program. We manipulate large and realistic datasets with Spark to engineer relevant features for predicting churn. We'll learn how to use Spark MLlib to build machine learning models with large datasets, far beyond what could be done with non-distributed technologies like scikit-learn.

Language: HTML - Size: 2.44 MB - Last synced at: almost 2 years ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 0

Related Keywords

pyspark-notebook (188), pyspark (113), spark (48), python (42), machine-learning (26), python3 (24), pyspark-mllib (21), jupyter-notebook (19), pyspark-tutorial (17), databricks (16), data-science (15), databricks-notebooks (14), pyspark-python (13), big-data (11), apache-spark (11), docker (10), spark-sql (9), spark-streaming (8), sql (8), pyspark-machine-learning (7), data-analysis (7), azure (6), aws (6), bigdata (6), hadoop (6), kafka (5), big-data-analytics (5), data-visualization (5), pandas (5), hadoop-mapreduce (5), data-engineering (5), docker-compose (5), etl (5), logistic-regression (5), clustering (4), dataframe (4), mllib (4), jupyter-notebooks (4), docker-image (4), matplotlib (4), delta-lake (4), azuredatafactory (4), azuredatabricks (4), kaggle (4), azure-storage (3), jupyter (3), python-3 (3), sparkml (3), scala (3), keras (3), etl-pipeline (3), azure-databricks (3), unitycatalog (3), notebook (3), aws-s3 (3), spark-cluster (3), colab-notebook (3), rdd (3), azuresqldb (3), data-analytics (3), azurecloud (3), azuredatalakegen2 (3), spark-ml (3), nlp-machine-learning (2), kaggle-competition (2), h2oai (2), keras-tensorflow (2), random-forest (2), feature-engineering (2), pgadmin4 (2), pyspark-ml (2), dataengineering (2), sql-server (2), virtual-machine (2), data-analysis-python (2), naive-bayes-classifier (2), big-data-processing (2), hive (2), pyspark-api (2), postgresql (2), data-wrangling (2), eda (2), azure-data-factory (2), predictive-modeling (2), visualization (2), datacleaning (2), ganeshkavharpythontutorials (2), powerbi (2), lakehouse (2), spark-structured-streaming (2), emr-cluster (2), bigdataanalytics (2), s3-bucket (2), kubernetes (2), data-cleaning (2), superset (2), minio (2), hadoop-cluster (2), pytorch (2), devcontainer (2)