Ecosyste.ms: Repos

An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: pyspark-notebook

HenryBao91/PySpark-Learning-Tutorial

Hadoop + PySpark big data mining, processing, and analysis

Language: Jupyter Notebook - Size: 11.1 MB - Last synced: about 23 hours ago - Pushed: 2 days ago - Stars: 0 - Forks: 0

josephmachado/efficient_data_processing_spark

Code for "Efficient Data Processing in Spark" Course

Language: Python - Size: 23.8 MB - Last synced: 4 days ago - Pushed: 6 days ago - Stars: 44 - Forks: 10

sush4nt/docker-containers

References for building custom IDEs

Language: Shell - Size: 36.1 KB - Last synced: 9 days ago - Pushed: 9 days ago - Stars: 0 - Forks: 0

vivek-bombatkar/Graph-Datastructure-for-Movielens-dataset

Language: Jupyter Notebook - Size: 726 KB - Last synced: 9 days ago - Pushed: over 5 years ago - Stars: 0 - Forks: 0

kaladabrio2020/pyspark-ml-analysis-data

Data analysis and machine learning with PySpark

Language: Jupyter Notebook - Size: 1.95 KB - Last synced: 10 days ago - Pushed: 10 days ago - Stars: 0 - Forks: 0

josephmachado/docker_for_data_engineers

Code for blog at: https://www.startdataengineering.com/post/docker-for-de/

Language: C - Size: 561 KB - Last synced: 14 days ago - Pushed: 22 days ago - Stars: 15 - Forks: 7

FranzDiebold/advent-of-code-2021 📦

Solutions for Advent of Code 2021 in (Py)Spark

Language: Jupyter Notebook - Size: 22.5 KB - Last synced: 19 days ago - Pushed: over 2 years ago - Stars: 1 - Forks: 0

naiborhujosua/Telco_Churn_Analysis

Customer churn analysis for the telco industry, aimed at improving customer retention, using PySpark in Databricks

Size: 856 KB - Last synced: 19 days ago - Pushed: almost 3 years ago - Stars: 0 - Forks: 0

Betico1928/Talleres-ProcesamientoDeDatosAGranEscala

Exploring the principles of large-scale data processing through Databricks and Spark workshops. Learn tools such as Pandas and PySpark for efficient analysis of large datasets. Taught by John Corredor at the Pontificia Universidad Javeriana.

Language: Jupyter Notebook - Size: 203 MB - Last synced: 22 days ago - Pushed: 22 days ago - Stars: 0 - Forks: 0

manoharpalanisamy/Distributed-Keras

Research And Development on Distributed Keras with Spark

Language: Jupyter Notebook - Size: 11.7 KB - Last synced: 23 days ago - Pushed: almost 6 years ago - Stars: 1 - Forks: 0

easonlai/Samples_for_Azure_Databricks_Orientation

Samples for Azure Databricks Orientation

Language: HTML - Size: 6.78 MB - Last synced: 23 days ago - Pushed: over 3 years ago - Stars: 4 - Forks: 2

easonlai/log_analytics_with_databricks

Azure Databricks notebook sample to connect Blob Storage of Azure Log Analytics

Language: HTML - Size: 48.8 KB - Last synced: 23 days ago - Pushed: almost 3 years ago - Stars: 0 - Forks: 0

easonlai/databricks_delta_table_samples

This is a code sample repository for demonstrating how to perform Databricks Delta Table operations.

Language: HTML - Size: 23.9 MB - Last synced: 23 days ago - Pushed: almost 2 years ago - Stars: 2 - Forks: 1

rsantos2032/Cardiovascular-Disease-Detection

Cardiovascular Disease Detection using PySpark

Language: Jupyter Notebook - Size: 1.09 MB - Last synced: 25 days ago - Pushed: 25 days ago - Stars: 0 - Forks: 0

polarbeargo/Data-Engineering-Capstone-Project

Language: Jupyter Notebook - Size: 834 KB - Last synced: 26 days ago - Pushed: over 2 years ago - Stars: 1 - Forks: 0

aakinlalu/Crime-Classification-using-PySpark

Classify crimes into different categories using PySpark

Language: Jupyter Notebook - Size: 311 KB - Last synced: 23 days ago - Pushed: about 5 years ago - Stars: 19 - Forks: 14

jplane/pyspark-devcontainer

A simple VS Code devcontainer setup for local PySpark development

Language: Jupyter Notebook - Size: 318 KB - Last synced: 23 days ago - Pushed: 10 months ago - Stars: 18 - Forks: 19

a-poor/cookiecutter-jupyter-pyspark

A cookiecutter template for a Docker/Jupyter/Data-Science/PySpark project

Language: Jupyter Notebook - Size: 3.91 KB - Last synced: about 1 month ago - Pushed: almost 4 years ago - Stars: 0 - Forks: 0

yennanliu/analysis

Repo for practical approaches to data science problems, including notebook demos and working scripts | #DS | #analysis

Language: Jupyter Notebook - Size: 170 MB - Last synced: about 1 month ago - Pushed: over 3 years ago - Stars: 12 - Forks: 10

dimdasci/yp11-pyspark-training

Training project with Spark DataFrame and MLlib

Language: Jupyter Notebook - Size: 765 KB - Last synced: about 1 month ago - Pushed: almost 2 years ago - Stars: 0 - Forks: 0

hyunjoonbok/PySpark

PySpark functions and utilities with examples. Assists ETL process of data modeling

Language: Jupyter Notebook - Size: 3.79 MB - Last synced: 9 days ago - Pushed: over 3 years ago - Stars: 89 - Forks: 73

rehman04/BigData_pyspark_AWS-EC2-

Language: Jupyter Notebook - Size: 2.93 KB - Last synced: about 2 months ago - Pushed: almost 5 years ago - Stars: 0 - Forks: 0

matthieuvion/spark-cluster

Steps to deploy a local spark cluster w/ Docker. Bonus: a ready-to-use notebook for model prediction on Pyspark using spark.ml Pipeline() on a well known dataset

Language: Jupyter Notebook - Size: 628 KB - Last synced: 28 days ago - Pushed: 11 months ago - Stars: 1 - Forks: 0

AnandaRauf/CekatanBiz

CekatanBiz is a software tool for data analysts, business analysts, and business intelligence. Developed using Python.

Language: Jupyter Notebook - Size: 1.28 MB - Last synced: about 1 hour ago - Pushed: 2 months ago - Stars: 6 - Forks: 1

ganeshkavhar/PySpark-GroupBy

Learn GroupBy in PySpark

Language: Jupyter Notebook - Size: 4.88 KB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 0 - Forks: 0

microsoft/Fabric-RTA-FlightStream

Microsoft Fabric Real-time Analytics flight streaming

Language: Jupyter Notebook - Size: 1.04 MB - Last synced: about 1 month ago - Pushed: 3 months ago - Stars: 14 - Forks: 2

j-i-l/ReviewedGrapes

ML models predicting wine varieties based on wine review texts

Language: Jupyter Notebook - Size: 2.92 MB - Last synced: 19 days ago - Pushed: almost 3 years ago - Stars: 2 - Forks: 0

RaghulKrish1798/PySpark_Intro

Learning PySpark Fundamentals

Language: Jupyter Notebook - Size: 13.7 KB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 0 - Forks: 0

hyeonsangjeon/dataplatform

Hadoop3.2 single/cluster mode with web terminal gotty, spark, jupyter pyspark, hive, eco etc.

Language: Shell - Size: 549 KB - Last synced: 30 days ago - Pushed: over 4 years ago - Stars: 11 - Forks: 1

rantoncuadrado/udacity_capstone_project

Udacity Data Engineering Nanodegree. Capstone Project.

Language: Jupyter Notebook - Size: 17.7 MB - Last synced: 3 months ago - Pushed: almost 3 years ago - Stars: 0 - Forks: 0

panashematsaudza/Ecommerce-Simple-Linear-Regression-

PySpark Ecommerce Simple Linear Regression

Language: Jupyter Notebook - Size: 51.8 KB - Last synced: 3 months ago - Pushed: almost 4 years ago - Stars: 0 - Forks: 0

saikaryekar/PySpark-Plane-Dataset-Exploration

Explored a dataset of planes while learning PySpark commands.

Language: Jupyter Notebook - Size: 24.4 KB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 0 - Forks: 0

Shashi42/Azure-End-to-End-Sales-Data-Analytics-Pipeline

This project builds an End-to-End Azure Data Engineering Pipeline, performing ETL and Analytics Reporting on the AdventureWorks2022LT Database.

Language: Jupyter Notebook - Size: 501 KB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 0 - Forks: 0

archivesunleashed/notebooks

Various examples of notebooks for working with web archives with the Archives Unleashed Toolkit, and derivatives generated by the Archives Unleashed Toolkit.

Language: Jupyter Notebook - Size: 49.1 MB - Last synced: 13 days ago - Pushed: over 1 year ago - Stars: 21 - Forks: 4

joeliang0520/CryptoTweets

Text Classification and Data Analysis on Cryptocurrency-Related Tweets in a PySpark Environment

Language: Jupyter Notebook - Size: 8.95 MB - Last synced: 4 months ago - Pushed: about 2 years ago - Stars: 0 - Forks: 0

Akash8K/Stocks-Data-Analysis-In-DataBricks

Stocks Data Analysis In DataBricks - Using SQL and Pyspark

Language: HTML - Size: 1.84 MB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 1 - Forks: 0

jashshah-dev/Automating-EMR-Cluster-using-AWS-Lambda

Automate Amazon EMR clusters using Lambda for streamlined and scalable data processing workflows. Unlock the full potential of your data pipeline with LambdaEMR Automator.

Language: Python - Size: 8.79 KB - Last synced: 5 months ago - Pushed: 5 months ago - Stars: 0 - Forks: 0

Non-NeutralZero/pyspark-jupyter-env

Language: Shell - Size: 5.86 KB - Last synced: 5 months ago - Pushed: 5 months ago - Stars: 0 - Forks: 0

gupta-aayushkr/F1-Racing

The project aims to process Formula 1 racing data, create an automated data pipeline, and make the data available for presentation and analysis purposes.

Language: Python - Size: 5.04 MB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 1 - Forks: 0

hjh17/dbloy

Continuous Delivery tool for PySpark notebook-based jobs on Databricks

Language: Python - Size: 591 KB - Last synced: 8 days ago - Pushed: about 3 years ago - Stars: 1 - Forks: 1

aashokvardhan/Analyzing-Neuroimaging-Data-with-PySpark-and-Thunder

Language: Jupyter Notebook - Size: 3.84 MB - Last synced: 5 months ago - Pushed: over 6 years ago - Stars: 0 - Forks: 0

aashokvardhan/Predicting-Forest-Cover-with-Decision-Trees

Language: Jupyter Notebook - Size: 10.5 MB - Last synced: 5 months ago - Pushed: over 6 years ago - Stars: 0 - Forks: 0

ianjeffries/car-accident-analysis

Analyzing car accidents in the United Kingdom using PySpark and Python for big data processing.

Language: Jupyter Notebook - Size: 11 MB - Last synced: 6 months ago - Pushed: almost 5 years ago - Stars: 2 - Forks: 3

manishghop/CS651-UW-Project

CS651 Final Project

Language: Jupyter Notebook - Size: 1.33 MB - Last synced: 6 months ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0

nguyenminhduc9988/eurecom_aml

Eurecom Advanced Machine Learning course work

Language: Jupyter Notebook - Size: 3.4 MB - Last synced: 2 months ago - Pushed: almost 7 years ago - Stars: 1 - Forks: 0

simao-af/Microsoft-Malware-Prediction

Predict the probability of a Windows device being infected by malware based on different properties of that device.

Language: Jupyter Notebook - Size: 17.8 MB - Last synced: 6 months ago - Pushed: 11 months ago - Stars: 0 - Forks: 0

Rifat392000/BigDataAnalytics

Language: Jupyter Notebook - Size: 18.6 KB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 0 - Forks: 0

quadrantofsola/PySpark_Dataframes

Analysis of Clinical Trial Dataset using Dataframes on PySpark

Size: 2.93 KB - Last synced: 6 months ago - Pushed: almost 2 years ago - Stars: 0 - Forks: 0

RosarioB/spark-streaming-kafka

Exploring Spark Structured Streaming features using Jupyter notebooks and PySpark, and interacting with a Kafka cluster.

Size: 130 KB - Last synced: 7 months ago - Pushed: 7 months ago - Stars: 0 - Forks: 0

bademiya21/Identifying-Commuter-Travel-Patterns-In-Bus-Services

A project I did with Land Transport Authority, a statutory board, whose main role is to manage the transportation infra of Singapore which includes public transport like bus and trains. The agency was interested to understand how the bus services were being utilized by commuters during peak hours and if interventions could be introduced to further enhance commuter experience on bus services e.g. shorter waiting time, faster trips with skipping of bus stops etc. This required understanding archetypes of travel patterns by commuters in bus services. This project is an extension of what was previously done here: https://blog.data.gov.sg/fingerprint-of-a-bus-route-73e5be53dcf0

Language: Jupyter Notebook - Size: 10.4 MB - Last synced: 7 months ago - Pushed: over 5 years ago - Stars: 1 - Forks: 1

heischichou/Sample-CDM-Tagger

A simple tool to compare new data to historical records. It will tag rows accordingly as duplicate or NULL. The team of interns I was in designed this tool using PySpark and Jupyter Notebook in Microsoft Fabric as a practice exercise within Lexmark Research and Development Corporation's Digital Transformation program.

Language: Python - Size: 4.88 KB - Last synced: 7 months ago - Pushed: 7 months ago - Stars: 0 - Forks: 0

Jayveersinh-Raj/trip_duration_big_data

Taxi trip duration forecasting using Big data and spark ML

Language: Jupyter Notebook - Size: 203 MB - Last synced: 8 months ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0

luuisotorres/Kaggle-Titanic-Machine-Learning-Competition-with-PySpark

This notebook is my first attempt at using PySpark for EDA and Machine Learning models.

Language: Jupyter Notebook - Size: 25.4 KB - Last synced: 8 months ago - Pushed: almost 2 years ago - Stars: 0 - Forks: 0

RiccardoRobb/BigData_project

Tweet sentiment analysis

Language: Jupyter Notebook - Size: 92.6 MB - Last synced: 8 months ago - Pushed: 8 months ago - Stars: 0 - Forks: 0

benjbaron/GeoNames

GeoNames cities search service powered by Algolia

Language: Jupyter Notebook - Size: 16.6 MB - Last synced: 2 months ago - Pushed: about 4 years ago - Stars: 5 - Forks: 5

norbertolimonjr/KMeans-Clustering-Segmentation-Analysis

Online Retail Classification for Marketing Segmentation Project using KMeans Clustering, Elbow Method and Silhouette Method for Validation

Language: Jupyter Notebook - Size: 53.4 MB - Last synced: 9 months ago - Pushed: 9 months ago - Stars: 0 - Forks: 0

ghanmi-hamza/Machine-learning-with-PySpark

This notebook uses PySpark to build machine learning classifiers (note that almost all ML algorithms supported by PySpark are used in this notebook)

Language: Jupyter Notebook - Size: 109 KB - Last synced: 9 months ago - Pushed: almost 4 years ago - Stars: 1 - Forks: 0

RosarioB/spark

Exercises on Apache Spark

Size: 88.9 KB - Last synced: 9 months ago - Pushed: 9 months ago - Stars: 0 - Forks: 0

jacobceles/intro-to-colab-pyspark-emr

A tutorial that helps Big Data Engineers ramp up faster by getting familiar with PySpark dataframes and functions. It also covers topics like EMR sizing, Google Colaboratory, fine-tuning PySpark jobs, and much more.

Language: Jupyter Notebook - Size: 438 KB - Last synced: 9 months ago - Pushed: over 2 years ago - Stars: 13 - Forks: 7

PeterSchuld/Sparkify

Capstone Project in the Udacity Data Scientist Nanodegree program. We manipulate large and realistic datasets with Spark to engineer relevant features for predicting churn. We'll learn how to use Spark MLlib to build machine learning models with large datasets, far beyond what could be done with non-distributed technologies like scikit-learn.

Language: HTML - Size: 2.44 MB - Last synced: 9 months ago - Pushed: almost 3 years ago - Stars: 1 - Forks: 0

NTBlok/customer-lifetime-value

A pyspark ETL example using a jupyter/pyspark-notebook Docker container

Language: Jupyter Notebook - Size: 459 KB - Last synced: 9 months ago - Pushed: about 7 years ago - Stars: 3 - Forks: 3

syedhassaanahmed/azure-data-manager

Cloud services for defining, ingesting, transforming, analyzing and showcasing big data

Language: C# - Size: 759 KB - Last synced: 9 months ago - Pushed: over 5 years ago - Stars: 1 - Forks: 2

zulfiqarAlibalti/PyTorch

This repo contains PyTorch projects from basic to advanced

Language: Jupyter Notebook - Size: 8.79 KB - Last synced: 9 months ago - Pushed: almost 2 years ago - Stars: 0 - Forks: 0

galib360/BigData_Project

Language: Jupyter Notebook - Size: 3.89 MB - Last synced: 9 months ago - Pushed: over 3 years ago - Stars: 0 - Forks: 0

prakass1/SparkProject

Usage of Apache Spark and Graphx

Language: Jupyter Notebook - Size: 1.78 MB - Last synced: 10 months ago - Pushed: over 5 years ago - Stars: 0 - Forks: 0

Wendy-hub/MusicPrediction

Music prediction using PySpark

Language: HTML - Size: 2.83 MB - Last synced: 10 months ago - Pushed: over 1 year ago - Stars: 1 - Forks: 0

HaJunYoo/Pyspark-tutorial

A repository collecting Spark code from hands-on PySpark practice in Colab and Docker environments

Language: Jupyter Notebook - Size: 62.7 MB - Last synced: 9 months ago - Pushed: 10 months ago - Stars: 2 - Forks: 0

kristin-kim/gcp-dataproc_serverless-running-notebooks

Orchestrator to run Notebooks on Dataproc SERVERLESS via Cloud Composer

Language: Jupyter Notebook - Size: 204 KB - Last synced: 10 months ago - Pushed: about 1 year ago - Stars: 0 - Forks: 1

burakai/pyspark-tutorial

This tutorial is based on the "Pyspark with Python" YouTube playlist by Krish Naik (youtube.com/@krishnaik06). The series is also published on freeCodeCamp's YouTube channel (youtube.com/@freecodecamp). Thank them all!

Language: Jupyter Notebook - Size: 13.7 KB - Last synced: 10 months ago - Pushed: 10 months ago - Stars: 0 - Forks: 0

niteshjindal-7/cricket-world-cup2019--fall-of-wicket-prediction-pyspark-MLlib

Language: Jupyter Notebook - Size: 34.2 KB - Last synced: 11 months ago - Pushed: over 2 years ago - Stars: 1 - Forks: 0

vigneshSs-07/Complete-AtoZ-Pyspark

This repo explains PySpark modules in Python, with practical hands-on examples for working with big data.

Language: Jupyter Notebook - Size: 1.85 MB - Last synced: 11 months ago - Pushed: 11 months ago - Stars: 3 - Forks: 2

prabeesh/pyspark-notebook

Pyspark Notebook With Docker

Language: Python - Size: 258 KB - Last synced: 21 days ago - Pushed: almost 9 years ago - Stars: 11 - Forks: 11

srinathsai/Docker-Application

This project aims to demonstrate the importance of Docker in enabling faster software delivery cycles by providing Ubuntu and PySpark built in and allowing the user to run a built-in word count program using PySpark

Language: Jupyter Notebook - Size: 203 KB - Last synced: 12 months ago - Pushed: 12 months ago - Stars: 0 - Forks: 0

Big-Data-FC/project

Predict how many points a European football team will end the season with, according to the characteristics of its players. Project for the Big Data Computing course at Sapienza University of Rome (2021-22)

Language: Jupyter Notebook - Size: 255 MB - Last synced: 12 months ago - Pushed: over 1 year ago - Stars: 6 - Forks: 0

Mathews-Tom/MSc-in-Machine-Learning-and-Artificial-Intelligence

Master of Science in Machine Learning & Artificial Intelligence - Indian Institute of Technology Madras & Liverpool John Moores University

Language: Jupyter Notebook - Size: 2.12 GB - Last synced: almost 1 year ago - Pushed: almost 1 year ago - Stars: 6 - Forks: 7

atullal/Exploring-the-Home-Mortgage-Market

Our goal with this dataset is to explore the Home Mortgage market within the US and identify patterns in the data on the basis of gender, race, income, property type, loan type, amount and location.

Language: Jupyter Notebook - Size: 789 KB - Last synced: about 1 year ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0

arinjayg/Rev_P1

Bank Transaction EDA using PySpark

Language: Jupyter Notebook - Size: 237 KB - Last synced: about 1 year ago - Pushed: about 1 year ago - Stars: 1 - Forks: 0

notPlancha/pbd

Coursework for PBD

Language: Jupyter Notebook - Size: 8.22 MB - Last synced: 12 months ago - Pushed: about 1 year ago - Stars: 1 - Forks: 0

AbhimanyuW/BigData-EthereumAnalysis

Coursework on Ethereum analysis using PySpark, as part of the curriculum at Queen Mary University of London.

Language: Jupyter Notebook - Size: 493 KB - Last synced: about 1 year ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0

prasanjit15/Apache-Spark-Projects

This repo contains all the projects I did using Apache Spark.

Language: Jupyter Notebook - Size: 5.27 MB - Last synced: 11 months ago - Pushed: over 3 years ago - Stars: 2 - Forks: 1

RickLeite/learning-batch-processing

Learning batch processing with Pyspark Interface for Apache Spark

Language: Jupyter Notebook - Size: 33.2 KB - Last synced: about 1 year ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0

mohanakrishnavh/PySpark-Tutorial

Language: Jupyter Notebook - Size: 2.87 MB - Last synced: about 1 year ago - Pushed: about 6 years ago - Stars: 17 - Forks: 19

behnamy2010/PySpark-Clustering

PySpark - Clustering with Kmeans++ and Bisecting K-means

Language: Jupyter Notebook - Size: 613 KB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

behnamy2010/PySpark-Word-Count

PySpark Word Count

Language: Jupyter Notebook - Size: 1.85 MB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

behnamy2010/Pyspark-Malware-Detection-Using-Assembly-Code-and-Byte-Codes

Pyspark-Malware Detection Using Assembly Code and Byte Codes in Big 2015 Dataset

Language: Jupyter Notebook - Size: 121 KB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

miquido/DataScience

Useful scripts and notebooks for Data Science. The project was made by Miquido. https://www.miquido.com/

Language: Jupyter Notebook - Size: 129 KB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 10 - Forks: 2

johntelforduk/betfair-data-analysis

Explore, analyse and visualise Betfair Historical Data Feed using PySpark.

Language: Jupyter Notebook - Size: 398 KB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 11 - Forks: 3

pmbrull/pmbrull-github-io-archive 📦

Language: HTML - Size: 48.5 MB - Last synced: about 1 year ago - Pushed: over 4 years ago - Stars: 0 - Forks: 0

EilinLux/SparkCertification

notes for pyspark certification with notebooks

Language: Jupyter Notebook - Size: 301 MB - Last synced: 11 months ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0

brennerh1/databricks-demos

Repository of notebooks and related collateral used in the Databricks Demo Hub, showing how to use Databricks, Delta Lake, MLflow, and more.

Language: Python - Size: 1.06 MB - Last synced: about 1 year ago - Pushed: almost 3 years ago - Stars: 18 - Forks: 45

airdipu/Covid19-Big-Data 📦

This is a project of COVID-19 infections in Australia and the possible infection rates prediction using Spark.

Language: Jupyter Notebook - Size: 188 KB - Last synced: 9 months ago - Pushed: almost 4 years ago - Stars: 0 - Forks: 0

dlleonardo/spark-assignments

Spark assignments from "Introduction to Big Data" course (offered by IBM Skills Network)

Language: Jupyter Notebook - Size: 28.3 KB - Last synced: 11 months ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

dlleonardo/spark-de-ml-assignments

Spark DE&ML assignments from the "Data Engineering and Machine Learning with Spark" course (offered by IBM Skills Network)

Language: Jupyter Notebook - Size: 56.6 KB - Last synced: 11 months ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

arjones/bigdata-workshop-es

Big Data workshop in Spanish

Language: HTML - Size: 49.2 MB - Last synced: 5 months ago - Pushed: 6 months ago - Stars: 19 - Forks: 59

imsanjoykb/PySpark-Bootcamp

My Practice and project on PySpark

Language: Jupyter Notebook - Size: 4.52 MB - Last synced: about 1 year ago - Pushed: over 2 years ago - Stars: 6 - Forks: 1

samuelesimone/Pyspark-fundamentals

Pyspark fundamentals

Language: Jupyter Notebook - Size: 1.95 KB - Last synced: 11 months ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

Sanjay-dev-ds/DWBI_Sales_Prediction

Created a data warehouse (DW) for a sales data source and built visualizations for the relevant requirements. Sales prediction (time series) is done using the DW.

Language: Jupyter Notebook - Size: 1.24 MB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

CelineWW/Amazon_Vine_Bias_Pyspark

Analysis of Amazon office product reviews using PySpark, an Amazon RDS database, and an S3 bucket. The percentage of 5-star reviews was calculated to check whether Vine reviews show a positivity bias compared with non-Vine reviews.

Language: Jupyter Notebook - Size: 30.5 MB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

jitsejan/pyspark-101

A PySpark course to get started with the basics for a Data Engineer

Language: Jupyter Notebook - Size: 18.6 KB - Last synced: about 1 year ago - Pushed: about 6 years ago - Stars: 8 - Forks: 7

abidor13/Amazon_Vine_Analysis

We were given access to approximately 50 datasets, each containing reviews of a specific product written by members of the paid Amazon Vine program. We used PySpark to perform the ETL process: extract the dataset, transform the data, connect to an AWS RDS instance, and load the transformed data into pgAdmin.

Language: Jupyter Notebook - Size: 27.2 MB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 1 - Forks: 0