Topic: "pyspark-notebook"
josephmachado/efficient_data_processing_spark
Code for "Efficient Data Processing in Spark" Course
Language: Python - Size: 23.9 MB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 292 - Forks: 62

hyunjoonbok/PySpark
PySpark functions and utilities with examples. Assists ETL process of data modeling
Language: Jupyter Notebook - Size: 3.79 MB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 89 - Forks: 73

jplane/pyspark-devcontainer
A simple VS Code devcontainer setup for local PySpark development
Language: Jupyter Notebook - Size: 318 KB - Last synced at: 28 days ago - Pushed at: almost 2 years ago - Stars: 50 - Forks: 28

josephmachado/docker_for_data_engineers
Code for blog at: https://www.startdataengineering.com/post/docker-for-de/
Language: C - Size: 561 KB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 36 - Forks: 15

archivesunleashed/notebooks
Various examples of notebooks for working with web archives with the Archives Unleashed Toolkit, and derivatives generated by the Archives Unleashed Toolkit.
Language: Jupyter Notebook - Size: 49.1 MB - Last synced at: 20 days ago - Pushed at: over 2 years ago - Stars: 26 - Forks: 4

microsoft/Fabric-RTA-FlightStream
Microsoft Fabric Real-time Analytics flight streaming
Language: Jupyter Notebook - Size: 1.04 MB - Last synced at: 3 days ago - Pushed at: over 1 year ago - Stars: 21 - Forks: 4

arjones/bigdata-workshop-es
Big Data workshop in Spanish
Language: HTML - Size: 49.2 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 19 - Forks: 59

aakinlalu/Crime-Classification-using-PySpark
classify crime into different categories using PySpark
Language: Jupyter Notebook - Size: 311 KB - Last synced at: about 1 year ago - Pushed at: almost 6 years ago - Stars: 19 - Forks: 14

brennerh1/databricks-demos
Repository of notebooks and related collateral used in the Databricks Demo Hub, showing how to use Databricks, Delta Lake, MLflow, and more.
Language: Python - Size: 1.06 MB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 18 - Forks: 45

mohanakrishnavh/PySpark-Tutorial
Language: Jupyter Notebook - Size: 2.87 MB - Last synced at: about 2 years ago - Pushed at: about 7 years ago - Stars: 17 - Forks: 19

jacobceles/intro-to-colab-pyspark-emr
A tutorial that helps Big Data Engineers ramp up faster by getting familiar with PySpark dataframes and functions. It also covers topics like EMR sizing, Google Colaboratory, fine-tuning PySpark jobs, and much more.
Language: Jupyter Notebook - Size: 438 KB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 13 - Forks: 7

yennanliu/analysis
Repo for practical approaches to data science problems, including notebook demos and working scripts | #DS | #analysis
Language: Jupyter Notebook - Size: 170 MB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 12 - Forks: 10

johntelforduk/betfair-data-analysis
Explore, analyse and visualise Betfair Historical Data Feed using PySpark.
Language: Jupyter Notebook - Size: 398 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 11 - Forks: 3

hyeonsangjeon/dataplatform
Hadoop3.2 single/cluster mode with web terminal gotty, spark, jupyter pyspark, hive, eco etc.
Language: Shell - Size: 549 KB - Last synced at: 25 days ago - Pushed at: over 5 years ago - Stars: 11 - Forks: 1

prabeesh/pyspark-notebook
Pyspark Notebook With Docker
Language: Python - Size: 258 KB - Last synced at: about 1 year ago - Pushed at: over 9 years ago - Stars: 11 - Forks: 11

miquido/DataScience
Useful scripts and notebooks for Data Science. The project was made by Miquido. https://www.miquido.com/
Language: Jupyter Notebook - Size: 130 KB - Last synced at: 9 days ago - Pushed at: almost 2 years ago - Stars: 9 - Forks: 3

AnandaRauf/CekatanBiz
CekatanBiz is a software tool for data analysts, business analysts, and business intelligence. Developed using Python.
Language: Jupyter Notebook - Size: 1.28 MB - Last synced at: 25 days ago - Pushed at: about 1 year ago - Stars: 8 - Forks: 1

imsanjoykb/PySpark-Bootcamp
My Practice and project on PySpark
Language: Jupyter Notebook - Size: 4.52 MB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 8 - Forks: 3

jitsejan/pyspark-101
A PySpark course to get started with the basics for a Data Engineer
Language: Jupyter Notebook - Size: 18.6 KB - Last synced at: about 2 years ago - Pushed at: about 7 years ago - Stars: 8 - Forks: 7

lmriccardo/fraudolent-transaction-classification
Project for the Big Data Computing course in the Master in Computer Science at the University of "La Sapienza", academic year 2021/2022
Language: Jupyter Notebook - Size: 15.8 MB - Last synced at: 4 days ago - Pushed at: almost 3 years ago - Stars: 7 - Forks: 1

Mathews-Tom/MSc-in-Machine-Learning-and-Artificial-Intelligence
Master of Science in Machine Learning & Artificial Intelligence - Indian Institute of Technology Madras & Liverpool John Moores University
Language: Jupyter Notebook - Size: 2.12 GB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 6 - Forks: 7

Big-Data-FC/project
Predict how many points a European football team will end the season with, according to the characteristics of its players. Project for the Big Data Computing course at Sapienza University of Rome (2021-22)
Language: Jupyter Notebook - Size: 255 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 6 - Forks: 0

alisonpezzott/calendario_fabric_lakehouse
Calendar table for a Fabric lakehouse, generated from a Spark notebook
Language: Python - Size: 69.3 KB - Last synced at: about 2 months ago - Pushed at: 6 months ago - Stars: 5 - Forks: 3

vigneshSs-07/Pyspark-ACompleteGuide
This repo explains PySpark modules in Python, with a practical, hands-on approach to working with big data.
Language: Jupyter Notebook - Size: 1.86 MB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 5 - Forks: 3

shsarv/Cardio-Monitor
Cardio Monitor is a web app that helps you find out whether you are at risk of developing heart disease. The model used for prediction has an accuracy of 92%. This is the course project for the subject Big Data Analytics (BCSE0158).
Language: Jupyter Notebook - Size: 7.66 MB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 5 - Forks: 0

easonlai/Samples_for_Azure_Databricks_Orientation
Samples for Azure Databricks Orientation
Language: HTML - Size: 6.78 MB - Last synced at: 21 days ago - Pushed at: over 4 years ago - Stars: 5 - Forks: 2

benjbaron/GeoNames
GeoNames cities search service powered by Algolia
Language: Jupyter Notebook - Size: 16.6 MB - Last synced at: about 1 year ago - Pushed at: almost 5 years ago - Stars: 5 - Forks: 5

jpacerqueira-zz/Akamai-log-Analysis-SparkML-H2o
Transformation of Akamai logs with Spark ETL, and discovery of values and similarities in the logs using SparkML and H2O ML
Language: HTML - Size: 106 MB - Last synced at: 5 months ago - Pushed at: about 6 years ago - Stars: 5 - Forks: 1

digitalhemanth/Data-Science
Data science with machine learning algorithms using Python, PySpark, pandas, NumPy, TensorFlow, Keras, seaborn, and matplotlib
Language: Jupyter Notebook - Size: 109 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 4 - Forks: 0

gunarevuri/US-Immigrants-Analysis
Language: Jupyter Notebook - Size: 174 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 3 - Forks: 1

ARomoH/Keras-Distributed-Streaming
Distributed Keras model for making predictions of sentiment from Spanish sentences in stream context using Spark Streaming and Apache Kafka
Language: Jupyter Notebook - Size: 754 KB - Last synced at: about 2 months ago - Pushed at: over 7 years ago - Stars: 3 - Forks: 1

NTBlok/customer-lifetime-value
A pyspark ETL example using a jupyter/pyspark-notebook Docker container
Language: Jupyter Notebook - Size: 459 KB - Last synced at: almost 2 years ago - Pushed at: about 8 years ago - Stars: 3 - Forks: 3

conorheffron/ironoc-spark
Sample pyspark Notebook
Language: Jupyter Notebook - Size: 17.6 KB - Last synced at: 5 days ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

kyaiooiayk/pySpark-Notes
Notes, tutorials, code snippets and templates focused on PySpark for Machine Learning
Language: Jupyter Notebook - Size: 342 KB - Last synced at: 5 months ago - Pushed at: almost 2 years ago - Stars: 2 - Forks: 1

HaJunYoo/Pyspark-tutorial
A repository collecting Spark code from hands-on PySpark practice in Colab and Docker environments
Language: Jupyter Notebook - Size: 62.7 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 2 - Forks: 0

easonlai/databricks_delta_table_samples
This is a code sample repository for demonstrating how to perform Databricks Delta Table operations.
Language: HTML - Size: 23.9 MB - Last synced at: 3 months ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 2

nadia1123/movielens-dataset-with-pyspark
Exploring the MovieLens Dataset with pySpark
Language: Jupyter Notebook - Size: 10.7 KB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 2 - Forks: 7

j-i-l/ReviewedGrapes
ML models predicting wine varieties based on wine review texts
Language: Jupyter Notebook - Size: 2.92 MB - Last synced at: about 1 month ago - Pushed at: almost 4 years ago - Stars: 2 - Forks: 0

Ihebdhouibi/Spark-with-machine-learning-
Exploring spark machine learning capabilities
Language: Jupyter Notebook - Size: 5.57 MB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 0

prasanjit15/Apache-Spark-Projects
This repo contains all the projects I did using Apache Spark.
Language: Jupyter Notebook - Size: 5.27 MB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 1

TrentBrunson/Big_Data
Apache Hadoop: HDFS, MapReduce, YARN, NLP, AWS, Spark, Google Colab, PySpark
Language: Jupyter Notebook - Size: 109 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 1

colbyford/PyDataCLT_Jan2020
Scale your Python Code with PySpark in Apache Spark - PyData Charlotte January 2020 Meeting
Language: HTML - Size: 36 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 0

RRighart/Retail
A repo containing code for retail sales analyses
Language: Jupyter Notebook - Size: 27.3 KB - Last synced at: 6 months ago - Pushed at: almost 5 years ago - Stars: 2 - Forks: 1

ianjeffries/car-accident-analysis
Analyzing car accidents in the United Kingdom using PySpark and Python for big data processing.
Language: Jupyter Notebook - Size: 11 MB - Last synced at: over 1 year ago - Pushed at: almost 6 years ago - Stars: 2 - Forks: 3

mayankskb/PySpark
Repository for the pyspark work
Language: HTML - Size: 7.07 MB - Last synced at: about 2 years ago - Pushed at: almost 7 years ago - Stars: 2 - Forks: 3

gvatsal60/PySparkTutorial
Comprehensive guide to mastering `PySpark` through hands-on tutorials and examples.
Language: Shell - Size: 20.5 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 1 - Forks: 0

gvatsal60/PySparkTemplate
A lightweight template for building PySpark applications efficiently inside devcontainer
Language: Shell - Size: 26.4 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 1 - Forks: 0

bitoollearner/leetcode-pyspark
This repository is dedicated to solutions for LeetCode SQL questions implemented in PySpark.
Language: Jupyter Notebook - Size: 553 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 1

SnehaDharne/BigDataAnalytics-MVCollisions
Leveraging NYC Open Data, this repository contains Databricks notebooks for analyzing motor vehicle collisions. We perform EDA, spatial clustering, and predictive modeling on collision, vehicle, and person datasets to understand accident trends and predict potential risks.
Language: Jupyter Notebook - Size: 7.64 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

Bayunova28/BikeStore_DWH_Analytics
This repository contains a data analytics and data warehouse project for a bike store
Language: Jupyter Notebook - Size: 1.31 MB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

TravelXML/APACHE-SPARK-PYSPARK-DATABRICKS
APACHE SPARK: Data Analysis, Transformation, and Visualisation with PySpark, IPL Data Analysis
Language: Jupyter Notebook - Size: 2.25 MB - Last synced at: about 3 hours ago - Pushed at: 9 months ago - Stars: 1 - Forks: 1

Akash8K/Stocks-Data-Analysis-In-DataBricks
Stocks Data Analysis In DataBricks - Using SQL and Pyspark
Language: HTML - Size: 1.84 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

gupta-aayushkr/F1-Racing
The project aims to process Formula 1 racing data, create an automated data pipeline, and make the data available for presentation and analysis purposes.
Language: Python - Size: 5.04 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

matthieuvion/spark-cluster
Steps to deploy a local spark cluster w/ Docker. Bonus: a ready-to-use notebook for model prediction on Pyspark using spark.ml Pipeline() on a well known dataset
Language: Jupyter Notebook - Size: 628 KB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

arinjayg/Rev_P1
Bank Transaction EDA using PySpark
Language: Jupyter Notebook - Size: 237 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

notPlancha/pbd
pbd coursework
Language: Jupyter Notebook - Size: 8.22 MB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

Wendy-hub/MusicPrediction
Music prediction using PySpark
Language: HTML - Size: 2.83 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

abidor13/Amazon_Vine_Analysis
We were given access to approximately 50 datasets, each containing reviews of a specific product written by members of the paid Amazon Vine Program. We used PySpark to perform the ETL process to extract the dataset, transform the data, connect to an AWS RDS instance, and load the transformed data into PgAdmin.
Language: Jupyter Notebook - Size: 27.2 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

shreyashji/Spark-PySpark-DataBricks
My Python, Spark, PySpark, and Scala notebooks with logic I solve or see on a daily basis. It contains optimization techniques for big data processing and real-time scenarios.
Language: Jupyter Notebook - Size: 813 KB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 1 - Forks: 1

FranzDiebold/advent-of-code-2021 📦
Solutions for Advent of Code 2021 in (Py)Spark
Language: Jupyter Notebook - Size: 22.5 KB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

polarbeargo/Data-Engineering-Capstone-Project
Language: Jupyter Notebook - Size: 834 KB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

niteshjindal-7/cricket-world-cup2019--fall-of-wicket-prediction-pyspark-MLlib
Language: Jupyter Notebook - Size: 34.2 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

krishnakaushik25/Movielens_spark_azure
Movielens dataset analysis for movie recommendations using Spark in Azure
Language: Jupyter Notebook - Size: 2.35 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

PeterSchuld/Sparkify
Capstone Project in the Udacity Data Scientist Nanodegree program. We manipulate large and realistic datasets with Spark to engineer relevant features for predicting churn. We'll learn how to use Spark MLlib to build machine learning models with large datasets, far beyond what could be done with non-distributed technologies like scikit-learn.
Language: HTML - Size: 2.44 MB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 0

hjh17/dbloy
Continuous Delivery tool for PySpark notebook-based jobs on Databricks
Language: Python - Size: 591 KB - Last synced at: about 1 month ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 1

koirand/spark-notebook-on-k8s-example
Sample to run PySpark on Kubernetes cluster.
Language: Jupyter Notebook - Size: 13.7 KB - Last synced at: 24 days ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

Crone1/Spark-Recommender-System
This project involves using Pyspark to create a recommendation system on the Google Cloud Platform
Language: Jupyter Notebook - Size: 588 KB - Last synced at: 8 months ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 1

MohamedKari/spark-on-k8s
A template repo showing how to natively run containerized PySpark workloads, containerized PySpark-backed notebooks, and the Spark history server on Kubernetes in general and AWS EKS specifically.
Language: Makefile - Size: 10.7 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 1

ghanmi-hamza/Machine-learning-with-PySpark
This notebook shows how to use PySpark to build machine learning classifiers (note that almost all ML algorithms supported by PySpark are used in this notebook)
Language: Jupyter Notebook - Size: 109 KB - Last synced at: over 1 year ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 0

itsayushthada/ML-on-IBM-Watson
Notebooks for Advanced Data Science with IBM Specialization
Language: Jupyter Notebook - Size: 99.6 KB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 2

imratnesh/pyspark
Pyspark, machine learning, python
Language: HTML - Size: 123 KB - Last synced at: about 6 hours ago - Pushed at: about 6 years ago - Stars: 1 - Forks: 1

syedhassaanahmed/azure-data-manager
Cloud services for defining, ingesting, transforming, analyzing and showcasing big data
Language: C# - Size: 759 KB - Last synced at: 3 months ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 2

bademiya21/Identifying-Commuter-Travel-Patterns-In-Bus-Services
A project I did with Land Transport Authority, a statutory board, whose main role is to manage the transportation infra of Singapore which includes public transport like bus and trains. The agency was interested to understand how the bus services were being utilized by commuters during peak hours and if interventions could be introduced to further enhance commuter experience on bus services e.g. shorter waiting time, faster trips with skipping of bus stops etc. This required understanding archetypes of travel patterns by commuters in bus services. This project is an extension of what was previously done here: https://blog.data.gov.sg/fingerprint-of-a-bus-route-73e5be53dcf0
Language: Jupyter Notebook - Size: 10.4 MB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 1

manoharpalanisamy/Distributed-Keras
Research And Development on Distributed Keras with Spark
Language: Jupyter Notebook - Size: 11.7 KB - Last synced at: about 1 year ago - Pushed at: almost 7 years ago - Stars: 1 - Forks: 0

twseptian/Coastal-and-Offshore-Marine-Zones
Spatial Database Final Project - Coastal and Offshore Marine Zones with Geopandas and Pyspark
Language: Jupyter Notebook - Size: 45.2 MB - Last synced at: 2 days ago - Pushed at: almost 7 years ago - Stars: 1 - Forks: 1

rishanki/correlation-matrix_Pyspark_RDD
Language: Jupyter Notebook - Size: 273 KB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 1 - Forks: 2

nguyenminhduc9988/eurecom_aml
Eurecom Advanced Machine Learning course work
Language: Jupyter Notebook - Size: 3.4 MB - Last synced at: about 1 year ago - Pushed at: almost 8 years ago - Stars: 1 - Forks: 0

akanshu22/Text-Categorization-using-NGrams-in-Apache-Spark
Apache Spark based implementation of research paper titled "N-gram-based text categorization"
Language: Jupyter Notebook - Size: 2.09 MB - Last synced at: about 2 years ago - Pushed at: about 8 years ago - Stars: 1 - Forks: 0

EchoSingh/pySpark_movie_analysis
This project analyzes the MovieLens 20M dataset using PySpark, with interactive visualizations provided by Streamlit. Additionally, a Kaggle notebook offers more insights into the analysis.
Language: Python - Size: 638 KB - Last synced at: 10 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0

Saikesana31/Netflix
Azure Data engineering project
Language: Python - Size: 1.3 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

RaviSoni804426/Pyspark-With-Python (fork of krishnaik06/Pyspark-With-Python)
This repository contains tutorials and examples for working with PySpark, covering data processing, transformations, machine learning, and more.
Language: Jupyter Notebook - Size: 40 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

VandanaBhumireddygari/Open-Table-Formats-with-Databricks-and-Delta-Lake
This project demonstrates the use of Open Table Formats with Databricks, PySpark, and Delta Lake. It covers data ingestion, transformation, querying, and storage management using Delta tables. The project includes code for loading data, writing it to Delta format, querying, and utilizing Delta Lake
Language: Jupyter Notebook - Size: 0 Bytes - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

s1ddh-rth/fake-kafka
A simulated Kafka data pipeline that generates fake customer and order data, processes it through Kafka, and stores it in PostgreSQL for real-time analysis with PySpark. Includes Kafdrop UI for monitoring. 🚀
Language: Python - Size: 4.88 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

TahirZia-1/EDA-Netflix-Dataset-using-PySpark-on-Docker
This project demonstrates how to perform Exploratory Data Analysis (EDA) on the Netflix dataset using PySpark in a Jupyter Notebook environment. It involves setting up Spark, loading a dataset, performing basic data cleaning, and visualizing the results. All of it runs in a Docker container.
Language: Jupyter Notebook - Size: 1.75 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

FoggySmile/BigData_ITMO
Big Data: Spark Lab and ClickHouse Lab Solutions
Language: Jupyter Notebook - Size: 144 KB - Last synced at: 3 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

Rifat392000/BigDataAnalytics
Language: Jupyter Notebook - Size: 18.4 MB - Last synced at: 7 days ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

SAI-MOHAN-B/Spark-Structured-Streaming
This repo is for Structured Streaming and related projects
Language: Python - Size: 13.7 KB - Last synced at: 3 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

citysiva180/databricks_practice_repo
This repo is built to learn and practice Databricks and PySpark. It is the practice repo for the Databricks Data Engineering Associate certification.
Language: Python - Size: 523 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

Non-NeutralZero/spark247-jupyter-dockerized
spark247-jupyter-dockerized
Language: Python - Size: 17.6 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

arturogonzalezm/energy_price_and_demand_forecast
AEMO Aggregated price and demand data
Language: Python - Size: 14.1 MB - Last synced at: 3 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

ganeshkavhar/DataFrame-Data-Generator-by-ganesh-kavhar
Small code exercises for generating good dummy DataFrames for PySpark practice
Language: Jupyter Notebook - Size: 5.86 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

HenryBao91/PySpark-Learning-Tutorial
Hadoop + PySpark big data mining, processing, and analysis
Language: Jupyter Notebook - Size: 11.1 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

mananabbasi/Data-Science-Complete-Project-using-Big-Data-Tools-Techniques-
This repository contains Databricks projects utilizing RDDs, DataFrames, and SQL to process and analyze various real-world datasets. Data cleaning and analysis have been performed using PySpark functions to handle challenges such as inconsistent formats, missing values, and complex data structures. The project ensures efficient data transformation
Language: HTML - Size: 3.71 MB - Last synced at: about 2 hours ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

sush4nt/docker-containers
References for building custom IDEs
Language: Shell - Size: 36.1 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

kaladabrio2020/pyspark-ml-analysis-data
Data analysis and machine learning with PySpark
Language: Jupyter Notebook - Size: 1.81 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

Betico1928/Talleres-ProcesamientoDeDatosAGranEscala
Exploring the principles of large-scale data processing through Databricks and Spark workshops. Learn tools such as Pandas and PySpark for efficient analysis of large datasets. Taught by John Corredor at the Pontificia Universidad Javeriana.
Language: Jupyter Notebook - Size: 203 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

rsantos2032/Cardiovascular-Disease-Detection
Cardiovascular Disease Detection using PySpark
Language: Jupyter Notebook - Size: 1.09 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

ganeshkavhar/PySpark-GroupBy
Learn GroupBy in PySpark
Language: Jupyter Notebook - Size: 4.88 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

RaghulKrish1798/PySpark_Intro
Learning PySpark Fundamentals
Language: Jupyter Notebook - Size: 13.7 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

saikaryekar/PySpark-Plane-Dataset-Exploration
Explored a dataset of planes while learning PySpark commands.
Language: Jupyter Notebook - Size: 24.4 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0
