An open API service providing repository metadata for many open source software ecosystems.

Topic: "pyspark-sql"

vectra-ai-research/pyspark-style-guide

Our style guide for writing readable and maintainable PySpark code.

Size: 63.5 KB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 17 - Forks: 3

ttariqaziz/data_science_cheat_sheets

Updated cheat sheets on data science and data analysis, provided by DataCamp. They cover quick reads on Machine Learning, Deep Learning, Python, R, SQL, and more, and are handy when you want to revise a topic quickly.

Size: 39.2 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 1

CamilaJaviera91/pyspark-first-approach

This code demonstrates how to integrate PySpark with datasets and perform simple data transformations. It loads a sample dataset using PySpark's built-in functionalities or reads data from external sources and converts it into a PySpark DataFrame for distributed processing and manipulation.

Language: Python - Size: 2.72 MB - Last synced at: 5 days ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

AlfaBetaBeta/Spark-Movie-Ratings

This notebook performs exploratory data analysis (EDA) on a movie ratings dataset using PySpark SQL.

Language: Jupyter Notebook - Size: 1.88 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 0

neha-dev-dot/Pyspark-Tutorial

This repository is part of my journey to learn **PySpark**, the Python API for Apache Spark. I explored the fundamentals of distributed data processing using Spark and practiced with real-world data transformation and querying use cases.

Language: Jupyter Notebook - Size: 230 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1 - Forks: 0

CamilaJaviera91/sql-mock-data

Generate a synthetic dataset with one million records of employee information from a fictional company, load it into a PostgreSQL database, create analytical reports using PySpark and large-scale data analysis techniques, and implement machine learning models to predict trends in hiring and layoffs on a monthly and yearly basis.

Language: Python - Size: 217 MB - Last synced at: 5 days ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

VincentLimarus/machineLearning-models

Clustering vs Classification

Language: Jupyter Notebook - Size: 410 KB - Last synced at: 4 months ago - Pushed at: 12 months ago - Stars: 1 - Forks: 0

vara-co/Home_Sales

Module 22 challenge: Using Google Colab to work on Big Data queries with PySpark SQL, Parquet, and cache partitions

Language: Jupyter Notebook - Size: 46.9 KB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

nmcintyre5/admissionPredictionML

This script builds a linear regression model using PySpark to predict student admissions at Unicorn University.

Language: Python - Size: 401 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

thunchanokbow/Inventory-Amazon

Inventory value is also important for determining a company's liquidity, or its ability to meet its short-term financial obligations. A high inventory value can indicate that a company has too much money tied up in inventory, which could make it difficult for the company to pay its bills.

Language: Jupyter Notebook - Size: 21.3 MB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

melekny/Banking-Data-Analysis

Data analysis project with PySpark in a Jupyter Notebook

Language: Jupyter Notebook - Size: 263 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

LalitSharma7/F1-Data-Analysis

Project based on an application of Azure Databricks

Language: Python - Size: 28.3 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

cc59chong/Big-Data-Fundamentals-with-PySpark

Language: Jupyter Notebook - Size: 7.23 MB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 1

essien1990/Apache-Spark

Batch Processing using Apache Spark and Python for data exploration

Language: Jupyter Notebook - Size: 11.7 KB - Last synced at: 15 days ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 1

ghanmi-hamza/Machine-learning-with-PySpark

This notebook uses PySpark to build machine learning classifiers (note that almost all ML algorithms supported by PySpark are used in this notebook).

Language: Jupyter Notebook - Size: 109 KB - Last synced at: almost 2 years ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 0

bigenius-x/datavault-mart-databricks

Example Project for DataVault and Mart Databricks

Size: 147 KB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 0 - Forks: 1

bigenius-x/dimensional-mart-databricks

Example Project for Dimensional and Mart Databricks

Size: 46.9 KB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 0 - Forks: 0

bigenius-x/stage-file-databricks

Example Project for Stage File Databricks

Size: 190 KB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 0 - Forks: 0

mihirchhiber/Network-Intrusion-Detector

Network Intrusion Detector is a distributed intrusion detection system built with PySpark. It preprocesses, encodes, and models network traffic data to detect anomalies using a Random Forest classifier, achieving high accuracy and efficiency through feature selection and scalable data processing. The system is suitable for large-scale environments.

Language: Jupyter Notebook - Size: 860 KB - Last synced at: 26 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

Wb-az/MLib-PySpark-SoundLevel-Prediction

Creates an ML pipeline using PySpark SQL and PySpark MLlib to predict sound level

Language: Jupyter Notebook - Size: 972 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

nazif96/Disease-prediction

Cardiovascular Disease Prediction

Language: Jupyter Notebook - Size: 62.5 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

lmizner/Codecademy_Big_Data_with_PySpark

Language: Jupyter Notebook - Size: 3.66 MB - Last synced at: 8 days ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

Bayunova28/Airbnb_Market_Analytics

This repository contains a data analytics project using PySpark SQL for Airbnb in NYC

Language: Jupyter Notebook - Size: 3.26 MB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

asenacak/recommenderSystems-SteamVideoGames

Language: Jupyter Notebook - Size: 1.65 MB - Last synced at: 3 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

Lefteris-Souflas/Spark-Movies-Analytics

Utilizing Apache Spark & PySpark to analyze a movie dataset. Tasks include data exploration, identifying top-rated movies, training a linear regression model, and experimenting with Airflow.

Language: Jupyter Notebook - Size: 289 KB - Last synced at: 4 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

CirsteanPaul/pyspark-project

Big data management with PySpark

Language: Jupyter Notebook - Size: 251 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

Kebab-kun/PySpark-House-Price-Prediction

PySpark House Price Prediction features a PySpark-based Linear Regression model for predicting median house prices. It showcases data preprocessing, model training, and evaluation, yielding an RMSE of around 0.11. The code offers insights into building robust predictive models using PySpark.

Language: Jupyter Notebook - Size: 211 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

bhavanachitragar/Data-Analysis-using-Pyspark

Uses the PySpark module in Python within the Google Colab environment to run queries against a dataset. The dataset consists of two CSV files, listening.csv and genre.csv. Query results are visualized using Matplotlib.

Language: Jupyter Notebook - Size: 17.6 KB - Last synced at: about 14 hours ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

data42lana/learning_big_data_tools 📦

The notebook shows how tools of the PySpark SQL module work in practice.

Language: Jupyter Notebook - Size: 56.6 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

avimonda298/Pyspark

PySpark streaming

Language: Python - Size: 19.5 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

GabrieleCarl/twitter-real-time-sentiment-analysis

Twitter real-time sentiment analysis

Language: Jupyter Notebook - Size: 23.4 KB - Last synced at: 10 months ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

tharikf/PySpark_KingCounty

Language: Jupyter Notebook - Size: 13.7 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

Tinmarian/Airflow2.0-De-0-a-Heroe

Repository for following the Udemy course "Airflow2.0 De 0 a Héroe", from the "Datapath" academy.

Language: Python - Size: 43.9 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 1

supergloo/pyspark

PySpark examples

Size: 40 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

Nandan9911/Big-Data-minor-projects

Problems on Hadoop MapReduce, Hive, and PySpark SQL

Language: Java - Size: 16.6 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

sawlachintan/cs440-pj4

Size: 8.79 KB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

steve303/sparkSQL

Objective: Perform word count tasks and joins using Spark SQL within a Docker container

Language: Python - Size: 87.9 KB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 1

amalaj7/Pyspark-Notes

This repository contains notes on PySpark

Language: Jupyter Notebook - Size: 1.87 MB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 2

Ashutosh27ind/pySparkAirlinesDataAnalysis

PySpark data analysis of an airlines dataset, with files hosted on HDFS.

Language: Jupyter Notebook - Size: 2.82 MB - Last synced at: about 1 year ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

Ashutosh27ind/pySparkMLAnalysis

PySpark ML Heart and Advertisement Data Analysis

Language: Jupyter Notebook - Size: 8.79 KB - Last synced at: about 1 year ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0