GitHub topics: spark-sql
streamnative/pulsar-spark
Spark Connector to read and write with Pulsar
Language: Scala - Size: 703 KB - Last synced at: about 18 hours ago - Pushed at: about 19 hours ago - Stars: 113 - Forks: 51

shreyasaxena29103/tmdb-movie-analysis-and-optimization
This project explores the TMDB movie dataset (900K+ records) using PySpark and Spark SQL in Databricks. It focuses on performance optimization and analysis.
Size: 2.93 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

japila-books/spark-sql-internals
The Internals of Spark SQL
Size: 1.46 GB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 468 - Forks: 132

apache/kyuubi
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
Language: Scala - Size: 60.1 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 2,207 - Forks: 947

s-yazhini/Hexa-DE-Main-Project
Data engineering main project 1
Language: Jupyter Notebook - Size: 15.5 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

apache/incubator-gluten
Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
Language: Scala - Size: 199 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1,373 - Forks: 552

databricks/LearningSparkV2
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
Language: Scala - Size: 75.2 MB - Last synced at: 4 days ago - Pushed at: 5 months ago - Stars: 1,306 - Forks: 775

goamegah/flowtrack
End-To-End Real-time Road Traffic Monitoring Spark Structured Streaming solution
Language: Scala - Size: 51 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 0

Sabal999/end-to-end-data-pipeline-acs
This repository showcases a robust end-to-end data pipeline for the American Community Survey dataset, utilizing tools like Python, SparkSQL, and Docker. 🚀 Explore the architecture that transforms raw data into valuable insights through a Bronze / Silver / Gold framework. 🐙
Language: Python - Size: 1.17 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

groda/big_data
Big Data essentials: Hadoop, MapReduce, Spark. Explore tutorials and demos in Jupyter notebooks—most are self-contained and live, ready to run with a click.
Language: Jupyter Notebook - Size: 51.9 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 78 - Forks: 27

anaregdesign/openaivec
Pandas extension, Tabular calculation with LLM, Spark UDF Builder
Language: Python - Size: 977 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 13 - Forks: 1

AlexRogalskiy/spark-patterns
🏆 Spark4You Design patterns
Language: Shell - Size: 20.4 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 3 - Forks: 0

xReniar/US-used-cars-analysis
US used cars analysis with Map-Reduce, Hive, Spark core and Spark SQL
Language: Python - Size: 2.25 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0

dejarol/azure-search-spark
Azure AI Search connector for Spark
Language: Scala - Size: 1.47 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

almond-sh/almond
A Scala kernel for Jupyter
Language: Scala - Size: 12.8 MB - Last synced at: 5 days ago - Pushed at: about 1 month ago - Stars: 1,615 - Forks: 251

minio/spark-select
A library for Spark DataFrame using MinIO Select API
Language: Scala - Size: 65.4 KB - Last synced at: 5 days ago - Pushed at: over 5 years ago - Stars: 98 - Forks: 19

Sushmi08B/NYC-YELLOWTAXI-DATA-ETL
Production-scale ETL pipeline using PySpark and PostgreSQL to process 100M+ NYC Yellow Taxi trip records with full data validation, enrichment, and partitioned storage
Language: Python - Size: 1.03 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 0 - Forks: 0

sjrusso8/spark-connect-rs
Apache Spark Connect Client for Rust
Language: Rust - Size: 3.88 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 109 - Forks: 18

letsiki/end-to-end-data-pipeline-acs
End-to-end data pipeline for the ACS dataset using Python, PySpark, PostgreSQL, and Kubernetes (Bronze / Silver / Gold architecture).
Language: Python - Size: 1.17 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

dotnet/spark
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Language: C# - Size: 4.87 MB - Last synced at: 24 days ago - Pushed at: about 1 month ago - Stars: 2,058 - Forks: 327

Qbeast-io/qbeast-spark
Qbeast-spark: DataSource enabling multi-dimensional indexing and efficient data sampling. Big Data, free from the unnecessary!
Language: Scala - Size: 37.3 MB - Last synced at: 18 days ago - Pushed at: 5 months ago - Stars: 228 - Forks: 24

ploomber/jupysql Fork of catherinedevlin/ipython-sql
Better SQL in Jupyter. 📊
Language: Python - Size: 12.9 MB - Last synced at: 26 days ago - Pushed at: 3 months ago - Stars: 780 - Forks: 78

leowheeler1/HomeSales
All work for the Module 22 Challenge, UofM Data Analytics Bootcamp
Language: Jupyter Notebook - Size: 1.33 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

oeljeklaus-you/UserActionAnalyzePlatform
Big data platform for e-commerce user behavior analysis
Language: Java - Size: 1.26 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 1,029 - Forks: 386

zsvoboda/ngods-stocks
New Generation Opensource Data Stack Demo
Language: Jupyter Notebook - Size: 22.1 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 432 - Forks: 101

qubole/sparklens
Qubole Sparklens tool for performance tuning Apache Spark
Language: Scala - Size: 175 KB - Last synced at: about 1 month ago - Pushed at: 12 months ago - Stars: 575 - Forks: 142

majchandra/covid19-data-analysis-spark
Big Data & Machine Learning project with PySpark: analysis and clustering of COVID-19 cases worldwide (2020–2023).
Language: Jupyter Notebook - Size: 5.35 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

rizkipragustono/data_analysis_spark
Exploration: Data Analysis using Spark
Language: Jupyter Notebook - Size: 8.79 KB - Last synced at: 27 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

getredash/redash
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
Language: Python - Size: 27.2 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 27,327 - Forks: 4,471

ludreinsalvador/gadgets-product-6850-model
Created a machine learning model that predicts whether a customer will purchase Product 6850 in January 2019 with an accuracy above 85% and a recall of at least 70%.
Language: Jupyter Notebook - Size: 4.88 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

lmouhib/auto-register-spark-ui-k8s
A lightweight operator that automatically exposes the Spark UI and manages its ingress when running Spark on Kubernetes
Language: Go - Size: 3.67 MB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 3 - Forks: 0

indix/sparkplug
Spark package to "plug" holes in data using SQL based rules ⚡️ 🔌
Language: Scala - Size: 503 KB - Last synced at: 21 days ago - Pushed at: about 5 years ago - Stars: 29 - Forks: 2

cuebook/cuelake
Use SQL to build ELT pipelines on a data lakehouse.
Language: JavaScript - Size: 28 MB - Last synced at: about 1 month ago - Pushed at: about 3 years ago - Stars: 288 - Forks: 28

streamnative/awesome-pulsar
A curated list of Pulsar tools, integrations and resources.
Size: 11.7 KB - Last synced at: about 2 months ago - Pushed at: over 4 years ago - Stars: 81 - Forks: 9

Neelka96/Home_Sales
DataViz Module 22 Big Data Challenge - Spark
Language: Jupyter Notebook - Size: 28.3 KB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

igopalakrishna/nyc-subway-foot-traffic-prediction-and-forecasting
Designed and implemented a scalable real-time analytics pipeline using Apache Kafka, Spark Structured Streaming, and MongoDB to simulate NYC MTA turnstile data and forecast real-time subway foot traffic using SparkML Random Forest models.
Language: Jupyter Notebook - Size: 1.27 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 1

hazecodeio/spark-sandbox
Language: Scala - Size: 13.1 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

mananghetia/Healthcare-RCM
Language: Jupyter Notebook - Size: 0 Bytes - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

jlsilva01/spark-delta
Project developed to demonstrate a local Apache Spark setup writing files in Delta Lake format, also locally.
Language: Jupyter Notebook - Size: 144 KB - Last synced at: about 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

taylorteixeira/APACHE-SPARK-COM-DELTA-LAKE-E-APACHE-ICEBERG
Project developed to demonstrate a local Apache Spark (PySpark) setup writing files in Delta Lake format, also locally, creating an ER model, images, and DDL code, and presenting and explaining the data source used (public data).
Language: Jupyter Notebook - Size: 62.5 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 1

microsoft/data-accelerator
Data Accelerator for Apache Spark simplifies onboarding to Streaming of Big Data. It offers a rich, easy to use experience to help with creation, editing and management of Spark jobs on Azure HDInsights or Databricks while enabling the full power of the Spark engine.
Language: C# - Size: 401 MB - Last synced at: 7 days ago - Pushed at: 3 months ago - Stars: 302 - Forks: 90

kriss024/Spark
Spark for Data Science and ETL process.
Language: Jupyter Notebook - Size: 78 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 2 - Forks: 0

fbarffmann/Home_Sales
Analyzed 25,000+ home sales using PySpark and SparkSQL. Identified pricing trends by year built, home features, and view rating. Optimized query run-time by 70% using caching.
Language: Jupyter Notebook - Size: 2.48 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

kevinschaich/pyspark-cheatsheet
🐍 Quick reference guide to common patterns & functions in PySpark.
Size: 49.8 KB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 519 - Forks: 167

zsvoboda/ngods
New generation opensource data stack
Language: Dockerfile - Size: 1.62 MB - Last synced at: 2 months ago - Pushed at: about 3 years ago - Stars: 66 - Forks: 9

dbiir/paraflow
A real-time analytical system for ID-associated data
Language: Java - Size: 19.1 MB - Last synced at: 3 months ago - Pushed at: about 3 years ago - Stars: 38 - Forks: 24

izhangzhihao/Real-time-Data-Warehouse
Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi
Language: Dockerfile - Size: 106 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 113 - Forks: 44

jaceklaskowski/spark-workshop
Apache Spark™ and Scala Workshops
Language: HTML - Size: 57 MB - Last synced at: about 1 month ago - Pushed at: 11 months ago - Stars: 264 - Forks: 148

kgelli/PySpark-Fundamentals
A comprehensive collection of PySpark fundamentals with practical examples using retail and Formula 1 datasets.
Language: Jupyter Notebook - Size: 277 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

mc2-project/opaque-sql
An encrypted data analytics platform
Language: Scala - Size: 18 MB - Last synced at: 3 months ago - Pushed at: about 2 years ago - Stars: 182 - Forks: 73

camilesing/Hive-Spark-SQL-Helper-VSCode
Hive & Spark SQL extension for Visual Studio Code
Language: TypeScript - Size: 7.08 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 4 - Forks: 0

Wh1isper/sparglim
Sparglim✨ makes PySpark App Configurable and Deploy Spark Connect Server Easier!
Language: Python - Size: 151 KB - Last synced at: 3 days ago - Pushed at: 4 months ago - Stars: 37 - Forks: 4

LearningJournal/SparkProgrammingInScala
Apache Spark Course Material
Language: Scala - Size: 50.9 MB - Last synced at: 3 months ago - Pushed at: about 2 years ago - Stars: 88 - Forks: 159

fabiogouw/spark-aws-messaging
A custom sink provider for Apache Spark that sends the content of a DataFrame to an AWS SQS queue
Language: Java - Size: 881 KB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 23 - Forks: 5

pregismond/data-analysis-using-spark
Data Analysis using Spark
Language: Jupyter Notebook - Size: 20.5 KB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

imjuliengaupin/sparkler
Language: Java - Size: 33.2 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

miltiadiss/CEID_NE4348-Big-Data-Management-Systems
This project implements a real-time data pipeline with Kafka, Spark, and MongoDB. It generates vehicle data using UXSIM, streams it to a Kafka broker, processes it with Spark, and stores raw and processed data in MongoDB. Queries analyze vehicle counts, speeds, and routes over specified periods.
Language: Python - Size: 4.88 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

GADES-DATAENG/mod3-spark
A repository containing all the necessary code and resources for an Apache Spark demo, showcasing data processing and transformation workflows with practical examples.
Language: Jupyter Notebook - Size: 419 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

Bezid4n/Big-Data
Big Data projects covering Spark Core, Spark SQL, ...
Language: Jupyter Notebook - Size: 26.9 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

Sharkb8t/Home_sales
Demonstrates the use of SparkSQL to determine key metrics about home sales data, using Spark to create temporary views, partition the data, cache and un-cache a temporary table, and verify that the table has been un-cached.
Language: Jupyter Notebook - Size: 3.91 KB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

Tirth27/Real-time-analytics-with-spark-streaming
This project aims to build a streaming application to perform real-time analytics of Covid-19 related tweets and deploy an ML model for real-time sentiment predictions.
Language: Jupyter Notebook - Size: 16.5 MB - Last synced at: 3 months ago - Pushed at: almost 4 years ago - Stars: 13 - Forks: 3

venkatakamaiah46/SQL
Interesting Queries Written in Structured Query Language
Size: 5.86 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

KelvynAmaral/Data_manipulation_spark
Training repository for data manipulation in Apache Spark. Contains practical examples of reading, writing, transformations, filters, aggregations, joins, and using SQL on DataFrames.
Language: Jupyter Notebook - Size: 371 MB - Last synced at: 6 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

Chabane/bigdata-playground
A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL
Language: TypeScript - Size: 3.08 MB - Last synced at: about 2 months ago - Pushed at: over 6 years ago - Stars: 209 - Forks: 74

DemolisherAA/Fradulent_Data_Detection_Apache
This repository contains Jupyter Notebooks related to fraud detection, data streaming, and real-time data visualization. These notebooks cover various aspects of processing, analyzing, and modeling data to address fraudulent transactions in eCommerce and other contexts.
Language: Jupyter Notebook - Size: 739 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

huangyueranbbc/SparkDemo
Complete Spark example code (Java, Scala)
Language: Java - Size: 2.33 MB - Last synced at: 3 months ago - Pushed at: about 5 years ago - Stars: 83 - Forks: 70

imsanjoykb/PySpark-Bootcamp
My Practice and project on PySpark
Language: Jupyter Notebook - Size: 4.52 MB - Last synced at: 3 months ago - Pushed at: almost 4 years ago - Stars: 8 - Forks: 3

LearningJournal/Spark-Streaming-In-Scala
Apache Spark 3 - Structured Streaming Course Material
Language: Scala - Size: 19.4 MB - Last synced at: about 1 month ago - Pushed at: almost 5 years ago - Stars: 45 - Forks: 77

kathisnehith/NYC311-requests-ETL-pipeline
End-to-end ETL pipeline that processes NYC 311 service requests retrieved through an API for analysis.
Language: Jupyter Notebook - Size: 2.64 MB - Last synced at: 26 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

giuseppericcio/BigData
Homework assignments completed for the Big Data Engineering course taught by Prof. Vincenzo Moscato, Università degli Studi di Napoli "Federico II", academic year 2022-23
Language: Jupyter Notebook - Size: 58.7 MB - Last synced at: about 23 hours ago - Pushed at: 4 months ago - Stars: 1 - Forks: 2

Yeisson8A/DataFramesPySpark
Example of working with DataFrames (from a list, a CSV, a JSON, and a Parquet file) in Spark using both PySpark and Spark SQL
Language: Jupyter Notebook - Size: 864 KB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

jgperrin/net.jgp.books.spark.ch11
Spark in Action, 2nd edition - chapter 11 - Working with SQL
Language: Java - Size: 108 KB - Last synced at: about 2 months ago - Pushed at: about 2 years ago - Stars: 15 - Forks: 11

mohankrishna02/interview-scenerios-spark-sql
This repository focuses on providing interview scenario questions that I have encountered during interviews. The questions are designed to simulate real-world scenarios and test your problem-solving and technical skills. By exploring these scenarios, you can gain insights into common interview topics and prepare yourself for similar challenges.
Language: Scala - Size: 353 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 13 - Forks: 20

qwshen/spark-etl-framework
A generic ETL framework with Spark_SQL for transforming data by constructing pipelines with Yaml/Json/Xml
Language: Scala - Size: 890 KB - Last synced at: 5 months ago - Pushed at: 6 months ago - Stars: 11 - Forks: 9

RubyNixx/coding_resources
Range of cheat sheets, coding resources, videos, etc that I want to keep track of & others may find helpful.
Size: 61.5 KB - Last synced at: 4 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

ExperienceIsKey/Video-Game-Genre-Trend-Analysis
Leveraged AWS, PySpark, and Power BI to analyze trends in PC video game genres. Optimized ETL processes and utilized datasets and the Steam API to reveal nuanced genre frequencies and distributions. Delivered insights driving decisions in game development, marketing, and platform enhancement.
Language: Python - Size: 10.9 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

asuiu/SparkORM
ORM for Apache Spark and DataFrames schema manager
Language: Python - Size: 482 KB - Last synced at: 17 days ago - Pushed at: about 1 year ago - Stars: 14 - Forks: 3

Dirkster99/PyNotes
My notebook on using Python with Jupyter Notebook, PySpark etc
Language: Jupyter Notebook - Size: 84.6 MB - Last synced at: about 1 month ago - Pushed at: almost 4 years ago - Stars: 11 - Forks: 7

RinatVeliakhmedov/spark_event_log_analyzer
Check for common Spark errors by analyzing event log files
Language: Scala - Size: 20.5 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

tuancamtbtx/etl-spark-k8s
ETL With Apache Spark Deployed on K8s
Language: TypeScript - Size: 5.54 MB - Last synced at: 4 months ago - Pushed at: 8 months ago - Stars: 2 - Forks: 0

goamegah/spark-handson
Spark hands-on
Language: Python - Size: 3.08 MB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 2 - Forks: 1

Thanaraklee/Real-Time-PySpark
This project introduces PySpark, a powerful open-source framework for distributed data processing. We explore its architecture, components, and applications for real-time data analysis.
Language: Python - Size: 329 MB - Last synced at: 3 months ago - Pushed at: 9 months ago - Stars: 27 - Forks: 13

syedhassaanahmed/databricks-notebooks
Collection of Databricks and Jupyter Notebooks
Language: Jupyter Notebook - Size: 742 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 21 - Forks: 15

kayvansol/PySparkJupyterOnKubernetes
PySpark & Jupyter Notebooks Deployed On Kubernetes
Size: 611 KB - Last synced at: 6 days ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 0

harryprince/awesome-sparklyr
An awesome sparklyr related package collection
Size: 47.9 KB - Last synced at: 12 days ago - Pushed at: over 5 years ago - Stars: 42 - Forks: 7

thaychansy/home-sales
This project involves Big Data analysis of real estate transaction data using SparkSQL/PySpark. The goal is to derive insights on housing prices based on various criteria. This analysis showcases the powerful data processing capabilities of SparkSQL/PySpark in handling large datasets efficiently.
Language: Jupyter Notebook - Size: 45.9 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

Foufou-exe/openfoodfacts-etl
This project aims to set up a distributed ETL (Extract, Transform, Load) solution to randomly generate food menus tailored to users' needs, using data available on OpenFoodFacts.
Language: Java - Size: 3.71 MB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

CloudFormations/Training.ApacheSpark
Training content for course delegates.
Size: 3.4 MB - Last synced at: 3 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

vskvj3/End-to-End-Pipeline-for-Swiggy-Restaurant-Data
This project develops a data engineering pipeline to analyze restaurant data from various cities on the Swiggy platform. Using PySpark, Spark SQL, and Azure Data Factory, the data is processed and transformed to generate insights on ratings, cuisines, and trends, presented through dashboards and reports.
Language: Jupyter Notebook - Size: 18.3 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

polomarcus/Spark-Structured-Streaming-Examples
Spark Structured Streaming / Kafka / Cassandra / Elastic
Language: Scala - Size: 16.5 MB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 183 - Forks: 78

LegallyNotBlonde/MegaData_SparkProject_Home_Sales
This project used SparkSQL and PySpark to analyze home sales data, optimizing performance with caching and partitioning by build date.
Language: Jupyter Notebook - Size: 27.3 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

lucas-nelson-uiuc/tidy_tools
Declarative programming for PySpark workflows.
Language: Python - Size: 1.67 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

harryprince/geospark
bring sf to spark in production
Language: R - Size: 15.9 MB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 57 - Forks: 17

varunu28/AADHAR-Dataset-Analysis
Data analysis of AADHAR dataset using Apache Spark
Language: Scala - Size: 1.82 MB - Last synced at: 2 months ago - Pushed at: about 7 years ago - Stars: 7 - Forks: 9

abouslimi/spark-ml-product-recommendation
Real-time product recommendation system built using Apache Spark, Kafka, and Python.
Language: Python - Size: 419 KB - Last synced at: 3 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

Ubikitina/Spark-Essentials
A guide to Apache Spark, from fundamentals to advanced concepts.
Language: Jupyter Notebook - Size: 57.7 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

mr-pratyush/DataBricks-Employee-Attrition-Analysis
Analyze employee attrition using Databricks and Spark SQL to identify trends and actionable insights.
Size: 0 Bytes - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

rahulray11/Spark-Java-Application
How to use spark-testing-base in a Spark Java application. Feel free to make changes.
Language: Java - Size: 44.9 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 2

roshankoirala/pySpark_tutorial
Implementation of Spark code in Jupyter notebook. Topics include: RDDs and DataFrame, exploratory data analysis (EDA), handling multiple DataFrames, visualization, Machine Learning
Language: Jupyter Notebook - Size: 202 KB - Last synced at: 3 months ago - Pushed at: almost 5 years ago - Stars: 29 - Forks: 26

purcellcjp/Home_Sales
This project demonstrated the usage of SparkSQL to read, query, cache, and analyze home sales data, providing insights into average prices based on various criteria.
Language: Jupyter Notebook - Size: 684 KB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0
