An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: spark-sql

streamnative/pulsar-spark

Spark Connector to read and write with Pulsar

Language: Scala - Size: 703 KB - Last synced at: about 18 hours ago - Pushed at: about 19 hours ago - Stars: 113 - Forks: 51

shreyasaxena29103/tmdb-movie-analysis-and-optimization

This project explores the TMDB movie dataset (900K+ records) using PySpark and Spark SQL in Databricks. It focuses on performance optimization and analysis.

Size: 2.93 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

japila-books/spark-sql-internals

The Internals of Spark SQL

Size: 1.46 GB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 468 - Forks: 132

apache/kyuubi

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.

Language: Scala - Size: 60.1 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 2,207 - Forks: 947

s-yazhini/Hexa-DE-Main-Project

Data engineering main project 1

Language: Jupyter Notebook - Size: 15.5 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

apache/incubator-gluten

Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.

Language: Scala - Size: 199 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1,373 - Forks: 552

databricks/LearningSparkV2

This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]

Language: Scala - Size: 75.2 MB - Last synced at: 4 days ago - Pushed at: 5 months ago - Stars: 1,306 - Forks: 775

goamegah/flowtrack

End-To-End Real-time Road Traffic Monitoring Spark Structured Streaming solution

Language: Scala - Size: 51 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 0

Sabal999/end-to-end-data-pipeline-acs

This repository showcases a robust end-to-end data pipeline for the American Community Survey dataset, utilizing tools like Python, SparkSQL, and Docker. 🚀 Explore the architecture that transforms raw data into valuable insights through a Bronze / Silver / Gold framework. 🐙

Language: Python - Size: 1.17 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

groda/big_data

Big Data essentials: Hadoop, MapReduce, Spark. Explore tutorials and demos in Jupyter notebooks—most are self-contained and live, ready to run with a click.

Language: Jupyter Notebook - Size: 51.9 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 78 - Forks: 27

anaregdesign/openaivec

Pandas extension, Tabular calculation with LLM, Spark UDF Builder

Language: Python - Size: 977 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 13 - Forks: 1

AlexRogalskiy/spark-patterns

🏆 Spark4You Design patterns

Language: Shell - Size: 20.4 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 3 - Forks: 0

xReniar/US-used-cars-analysis

US used cars analysis with Map-Reduce, Hive, Spark core and Spark SQL

Language: Python - Size: 2.25 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0

dejarol/azure-search-spark

Azure AI Search connector for Spark

Language: Scala - Size: 1.47 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

almond-sh/almond

A Scala kernel for Jupyter

Language: Scala - Size: 12.8 MB - Last synced at: 5 days ago - Pushed at: about 1 month ago - Stars: 1,615 - Forks: 251

minio/spark-select

A library for Spark DataFrame using MinIO Select API

Language: Scala - Size: 65.4 KB - Last synced at: 5 days ago - Pushed at: over 5 years ago - Stars: 98 - Forks: 19

Sushmi08B/NYC-YELLOWTAXI-DATA-ETL

Production-scale ETL pipeline using PySpark and PostgreSQL to process 100M+ NYC Yellow Taxi trip records with full data validation, enrichment, and partitioned storage

Language: Python - Size: 1.03 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 0 - Forks: 0

sjrusso8/spark-connect-rs

Apache Spark Connect Client for Rust

Language: Rust - Size: 3.88 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 109 - Forks: 18

letsiki/end-to-end-data-pipeline-acs

End-to-end data pipeline for the ACS dataset using Python, PySpark, PostgreSQL, and Kubernetes (Bronze / Silver / Gold architecture).

Language: Python - Size: 1.17 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

dotnet/spark

.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.

Language: C# - Size: 4.87 MB - Last synced at: 24 days ago - Pushed at: about 1 month ago - Stars: 2,058 - Forks: 327

Qbeast-io/qbeast-spark

Qbeast-spark: DataSource enabling multi-dimensional indexing and efficient data sampling. Big Data, free from the unnecessary!

Language: Scala - Size: 37.3 MB - Last synced at: 18 days ago - Pushed at: 5 months ago - Stars: 228 - Forks: 24

ploomber/jupysql (fork of catherinedevlin/ipython-sql)

Better SQL in Jupyter. 📊

Language: Python - Size: 12.9 MB - Last synced at: 26 days ago - Pushed at: 3 months ago - Stars: 780 - Forks: 78

leowheeler1/HomeSales

All work for the Module 22 Challenge, UofM Data Analytics Bootcamp

Language: Jupyter Notebook - Size: 1.33 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

oeljeklaus-you/UserActionAnalyzePlatform

Big data platform for e-commerce user behavior analysis

Language: Java - Size: 1.26 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 1,029 - Forks: 386

zsvoboda/ngods-stocks

New Generation Opensource Data Stack Demo

Language: Jupyter Notebook - Size: 22.1 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 432 - Forks: 101

qubole/sparklens

Qubole Sparklens tool for performance tuning Apache Spark

Language: Scala - Size: 175 KB - Last synced at: about 1 month ago - Pushed at: 12 months ago - Stars: 575 - Forks: 142

majchandra/covid19-data-analysis-spark

Big Data & Machine Learning project with PySpark: analysis and clustering of COVID-19 cases worldwide (2020–2023).

Language: Jupyter Notebook - Size: 5.35 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

rizkipragustono/data_analysis_spark

Exploration: Data Analysis using Spark

Language: Jupyter Notebook - Size: 8.79 KB - Last synced at: 27 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

getredash/redash

Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.

Language: Python - Size: 27.2 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 27,327 - Forks: 4,471

ludreinsalvador/gadgets-product-6850-model

Created a machine learning model that predicts whether a customer will purchase Product 6850 in January 2019 with an accuracy above 85% and a recall of at least 70%.

Language: Jupyter Notebook - Size: 4.88 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

lmouhib/auto-register-spark-ui-k8s

A lightweight operator that automatically exposes the Spark UI and manages its ingress when running Spark on Kubernetes

Language: Go - Size: 3.67 MB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 3 - Forks: 0

indix/sparkplug

Spark package to "plug" holes in data using SQL based rules ⚡️ 🔌

Language: Scala - Size: 503 KB - Last synced at: 21 days ago - Pushed at: about 5 years ago - Stars: 29 - Forks: 2

cuebook/cuelake

Use SQL to build ELT pipelines on a data lakehouse.

Language: JavaScript - Size: 28 MB - Last synced at: about 1 month ago - Pushed at: about 3 years ago - Stars: 288 - Forks: 28

streamnative/awesome-pulsar

A curated list of Pulsar tools, integrations and resources.

Size: 11.7 KB - Last synced at: about 2 months ago - Pushed at: over 4 years ago - Stars: 81 - Forks: 9

Neelka96/Home_Sales

DataViz Module 22 Big Data Challenge - Spark

Language: Jupyter Notebook - Size: 28.3 KB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

igopalakrishna/nyc-subway-foot-traffic-prediction-and-forecasting

Designed and implemented a scalable real-time analytics pipeline using Apache Kafka, Spark Structured Streaming, and MongoDB to simulate NYC MTA turnstile data and forecast real-time subway foot traffic using SparkML Random Forest models.

Language: Jupyter Notebook - Size: 1.27 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 1

hazecodeio/spark-sandbox

Language: Scala - Size: 13.1 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

mananghetia/Healthcare-RCM

Language: Jupyter Notebook - Size: 0 Bytes - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

jlsilva01/spark-delta

Project developed to demonstrate a local Apache Spark setup writing files in Delta Lake format, also locally.

Language: Jupyter Notebook - Size: 144 KB - Last synced at: about 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

taylorteixeira/APACHE-SPARK-COM-DELTA-LAKE-E-APACHE-ICEBERG

Project developed to demonstrate a local Apache Spark setup (pyspark) writing files in Delta Lake format, also locally, including an ER model, images, and DDL code, along with the data source used (public data), documented and explained.

Language: Jupyter Notebook - Size: 62.5 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 1

microsoft/data-accelerator

Data Accelerator for Apache Spark simplifies onboarding to Streaming of Big Data. It offers a rich, easy to use experience to help with creation, editing and management of Spark jobs on Azure HDInsights or Databricks while enabling the full power of the Spark engine.

Language: C# - Size: 401 MB - Last synced at: 7 days ago - Pushed at: 3 months ago - Stars: 302 - Forks: 90

kriss024/Spark

Spark for Data Science and ETL process.

Language: Jupyter Notebook - Size: 78 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 2 - Forks: 0

fbarffmann/Home_Sales

Analyzed 25,000+ home sales using PySpark and SparkSQL. Identified pricing trends by year built, home features, and view rating. Optimized query run-time by 70% using caching.

Language: Jupyter Notebook - Size: 2.48 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

kevinschaich/pyspark-cheatsheet

🐍 Quick reference guide to common patterns & functions in PySpark.

Size: 49.8 KB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 519 - Forks: 167

zsvoboda/ngods

New generation opensource data stack

Language: Dockerfile - Size: 1.62 MB - Last synced at: 2 months ago - Pushed at: about 3 years ago - Stars: 66 - Forks: 9

dbiir/paraflow

A real-time analytical system for ID-associated data

Language: Java - Size: 19.1 MB - Last synced at: 3 months ago - Pushed at: about 3 years ago - Stars: 38 - Forks: 24

izhangzhihao/Real-time-Data-Warehouse

Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi

Language: Dockerfile - Size: 106 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 113 - Forks: 44

jaceklaskowski/spark-workshop

Apache Spark™ and Scala Workshops

Language: HTML - Size: 57 MB - Last synced at: about 1 month ago - Pushed at: 11 months ago - Stars: 264 - Forks: 148

kgelli/PySpark-Fundamentals

A comprehensive collection of PySpark fundamentals with practical examples using retail and Formula 1 datasets.

Language: Jupyter Notebook - Size: 277 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

mc2-project/opaque-sql

An encrypted data analytics platform

Language: Scala - Size: 18 MB - Last synced at: 3 months ago - Pushed at: about 2 years ago - Stars: 182 - Forks: 73

camilesing/Hive-Spark-SQL-Helper-VSCode

Hive & Spark SQL extension for Visual Studio Code

Language: TypeScript - Size: 7.08 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 4 - Forks: 0

Wh1isper/sparglim

Sparglim✨ makes PySpark App Configurable and Deploy Spark Connect Server Easier!

Language: Python - Size: 151 KB - Last synced at: 3 days ago - Pushed at: 4 months ago - Stars: 37 - Forks: 4

LearningJournal/SparkProgrammingInScala

Apache Spark Course Material

Language: Scala - Size: 50.9 MB - Last synced at: 3 months ago - Pushed at: about 2 years ago - Stars: 88 - Forks: 159

fabiogouw/spark-aws-messaging

A custom sink provider for Apache Spark that sends the content of a dataframe to an AWS SQS

Language: Java - Size: 881 KB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 23 - Forks: 5

pregismond/data-analysis-using-spark

Data Analysis using Spark

Language: Jupyter Notebook - Size: 20.5 KB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

imjuliengaupin/sparkler

Language: Java - Size: 33.2 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

miltiadiss/CEID_NE4348-Big-Data-Management-Systems

This project implements a real-time data pipeline with Kafka, Spark, and MongoDB. It generates vehicle data using UXSIM, streams it to a Kafka broker, processes it with Spark, and stores raw and processed data in MongoDB. Queries analyze vehicle counts, speeds, and routes over specified periods.

Language: Python - Size: 4.88 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

GADES-DATAENG/mod3-spark

A repository containing all the necessary code and resources for an Apache Spark demo, showcasing data processing and transformation workflows with practical examples.

Language: Jupyter Notebook - Size: 419 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

Bezid4n/Big-Data

Big Data projects such as Spark Core, Spark SQL, ...

Language: Jupyter Notebook - Size: 26.9 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

Sharkb8t/Home_sales

Demonstrate my ability to use SparkSQL to determine key metrics about home sales data. I've accomplished this by using Spark to create temporary views, partition the data, cache and un-cache a temporary table, and verify that the table has been un-cached.

Language: Jupyter Notebook - Size: 3.91 KB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

Tirth27/Real-time-analytics-with-spark-streaming

This project aims to build a streaming application to perform real-time analytics of Covid-19 related tweets and deploy an ML model for real-time sentiment predictions.

Language: Jupyter Notebook - Size: 16.5 MB - Last synced at: 3 months ago - Pushed at: almost 4 years ago - Stars: 13 - Forks: 3

venkatakamaiah46/SQL

Interesting Queries Written in Structured Query Language

Size: 5.86 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

KelvynAmaral/Data_manipulation_spark

Training repository for data manipulation in Apache Spark. Contains practical examples of reading, writing, transformations, filters, aggregations, joins, and using SQL on DataFrames.

Language: Jupyter Notebook - Size: 371 MB - Last synced at: 6 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

Chabane/bigdata-playground

A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL

Language: TypeScript - Size: 3.08 MB - Last synced at: about 2 months ago - Pushed at: over 6 years ago - Stars: 209 - Forks: 74

DemolisherAA/Fradulent_Data_Detection_Apache

This repository contains Jupyter Notebooks related to fraud detection, data streaming, and real-time data visualization. These notebooks cover various aspects of processing, analyzing, and modeling data to address fraudulent transactions in eCommerce and other contexts.

Language: Jupyter Notebook - Size: 739 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

huangyueranbbc/SparkDemo

Complete Spark example code demos (Java, Scala)

Language: Java - Size: 2.33 MB - Last synced at: 3 months ago - Pushed at: about 5 years ago - Stars: 83 - Forks: 70

imsanjoykb/PySpark-Bootcamp

My Practice and project on PySpark

Language: Jupyter Notebook - Size: 4.52 MB - Last synced at: 3 months ago - Pushed at: almost 4 years ago - Stars: 8 - Forks: 3

LearningJournal/Spark-Streaming-In-Scala

Apache Spark 3 - Structured Streaming Course Material

Language: Scala - Size: 19.4 MB - Last synced at: about 1 month ago - Pushed at: almost 5 years ago - Stars: 45 - Forks: 77

kathisnehith/NYC311-requests-ETL-pipeline

An end-to-end ETL pipeline that processes NYC 311 service requests retrieved through an API for analysis.

Language: Jupyter Notebook - Size: 2.64 MB - Last synced at: 26 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

giuseppericcio/BigData

Solutions to the homework assigned in the Big Data Engineering course taught by Prof. Vincenzo Moscato, Università degli Studi di Napoli "Federico II", academic year 2022-23

Language: Jupyter Notebook - Size: 58.7 MB - Last synced at: about 23 hours ago - Pushed at: 4 months ago - Stars: 1 - Forks: 2

Yeisson8A/DataFramesPySpark

Example of working with DataFrames (from a list, a CSV, a JSON, and a Parquet file) in Spark using both PySpark and Spark SQL

Language: Jupyter Notebook - Size: 864 KB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

jgperrin/net.jgp.books.spark.ch11

Spark in Action, 2nd edition - chapter 11 - Working with SQL

Language: Java - Size: 108 KB - Last synced at: about 2 months ago - Pushed at: about 2 years ago - Stars: 15 - Forks: 11

mohankrishna02/interview-scenerios-spark-sql

This repository focuses on providing interview scenario questions that I have encountered during interviews. The questions are designed to simulate real-world scenarios and test your problem-solving and technical skills. By exploring these scenarios, you can gain insights into common interview topics and prepare yourself for similar challenges.

Language: Scala - Size: 353 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 13 - Forks: 20

qwshen/spark-etl-framework

A generic ETL framework with Spark_SQL for transforming data by constructing pipelines with Yaml/Json/Xml

Language: Scala - Size: 890 KB - Last synced at: 5 months ago - Pushed at: 6 months ago - Stars: 11 - Forks: 9

RubyNixx/coding_resources

Range of cheat sheets, coding resources, videos, etc that I want to keep track of & others may find helpful.

Size: 61.5 KB - Last synced at: 4 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

ExperienceIsKey/Video-Game-Genre-Trend-Analysis

Leveraged AWS, PySpark, and Power BI to analyze trends in PC video game genres. Optimized ETL processes and utilized datasets and the Steam API to reveal nuanced genre frequencies and distributions. Delivered insights driving decisions in game development, marketing, and platform enhancement.

Language: Python - Size: 10.9 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

asuiu/SparkORM

ORM for Apache Spark and DataFrames schema manager

Language: Python - Size: 482 KB - Last synced at: 17 days ago - Pushed at: about 1 year ago - Stars: 14 - Forks: 3

Dirkster99/PyNotes

My notebook on using Python with Jupyter Notebook, PySpark etc

Language: Jupyter Notebook - Size: 84.6 MB - Last synced at: about 1 month ago - Pushed at: almost 4 years ago - Stars: 11 - Forks: 7

RinatVeliakhmedov/spark_event_log_analyzer

Check for common Spark errors by analyzing event log files

Language: Scala - Size: 20.5 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

tuancamtbtx/etl-spark-k8s

ETL With Apache Spark Deployed on K8s

Language: TypeScript - Size: 5.54 MB - Last synced at: 4 months ago - Pushed at: 8 months ago - Stars: 2 - Forks: 0

goamegah/spark-handson

Spark hands-on

Language: Python - Size: 3.08 MB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 2 - Forks: 1

Thanaraklee/Real-Time-PySpark

This project introduces PySpark, a powerful open-source framework for distributed data processing. We explore its architecture, components, and applications for real-time data analysis.

Language: Python - Size: 329 MB - Last synced at: 3 months ago - Pushed at: 9 months ago - Stars: 27 - Forks: 13

syedhassaanahmed/databricks-notebooks

Collection of Databricks and Jupyter Notebooks

Language: Jupyter Notebook - Size: 742 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 21 - Forks: 15

kayvansol/PySparkJupyterOnKubernetes

PySpark & Jupyter Notebooks Deployed On Kubernetes

Size: 611 KB - Last synced at: 6 days ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 0

harryprince/awesome-sparklyr

An awesome collection of sparklyr-related packages

Size: 47.9 KB - Last synced at: 12 days ago - Pushed at: over 5 years ago - Stars: 42 - Forks: 7

thaychansy/home-sales

This project involves Big Data analysis of real estate transaction data using SparkSQL/PySpark. The goal is to derive insights on housing prices based on various criteria. This analysis showcases the powerful data processing capabilities of SparkSQL/PySpark in handling large datasets efficiently.

Language: Jupyter Notebook - Size: 45.9 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

Foufou-exe/openfoodfacts-etl

This project aims to set up a distributed ETL (Extract, Transform, Load) solution to randomly generate food menus tailored to users' needs, using data available on OpenFoodFacts.

Language: Java - Size: 3.71 MB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

CloudFormations/Training.ApacheSpark

Training content for course delegates.

Size: 3.4 MB - Last synced at: 3 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

vskvj3/End-to-End-Pipeline-for-Swiggy-Restaurant-Data

This project develops a data engineering pipeline to analyze restaurant data from various cities on the Swiggy platform. Using PySpark, Spark SQL, and Azure Data Factory, the data is processed and transformed to generate insights on ratings, cuisines, and trends, presented through dashboards and reports.

Language: Jupyter Notebook - Size: 18.3 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

polomarcus/Spark-Structured-Streaming-Examples

Spark Structured Streaming / Kafka / Cassandra / Elastic

Language: Scala - Size: 16.5 MB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 183 - Forks: 78

LegallyNotBlonde/MegaData_SparkProject_Home_Sales

This project used SparkSQL and PySpark to analyze home sales data, optimizing performance with caching and partitioning by build date.

Language: Jupyter Notebook - Size: 27.3 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

lucas-nelson-uiuc/tidy_tools

Declarative programming for PySpark workflows.

Language: Python - Size: 1.67 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

harryprince/geospark

bring sf to spark in production

Language: R - Size: 15.9 MB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 57 - Forks: 17

varunu28/AADHAR-Dataset-Analysis

Data analysis of AADHAR dataset using Apache Spark

Language: Scala - Size: 1.82 MB - Last synced at: 2 months ago - Pushed at: about 7 years ago - Stars: 7 - Forks: 9

abouslimi/spark-ml-product-recommendation

Real-time product recommendation system built using Apache Spark, Kafka, and Python.

Language: Python - Size: 419 KB - Last synced at: 3 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

Ubikitina/Spark-Essentials

A guide to Apache Spark, from fundamentals to advanced concepts.

Language: Jupyter Notebook - Size: 57.7 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

mr-pratyush/DataBricks-Employee-Attrition-Analysis

Analyze employee attrition using Databricks and Spark SQL to identify trends and actionable insights.

Size: 0 Bytes - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

rahulray11/Spark-Java-Application

How to use spark testing base in a Spark Java application. Feel free to make changes.

Language: Java - Size: 44.9 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 2

roshankoirala/pySpark_tutorial

Implementation of Spark code in Jupyter notebook. Topics include: RDDs and DataFrame, exploratory data analysis (EDA), handling multiple DataFrames, visualization, Machine Learning

Language: Jupyter Notebook - Size: 202 KB - Last synced at: 3 months ago - Pushed at: almost 5 years ago - Stars: 29 - Forks: 26

purcellcjp/Home_Sales

This project demonstrated the usage of SparkSQL to read, query, cache, and analyze home sales data, providing insights into average prices based on various criteria.

Language: Jupyter Notebook - Size: 684 KB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0