Ecosyste.ms: Repos

An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: spark-sql

getredash/redash

Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.

Language: Python - Size: 25 MB - Last synced: 1 day ago - Pushed: 1 day ago - Stars: 25,079 - Forks: 4,262

sjrusso8/spark-connect-rs

Apache Spark Connect Client for Rust

Language: Rust - Size: 4.53 MB - Last synced: 2 days ago - Pushed: 3 days ago - Stars: 30 - Forks: 9

almond-sh/almond

A Scala kernel for Jupyter

Language: Scala - Size: 12.3 MB - Last synced: 3 days ago - Pushed: 3 days ago - Stars: 1,568 - Forks: 240

asuiu/SparkORM

ORM for Apache Spark and DataFrames schema manager

Language: Python - Size: 438 KB - Last synced: 5 days ago - Pushed: 5 days ago - Stars: 10 - Forks: 3

ExperienceIsKey/Game-Genre-Trends-Analysis

Leveraged AWS, PySpark, and Power BI to analyze trends in PC video game genres. Optimized ETL processes and utilized datasets and the Steam API to reveal nuanced genre frequencies and distributions. Delivered insights driving decisions in game development, marketing, and platform enhancement.

Language: Python - Size: 10.9 MB - Last synced: 7 days ago - Pushed: 7 days ago - Stars: 0 - Forks: 0

japila-books/spark-sql-internals

The Internals of Spark SQL

Size: 1.42 GB - Last synced: 8 days ago - Pushed: 8 days ago - Stars: 439 - Forks: 128

masalinas/doc-spark-minikube Fork of testdrivenio/spark-kubernetes

DoC Spark on minikube from Mac with Docker Desktop

Language: Shell - Size: 636 KB - Last synced: 9 days ago - Pushed: 9 days ago - Stars: 0 - Forks: 0

dongma/spark-graphx

Spark GraphX, designed for distributed graph computation, including Spark SQL, Spark Streaming, and RDD operations

Language: Scala - Size: 15.4 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 3 - Forks: 4

kaladabrio2020/pyspark-ml-analysis-data

Data analysis and machine learning with PySpark

Language: Jupyter Notebook - Size: 1.95 KB - Last synced: 10 days ago - Pushed: 10 days ago - Stars: 0 - Forks: 0

Qbeast-io/qbeast-spark

Qbeast-spark: DataSource enabling multi-dimensional indexing and efficient data sampling. Big Data, free from the unnecessary!

Language: Scala - Size: 36.9 MB - Last synced: 8 days ago - Pushed: 11 days ago - Stars: 197 - Forks: 17

groda/big_data

Tutorials on Big Data essentials: Hadoop, MapReduce, Spark.

Language: Jupyter Notebook - Size: 46.2 MB - Last synced: 10 days ago - Pushed: 11 days ago - Stars: 61 - Forks: 23

microsoft/MCW-Big-data-analytics-and-visualization 📦

MCW Big data analytics and visualization

Language: JavaScript - Size: 148 MB - Last synced: 11 days ago - Pushed: almost 2 years ago - Stars: 189 - Forks: 186

essraahmed/Data-Lake-with-Spark

Data Lake with Spark

Language: Python - Size: 37.1 KB - Last synced: 16 days ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

abulbasar/zeppelin-notebooks

Size: 3.91 KB - Last synced: 17 days ago - Pushed: over 6 years ago - Stars: 1 - Forks: 1

mohankrishna02/interview-scenerios-spark-sql

This repository focuses on providing interview scenario questions that I have encountered during interviews. The questions are designed to simulate real-world scenarios and test your problem-solving and technical skills. By exploring these scenarios, you can gain insights into common interview topics and prepare yourself for similar challenges.

Language: Scala - Size: 249 KB - Last synced: 17 days ago - Pushed: 18 days ago - Stars: 0 - Forks: 2

venkatakamaiah46/SQL

Interesting Queries Written in Structured Query Language

Size: 3.91 KB - Last synced: 18 days ago - Pushed: 18 days ago - Stars: 2 - Forks: 0

pregismond/data-analysis-using-spark

Final Project Submission: Data Analysis using Spark

Language: Jupyter Notebook - Size: 20.5 KB - Last synced: 18 days ago - Pushed: 18 days ago - Stars: 0 - Forks: 0

Ashutosh27ind/pySparkNYCParkingTickets

Attempt to scientifically analyze the phenomenon of increased traffic violation tickets issued by the NYC Police Department.

Language: Jupyter Notebook - Size: 11.7 KB - Last synced: 19 days ago - Pushed: about 4 years ago - Stars: 0 - Forks: 0

SEED-VT/DeSQL

DeSQL is an interactive step-through debugging technique for DISC-backed SQL queries. This approach allows users to inspect constituent parts of a query and their corresponding intermediate data interactively, similar to watchpoints in gdb-like debuggers.

Language: Scala - Size: 515 MB - Last synced: 18 days ago - Pushed: 20 days ago - Stars: 1 - Forks: 0

kevinschaich/pyspark-cheatsheet

🐍 Quick reference guide to common patterns & functions in PySpark.

Size: 49.8 KB - Last synced: 19 days ago - Pushed: about 1 year ago - Stars: 343 - Forks: 115

deepjyotiroy079/big-data-stack

Code written while learning the Big Data stack.

Language: Jupyter Notebook - Size: 949 KB - Last synced: 21 days ago - Pushed: almost 3 years ago - Stars: 0 - Forks: 0

AdamPaternostro/Azure-Spark-Livy

Run a job in Spark 2.x with HDInsight and submit the job through Livy

Language: Scala - Size: 168 KB - Last synced: 23 days ago - Pushed: almost 7 years ago - Stars: 0 - Forks: 1

dotnet/spark

.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.

Language: C# - Size: 2.99 MB - Last synced: 22 days ago - Pushed: about 1 month ago - Stars: 1,999 - Forks: 308

microsoft/data-accelerator

Data Accelerator for Apache Spark simplifies onboarding to Streaming of Big Data. It offers a rich, easy-to-use experience to help with creation, editing and management of Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.

Language: C# - Size: 378 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 293 - Forks: 87

seyfal/SparkMitMAttackSim

Scalable simulation of MitM attacks using parallel random walks and graph analytics on Spark.

Language: Scala - Size: 76.2 KB - Last synced: 26 days ago - Pushed: 3 months ago - Stars: 0 - Forks: 0

morfious902002/impala-spark-jdbc-kerberos 📦

Language: Java - Size: 4.88 KB - Last synced: 27 days ago - Pushed: almost 2 years ago - Stars: 7 - Forks: 5

iobruno/data-engineering-zoomcamp

Data Engineering examples covering Airflow and Mage for workflows; dbt for BigQuery, Redshift, ClickHouse; Spark and Kafka for Batch/Streaming Processing

Language: Python - Size: 3.91 MB - Last synced: 13 days ago - Pushed: about 1 month ago - Stars: 46 - Forks: 1

thomasDoukas/NTUA_ATDS

Advanced Topics in Database Systems course of ECE National Technical University of Athens.

Language: Python - Size: 2.2 MB - Last synced: 27 days ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0

cavallon/Home_Sales

This SparkSQL project analyzes home sales data, optimizing queries and calculating average prices. Results are saved in a Jupyter Notebook and uploaded to a GitHub repository named "Home_Sales."

Language: Jupyter Notebook - Size: 187 KB - Last synced: 27 days ago - Pushed: 27 days ago - Stars: 0 - Forks: 0

airbnb/airbnb-spark-thrift

A library for loading Thrift data into Spark SQL

Language: Scala - Size: 50.8 KB - Last synced: 12 days ago - Pushed: about 1 year ago - Stars: 43 - Forks: 16

ramkumarpj/Home_Sales

Home sales data is analyzed using SparkSQL. Spark is also used to create temporary views, partition the data, cache and uncache a temporary table, and verify that the table has been uncached.

Language: Jupyter Notebook - Size: 10.7 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 0 - Forks: 0

jowilf/big-data-showcase

This repository contains a project showcasing the use of Big Data technologies in processing and visualizing real-time data from an eCommerce electronics store using tools such as Apache Kafka, Spark Streaming, Spark SQL, HBase, and Plotly

Language: Java - Size: 2.7 MB - Last synced: about 1 month ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0

kriss024/data-discovery-with-pyspark

Spark for Data Science and ETL process.

Language: Jupyter Notebook - Size: 77.9 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 2 - Forks: 0

508lab/Spark-Java

Learning the Spark Java API

Language: Java - Size: 12.7 KB - Last synced: about 1 month ago - Pushed: over 5 years ago - Stars: 0 - Forks: 0

flaviostutz/spark-scala-jupyter

Jupyter notebook server prepared for running Spark with Scala kernels on a remote Spark master

Language: Jupyter Notebook - Size: 1.17 MB - Last synced: 28 days ago - Pushed: about 4 years ago - Stars: 4 - Forks: 1

rodrigoorf/SparkStudies

Repo with some Spark and SparkSQL exercises

Language: Java - Size: 41.1 MB - Last synced: about 1 month ago - Pushed: over 4 years ago - Stars: 0 - Forks: 0

amita-shukla/time-usage

Analysis on how people distribute their time between primary needs, work and leisure activities.

Language: Scala - Size: 22.5 KB - Last synced: about 1 month ago - Pushed: almost 4 years ago - Stars: 1 - Forks: 0

aabdel-kader/Apache-Spark

A repository for my practices and projects using pyspark

Language: Jupyter Notebook - Size: 11.6 MB - Last synced: about 1 month ago - Pushed: over 2 years ago - Stars: 1 - Forks: 0

sakethmukkanti/Machinery-Moniter-Iot-Streaming-With-Azure

An application developed to give real-time insights on machine health using IoT sensors by tracking and monitoring parameters such as temperature, pressure, current and humidity.

Language: Jupyter Notebook - Size: 210 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 0 - Forks: 0

Wh1isper/sparglim

Sparglim✨ makes PySpark apps configurable and deploying a Spark Connect server easier!

Language: Python - Size: 144 KB - Last synced: 21 days ago - Pushed: 21 days ago - Stars: 28 - Forks: 2

kayvansol/PySparkJupyterOnKubernetes

PySpark & Jupyter Notebooks Deployed On Kubernetes

Size: 611 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 0 - Forks: 0

johngodoi/ScalaSparkKafka

This code just loads data to kafka through apache spark and reads it back.

Language: Scala - Size: 5.86 KB - Last synced: about 1 month ago - Pushed: about 3 years ago - Stars: 0 - Forks: 0

saurabhg27/dps-project

Spatial Data analysis using Spark SQL

Language: Scala - Size: 4.4 MB - Last synced: about 1 month ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0

chaokunyang/bigdata-examples

Big data examples for Spark and Flink

Language: Scala - Size: 50.8 KB - Last synced: about 1 month ago - Pushed: over 5 years ago - Stars: 11 - Forks: 5

apache/kyuubi

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.

Language: Scala - Size: 56.6 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 1,919 - Forks: 852

AlexRogalskiy/spark-patterns

🏆 Spark4You Design patterns

Language: Shell - Size: 15.6 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 1 - Forks: 0

apache/incubator-gluten

Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.

Language: Scala - Size: 175 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 969 - Forks: 347

kevin-lee/fuse Fork of charleso/fuse

Some utilities for interfacing with Spark without blowing a fuse

Language: Scala - Size: 45.9 KB - Last synced: about 1 month ago - Pushed: over 3 years ago - Stars: 0 - Forks: 0

oeljeklaus-you/UserActionAnalyzePlatform

Big data platform for e-commerce user behavior analysis

Language: Java - Size: 1.26 MB - Last synced: about 1 month ago - Pushed: over 1 year ago - Stars: 913 - Forks: 382

minio/spark-select

A library for Spark DataFrame using MinIO Select API

Language: Scala - Size: 65.4 KB - Last synced: about 1 month ago - Pushed: over 4 years ago - Stars: 96 - Forks: 18

IBM/db2-event-store-akka-streams

Use Akka to implement a WebSockets endpoint and stream data to Db2 Event Store

Language: Jupyter Notebook - Size: 2.39 MB - Last synced: about 1 month ago - Pushed: about 5 years ago - Stars: 9 - Forks: 11

qubole/sparklens

Qubole Sparklens tool for performance tuning Apache Spark

Language: Scala - Size: 175 KB - Last synced: about 1 month ago - Pushed: 11 months ago - Stars: 547 - Forks: 130

ploomber/jupysql Fork of catherinedevlin/ipython-sql

Better SQL in Jupyter. 📊

Language: Python - Size: 12.7 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 588 - Forks: 70

salimt/Finance-and-Risk-Management-Algorithms

applications for risk management through computational portfolio construction methods

Language: Jupyter Notebook - Size: 13.4 MB - Last synced: about 1 month ago - Pushed: over 3 years ago - Stars: 32 - Forks: 10

sarthak25/Smart-City-YVR

Smart City YVR is an innovative project leveraging data-driven methodologies to analyze and address critical aspects of urban living. Focusing on housing affordability, energy consumption, and transportation, this initiative utilizes advanced data analytics to derive actionable insights.

Language: Jupyter Notebook - Size: 109 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 0 - Forks: 0

xiaoa6435/spark-abtest

A Spark extension to help analyze A/B test experiments based on raw data

Language: Scala - Size: 58.6 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 1 - Forks: 0

aessing/demo-azuresynapse

This repository includes the demos and code I use to play around with Azure Synapse Analytics

Size: 80 MB - Last synced: 19 days ago - Pushed: over 1 year ago - Stars: 5 - Forks: 5

MM24J/Home_Sales_Analysis

Using SparkSQL, I analyzed home sales data to identify key metrics.

Language: Jupyter Notebook - Size: 7.81 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 0 - Forks: 0

vim89/datapipelines-essentials-python

Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake: SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations

Language: Python - Size: 1.76 MB - Last synced: about 1 month ago - Pushed: about 1 year ago - Stars: 53 - Forks: 34

OKDP/spark-images

Collection of Apache Spark docker images for OKDP

Language: Dockerfile - Size: 78.1 KB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 0 - Forks: 0

amy-panda/NY_Taxi_Data_Analysis_and_Modelling

Analysing the taxi trips in New York City and predicting total fare amount of taxi trips

Language: Jupyter Notebook - Size: 1.84 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 1 - Forks: 0

sakethmukkanti/Demand-Navigator-Real-Time-Streaming-with-Azure

A real-time application that guides cab drivers looking for rides toward the areas of a city experiencing higher demand

Language: Jupyter Notebook - Size: 156 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 0 - Forks: 0

thestorytellingengineer/Introduction_to_Pyspark

PySpark Implementation and methods

Language: Jupyter Notebook - Size: 8.79 KB - Last synced: about 2 months ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

Chabane/bigdata-playground

A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL

Language: TypeScript - Size: 3.08 MB - Last synced: about 1 month ago - Pushed: over 5 years ago - Stars: 204 - Forks: 72

xiaruolei/SparkSQLProject

Language: Scala - Size: 865 KB - Last synced: about 2 months ago - Pushed: almost 6 years ago - Stars: 0 - Forks: 0

nelsonssjunior/Python_Spark

Studies of data streaming with Python and Spark

Language: Jupyter Notebook - Size: 4.88 KB - Last synced: about 2 months ago - Pushed: almost 3 years ago - Stars: 1 - Forks: 0

streamnative/pulsar-spark

Spark Connector to read and write with Pulsar

Language: Scala - Size: 697 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 109 - Forks: 48

aliabbasi2000/Spark

Solving Big Data Problems using Spark framework in Java. Running the Project on HDFS clusters (BigData@Polito) to get the results.

Language: Java - Size: 143 KB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 1 - Forks: 0

sakethmukkanti/Movielens-Dataset-Analysis-Azure-Data-Engineering-Project

Created a movie recommendation system on Azure utilizing Spark SQL for analyzing the MovieLens dataset.

Language: Jupyter Notebook - Size: 1.6 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 0 - Forks: 0

TiagoCebola/BigData-GooglePlayStore

This project was developed to solidify the use of Scala for manipulating files and DataFrames to generate metrics.

Language: Scala - Size: 3.97 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 0 - Forks: 0

techmonad/spark-datasets

This example gives a quick overview of the Spark DataFrame API.

Language: Scala - Size: 88.9 KB - Last synced: about 2 months ago - Pushed: over 1 year ago - Stars: 0 - Forks: 1

rohitkulkarni08/Azure-ETL-AmazonSalesAnalysis

A comprehensive ETL pipeline and sales analysis project leveraging Microsoft Azure and PySpark, designed to optimize e-commerce sales by providing actionable insights through detailed data analysis.

Language: Jupyter Notebook - Size: 8.04 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 0 - Forks: 0

assamese/spark-python

Spark Python examples

Language: Python - Size: 83 KB - Last synced: about 2 months ago - Pushed: almost 4 years ago - Stars: 0 - Forks: 0

polomarcus/Spark-Structured-Streaming-Examples

Spark Structured Streaming / Kafka / Cassandra / Elastic

Language: Scala - Size: 16.5 MB - Last synced: about 1 month ago - Pushed: over 1 year ago - Stars: 180 - Forks: 79

MoustafaAMahmoud/spark-sandbox

Spark Sandbox project

Language: Scala - Size: 8.79 KB - Last synced: 2 months ago - Pushed: about 4 years ago - Stars: 0 - Forks: 0

mrogove/NewHampshireOpioidDeepDive

Using spark and other tools to analyze large, disparate data sources. Term Group Project for COMP119 Tufts F'19

Language: Jupyter Notebook - Size: 17.3 MB - Last synced: 2 months ago - Pushed: over 4 years ago - Stars: 1 - Forks: 0

LucasKleaL/Big-Data

My practical assignments from a college Big Data class.

Language: Java - Size: 2.35 MB - Last synced: 2 months ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0

Lakshmiaddepalli/BigDataProject

CSCI-GA.3033-005 - Big Data Application Development

Language: Python - Size: 41.4 MB - Last synced: 2 months ago - Pushed: almost 2 years ago - Stars: 1 - Forks: 0

IcarusSO/Sparksql-UnitTest

Simple utilities for testing Spark SQL queries, functions, and applications

Size: 12.7 KB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 0 - Forks: 0

MyXOF/SparkNotes

Spark 2.0 study notes

Size: 1.59 MB - Last synced: 2 months ago - Pushed: over 5 years ago - Stars: 5 - Forks: 1

jkanclerz/data-science-workshop-2022

The repository contains notebook templates for the purposes of the data science course at the Cracow University of Economics.

Language: Jupyter Notebook - Size: 2.13 MB - Last synced: 2 months ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

wesslen/Code-Tutorials-for-SOPHI

Tutorials and templates for running Spark on UNCC's SOPHI platform

Size: 17.6 KB - Last synced: 2 months ago - Pushed: over 7 years ago - Stars: 1 - Forks: 2

jaceklaskowski/spark-workshop

Apache Spark™ and Scala Workshops

Language: HTML - Size: 57 MB - Last synced: 19 days ago - Pushed: over 1 year ago - Stars: 253 - Forks: 143

abulbasar/SparkJavaExamples

Example code for working with Apache Spark using Java

Language: Java - Size: 399 KB - Last synced: 17 days ago - Pushed: about 1 year ago - Stars: 4 - Forks: 8

JBris/time-series-airflow-kafka-spark

A simple demonstration of an Airflow-Kafka-Spark (AKS) stack for online time series forecasting.

Language: Python - Size: 699 KB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 1 - Forks: 0

zy969/film-genre-insights

DataTalksClub Data Engineering Zoomcamp Project

Language: Python - Size: 32.8 MB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 0 - Forks: 0

Safaa-p/Machine-Failure-Prediction

Predicting machine failure using machine learning on a synthetic dataset of an existing milling machine consisting of 10,000 data points

Language: Jupyter Notebook - Size: 4.7 MB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 0 - Forks: 0

huangyueranbbc/SparkDemo

Complete Spark example code demo (Java, Scala)

Language: Java - Size: 2.33 MB - Last synced: 2 months ago - Pushed: about 4 years ago - Stars: 79 - Forks: 67

zsvoboda/ngods-stocks

New Generation Opensource Data Stack Demo

Language: Jupyter Notebook - Size: 22.1 MB - Last synced: 2 months ago - Pushed: over 1 year ago - Stars: 365 - Forks: 86

amitnema/spark-coach

This project contains learning material and experiments with Apache Spark.

Language: Scala - Size: 46.9 KB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 1 - Forks: 0

AbdelmajidLh/Spark_ML_Weather

Scala and Spark learning project: predicting tomorrow's rain from historical data

Language: Scala - Size: 13.7 KB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 0 - Forks: 0

Ashbyt/SCALA-Spark

Ashley Bythell - Spark/Scala code

Language: Scala - Size: 53.7 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 1 - Forks: 0

LakshMundhada/Real-Time-Fraudulent-Transaction-Analytics-Pipeline

A Big Data project leveraging AWS services and Apache frameworks to identify and visualize fraudulent credit card transaction patterns, providing actionable insights to mitigate financial fraud.

Language: Python - Size: 33.5 MB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 0 - Forks: 0

camilesing/Hive-Spark-SQL-Helper-VSCode

Hive & Spark SQL extension for Visual Studio Code

Language: Java - Size: 7.53 MB - Last synced: 23 days ago - Pushed: 4 months ago - Stars: 3 - Forks: 1

bhanu-kanamarlapudi/EarthquakeAnalysis-PySpark

Language: Python - Size: 18.6 KB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 0 - Forks: 0

Robyn2024/Home_Sales

I'll use my knowledge of SparkSQL to determine key metrics about home sales data. Then I'll use Spark to create temporary views, partition the data, cache and uncache a temporary table, and verify that the table has been uncached.

Language: Jupyter Notebook - Size: 9.77 KB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 0 - Forks: 0

adnanrahin/NFL-Big-Data-Bowl-2022

The 2022 Big Data Bowl data contains Next Gen Stats player tracking, play, game, player, and PFF scouting data for all 2018-2020 Special Teams play. Here, you'll find a summary of each data set in the 2022 Data Bowl, a list of key variables to join on, and a description of each variable.

Language: Scala - Size: 1.02 MB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 1 - Forks: 0

apache/kyuubi-docker

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.

Language: Dockerfile - Size: 20.5 KB - Last synced: 11 days ago - Pushed: 24 days ago - Stars: 10 - Forks: 6

izhangzhihao/Real-time-Data-Warehouse

Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi

Language: Dockerfile - Size: 106 KB - Last synced: 3 months ago - Pushed: 5 months ago - Stars: 95 - Forks: 40

DalyaLami/Home_Sales

Determine key metrics about home sales data using SparkSQL and then use Spark to create temporary views, partition the data, cache and uncache a temporary table, and verify that the table has been uncached.

Language: Jupyter Notebook - Size: 1.25 MB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 0 - Forks: 0