Ecosyste.ms: Repos
An open API service providing repository metadata for many open source software ecosystems.
GitHub topics: spark-sql
getredash/redash
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
Language: Python - Size: 25 MB - Last synced: 1 day ago - Pushed: 1 day ago - Stars: 25,079 - Forks: 4,262
sjrusso8/spark-connect-rs
Apache Spark Connect Client for Rust
Language: Rust - Size: 4.53 MB - Last synced: 2 days ago - Pushed: 3 days ago - Stars: 30 - Forks: 9
almond-sh/almond
A Scala kernel for Jupyter
Language: Scala - Size: 12.3 MB - Last synced: 3 days ago - Pushed: 3 days ago - Stars: 1,568 - Forks: 240
asuiu/SparkORM
ORM for Apache Spark and DataFrames schema manager
Language: Python - Size: 438 KB - Last synced: 5 days ago - Pushed: 5 days ago - Stars: 10 - Forks: 3
ExperienceIsKey/Game-Genre-Trends-Analysis
Leveraged AWS, PySpark, and Power BI to analyze trends in PC video game genres. Optimized ETL processes and utilized datasets and the Steam API to reveal nuanced genre frequencies and distributions. Delivered insights driving decisions in game development, marketing, and platform enhancement.
Language: Python - Size: 10.9 MB - Last synced: 7 days ago - Pushed: 7 days ago - Stars: 0 - Forks: 0
japila-books/spark-sql-internals
The Internals of Spark SQL
Size: 1.42 GB - Last synced: 8 days ago - Pushed: 8 days ago - Stars: 439 - Forks: 128
masalinas/doc-spark-minikube Fork of testdrivenio/spark-kubernetes
DoC Spark on minikube from Mac with Docker Desktop
Language: Shell - Size: 636 KB - Last synced: 9 days ago - Pushed: 9 days ago - Stars: 0 - Forks: 0
dongma/spark-graphx
Spark GraphX, designed for distributed graph computation, including Spark SQL, Spark Streaming, and RDD operations
Language: Scala - Size: 15.4 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 3 - Forks: 4
kaladabrio2020/pyspark-ml-analysis-data
Data analysis and machine learning with PySpark
Language: Jupyter Notebook - Size: 1.95 KB - Last synced: 10 days ago - Pushed: 10 days ago - Stars: 0 - Forks: 0
Qbeast-io/qbeast-spark
Qbeast-spark: DataSource enabling multi-dimensional indexing and efficient data sampling. Big Data, free from the unnecessary!
Language: Scala - Size: 36.9 MB - Last synced: 8 days ago - Pushed: 11 days ago - Stars: 197 - Forks: 17
groda/big_data
Tutorials on Big Data essentials: Hadoop, MapReduce, Spark.
Language: Jupyter Notebook - Size: 46.2 MB - Last synced: 10 days ago - Pushed: 11 days ago - Stars: 61 - Forks: 23
microsoft/MCW-Big-data-analytics-and-visualization 📦
MCW Big data analytics and visualization
Language: JavaScript - Size: 148 MB - Last synced: 11 days ago - Pushed: almost 2 years ago - Stars: 189 - Forks: 186
essraahmed/Data-Lake-with-Spark
Data Lake with Spark
Language: Python - Size: 37.1 KB - Last synced: 16 days ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0
abulbasar/zeppelin-notebooks
Size: 3.91 KB - Last synced: 17 days ago - Pushed: over 6 years ago - Stars: 1 - Forks: 1
mohankrishna02/interview-scenerios-spark-sql
This repository focuses on providing interview scenario questions that I have encountered during interviews. The questions are designed to simulate real-world scenarios and test your problem-solving and technical skills. By exploring these scenarios, you can gain insights into common interview topics and prepare yourself for similar challenges.
Language: Scala - Size: 249 KB - Last synced: 17 days ago - Pushed: 18 days ago - Stars: 0 - Forks: 2
venkatakamaiah46/SQL
Interesting Queries Written in Structured Query Language
Size: 3.91 KB - Last synced: 18 days ago - Pushed: 18 days ago - Stars: 2 - Forks: 0
pregismond/data-analysis-using-spark
Final Project Submission: Data Analysis using Spark
Language: Jupyter Notebook - Size: 20.5 KB - Last synced: 18 days ago - Pushed: 18 days ago - Stars: 0 - Forks: 0
Ashutosh27ind/pySparkNYCParkingTickets
Attempt to scientifically analyze the phenomenon of increased traffic violation tickets issued by the NYC Police Department.
Language: Jupyter Notebook - Size: 11.7 KB - Last synced: 19 days ago - Pushed: about 4 years ago - Stars: 0 - Forks: 0
SEED-VT/DeSQL
DeSQL is an interactive step-through debugging technique for DISC-backed SQL queries. This approach allows users to inspect constituent parts of a query and their corresponding intermediate data interactively, similar to watchpoints in gdb-like debuggers.
Language: Scala - Size: 515 MB - Last synced: 18 days ago - Pushed: 20 days ago - Stars: 1 - Forks: 0
kevinschaich/pyspark-cheatsheet
🐍 Quick reference guide to common patterns & functions in PySpark.
Size: 49.8 KB - Last synced: 19 days ago - Pushed: about 1 year ago - Stars: 343 - Forks: 115
deepjyotiroy079/big-data-stack
Code created while learning the Big Data stack.
Language: Jupyter Notebook - Size: 949 KB - Last synced: 21 days ago - Pushed: almost 3 years ago - Stars: 0 - Forks: 0
AdamPaternostro/Azure-Spark-Livy
Run a job in Spark 2.x with HDInsight and submit the job through Livy
Language: Scala - Size: 168 KB - Last synced: 23 days ago - Pushed: almost 7 years ago - Stars: 0 - Forks: 1
dotnet/spark
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Language: C# - Size: 2.99 MB - Last synced: 22 days ago - Pushed: about 1 month ago - Stars: 1,999 - Forks: 308
microsoft/data-accelerator
Data Accelerator for Apache Spark simplifies onboarding to Streaming of Big Data. It offers a rich, easy to use experience to help with creation, editing and management of Spark jobs on Azure HDInsights or Databricks while enabling the full power of the Spark engine.
Language: C# - Size: 378 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 293 - Forks: 87
seyfal/SparkMitMAttackSim
Scalable simulation of MitM attacks using parallel random walks and graph analytics on Spark.
Language: Scala - Size: 76.2 KB - Last synced: 26 days ago - Pushed: 3 months ago - Stars: 0 - Forks: 0
morfious902002/impala-spark-jdbc-kerberos 📦
Language: Java - Size: 4.88 KB - Last synced: 27 days ago - Pushed: almost 2 years ago - Stars: 7 - Forks: 5
iobruno/data-engineering-zoomcamp
Data Engineering examples covering Airflow and Mage for workflows; dbt for BigQuery, Redshift, ClickHouse; Spark and Kafka for Batch/Streaming Processing
Language: Python - Size: 3.91 MB - Last synced: 13 days ago - Pushed: about 1 month ago - Stars: 46 - Forks: 1
thomasDoukas/NTUA_ATDS
Advanced Topics in Database Systems course of ECE National Technical University of Athens.
Language: Python - Size: 2.2 MB - Last synced: 27 days ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0
cavallon/Home_Sales
This SparkSQL project analyzes home sales data, optimizing queries and calculating average prices. Results are saved in a Jupyter Notebook and uploaded to a GitHub repository named "Home_Sales."
Language: Jupyter Notebook - Size: 187 KB - Last synced: 27 days ago - Pushed: 27 days ago - Stars: 0 - Forks: 0
airbnb/airbnb-spark-thrift
A library for loading Thrift data into Spark SQL
Language: Scala - Size: 50.8 KB - Last synced: 12 days ago - Pushed: about 1 year ago - Stars: 43 - Forks: 16
ramkumarpj/Home_Sales
Home sales data is analyzed using SparkSQL. Spark is also used to create temporary views, partition the data, cache and uncache a temporary table, and verify that the table has been uncached.
Language: Jupyter Notebook - Size: 10.7 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 0 - Forks: 0
jowilf/big-data-showcase
This repository contains a project showcasing the use of Big Data technologies in processing and visualizing real-time data from an eCommerce electronics store using tools such as Apache Kafka, Spark Streaming, Spark SQL, HBase, and Plotly
Language: Java - Size: 2.7 MB - Last synced: about 1 month ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0
kriss024/data-discovery-with-pyspark
Spark for Data Science and ETL process.
Language: Jupyter Notebook - Size: 77.9 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 2 - Forks: 0
508lab/Spark-Java
Learning the Spark Java API
Language: Java - Size: 12.7 KB - Last synced: about 1 month ago - Pushed: over 5 years ago - Stars: 0 - Forks: 0
flaviostutz/spark-scala-jupyter
Jupyter notebook server prepared for running Spark with Scala kernels on a remote Spark master
Language: Jupyter Notebook - Size: 1.17 MB - Last synced: 28 days ago - Pushed: about 4 years ago - Stars: 4 - Forks: 1
rodrigoorf/SparkStudies
Repo with some Spark and SparkSQL exercises
Language: Java - Size: 41.1 MB - Last synced: about 1 month ago - Pushed: over 4 years ago - Stars: 0 - Forks: 0
amita-shukla/time-usage
Analysis on how people distribute their time between primary needs, work and leisure activities.
Language: Scala - Size: 22.5 KB - Last synced: about 1 month ago - Pushed: almost 4 years ago - Stars: 1 - Forks: 0
aabdel-kader/Apache-Spark
A repository for my practice exercises and projects using PySpark
Language: Jupyter Notebook - Size: 11.6 MB - Last synced: about 1 month ago - Pushed: over 2 years ago - Stars: 1 - Forks: 0
sakethmukkanti/Machinery-Moniter-Iot-Streaming-With-Azure
An application developed to give real-time insights into machine health using IoT sensors by tracking and monitoring parameters such as temperature, pressure, current, and humidity.
Language: Jupyter Notebook - Size: 210 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 0 - Forks: 0
Wh1isper/sparglim
Sparglim✨ makes PySpark App Configurable and Deploy Spark Connect Server Easier!
Language: Python - Size: 144 KB - Last synced: 21 days ago - Pushed: 21 days ago - Stars: 28 - Forks: 2
kayvansol/PySparkJupyterOnKubernetes
PySpark & Jupyter Notebooks Deployed On Kubernetes
Size: 611 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 0 - Forks: 0
johngodoi/ScalaSparkKafka
This code just loads data to kafka through apache spark and reads it back.
Language: Scala - Size: 5.86 KB - Last synced: about 1 month ago - Pushed: about 3 years ago - Stars: 0 - Forks: 0
saurabhg27/dps-project
Spatial Data analysis using Spark SQL
Language: Scala - Size: 4.4 MB - Last synced: about 1 month ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0
chaokunyang/bigdata-examples
bigdata examples about spark and flink
Language: Scala - Size: 50.8 KB - Last synced: about 1 month ago - Pushed: over 5 years ago - Stars: 11 - Forks: 5
apache/kyuubi
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
Language: Scala - Size: 56.6 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 1,919 - Forks: 852
AlexRogalskiy/spark-patterns
🏆 Spark4You Design patterns
Language: Shell - Size: 15.6 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 1 - Forks: 0
apache/incubator-gluten
Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
Language: Scala - Size: 175 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 969 - Forks: 347
kevin-lee/fuse Fork of charleso/fuse
Some utilities for interfacing with Spark without blowing a fuse
Language: Scala - Size: 45.9 KB - Last synced: about 1 month ago - Pushed: over 3 years ago - Stars: 0 - Forks: 0
oeljeklaus-you/UserActionAnalyzePlatform
Big data platform for e-commerce user behavior analysis
Language: Java - Size: 1.26 MB - Last synced: about 1 month ago - Pushed: over 1 year ago - Stars: 913 - Forks: 382
minio/spark-select
A library for Spark DataFrame using MinIO Select API
Language: Scala - Size: 65.4 KB - Last synced: about 1 month ago - Pushed: over 4 years ago - Stars: 96 - Forks: 18
IBM/db2-event-store-akka-streams
Use Akka to implement a WebSockets endpoint and stream data to Db2 Event Store
Language: Jupyter Notebook - Size: 2.39 MB - Last synced: about 1 month ago - Pushed: about 5 years ago - Stars: 9 - Forks: 11
qubole/sparklens
Qubole Sparklens tool for performance tuning Apache Spark
Language: Scala - Size: 175 KB - Last synced: about 1 month ago - Pushed: 11 months ago - Stars: 547 - Forks: 130
ploomber/jupysql Fork of catherinedevlin/ipython-sql
Better SQL in Jupyter. 📊
Language: Python - Size: 12.7 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 588 - Forks: 70
salimt/Finance-and-Risk-Management-Algorithms
applications for risk management through computational portfolio construction methods
Language: Jupyter Notebook - Size: 13.4 MB - Last synced: about 1 month ago - Pushed: over 3 years ago - Stars: 32 - Forks: 10
sarthak25/Smart-City-YVR
Smart City YVR is an innovative project leveraging data-driven methodologies to analyze and address critical aspects of urban living. Focusing on housing affordability, energy consumption, and transportation, this initiative utilizes advanced data analytics to derive actionable insights.
Language: Jupyter Notebook - Size: 109 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 0 - Forks: 0
xiaoa6435/spark-abtest
A Spark extension to help analyze A/B test experiments based on raw data
Language: Scala - Size: 58.6 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 1 - Forks: 0
aessing/demo-azuresynapse
This repository includes the demos and code I use to play around with Azure Synapse Analytics
Size: 80 MB - Last synced: 19 days ago - Pushed: over 1 year ago - Stars: 5 - Forks: 5
MM24J/Home_Sales_Analysis
Using SparkSQL, I analyzed home sales data to identify key metrics.
Language: Jupyter Notebook - Size: 7.81 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 0 - Forks: 0
vim89/datapipelines-essentials-python
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Language: Python - Size: 1.76 MB - Last synced: about 1 month ago - Pushed: about 1 year ago - Stars: 53 - Forks: 34
OKDP/spark-images
Collection of Apache Spark docker images for OKDP
Language: Dockerfile - Size: 78.1 KB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 0 - Forks: 0
amy-panda/NY_Taxi_Data_Analysis_and_Modelling
Analysing the taxi trips in New York City and predicting total fare amount of taxi trips
Language: Jupyter Notebook - Size: 1.84 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 1 - Forks: 0
sakethmukkanti/Demand-Navigator-Real-Time-Streaming-with-Azure
A real-time application that guides cab drivers looking for rides toward the areas of the city experiencing higher demand
Language: Jupyter Notebook - Size: 156 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 0 - Forks: 0
thestorytellingengineer/Introduction_to_Pyspark
PySpark Implementation and methods
Language: Jupyter Notebook - Size: 8.79 KB - Last synced: about 2 months ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0
Chabane/bigdata-playground
A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL
Language: TypeScript - Size: 3.08 MB - Last synced: about 1 month ago - Pushed: over 5 years ago - Stars: 204 - Forks: 72
xiaruolei/SparkSQLProject
Language: Scala - Size: 865 KB - Last synced: about 2 months ago - Pushed: almost 6 years ago - Stars: 0 - Forks: 0
nelsonssjunior/Python_Spark
Studies of data streaming with Python and Spark
Language: Jupyter Notebook - Size: 4.88 KB - Last synced: about 2 months ago - Pushed: almost 3 years ago - Stars: 1 - Forks: 0
streamnative/pulsar-spark
Spark Connector to read and write with Pulsar
Language: Scala - Size: 697 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 109 - Forks: 48
aliabbasi2000/Spark
Solving Big Data Problems using Spark framework in Java. Running the Project on HDFS clusters (BigData@Polito) to get the results.
Language: Java - Size: 143 KB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 1 - Forks: 0
sakethmukkanti/Movielens-Dataset-Analysis-Azure-Data-Engineering-Project
Created a movie recommendation system on Azure utilizing Spark SQL for analyzing the MovieLens dataset.
Language: Jupyter Notebook - Size: 1.6 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 0 - Forks: 0
TiagoCebola/BigData-GooglePlayStore
This project was developed to solidify the use of Scala for manipulating files and DataFrames to generate metrics.
Language: Scala - Size: 3.97 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 0 - Forks: 0
techmonad/spark-datasets
This example gives a quick overview of the Spark DataFrame API.
Language: Scala - Size: 88.9 KB - Last synced: about 2 months ago - Pushed: over 1 year ago - Stars: 0 - Forks: 1
rohitkulkarni08/Azure-ETL-AmazonSalesAnalysis
A comprehensive ETL pipeline and sales analysis project leveraging Microsoft Azure and PySpark, designed to optimize e-commerce sales by providing actionable insights through detailed data analysis.
Language: Jupyter Notebook - Size: 8.04 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 0 - Forks: 0
assamese/spark-python
Spark Python examples
Language: Python - Size: 83 KB - Last synced: about 2 months ago - Pushed: almost 4 years ago - Stars: 0 - Forks: 0
polomarcus/Spark-Structured-Streaming-Examples
Spark Structured Streaming / Kafka / Cassandra / Elastic
Language: Scala - Size: 16.5 MB - Last synced: about 1 month ago - Pushed: over 1 year ago - Stars: 180 - Forks: 79
MoustafaAMahmoud/spark-sandbox
Spark Sandbox project
Language: Scala - Size: 8.79 KB - Last synced: 2 months ago - Pushed: about 4 years ago - Stars: 0 - Forks: 0
mrogove/NewHampshireOpioidDeepDive
Using spark and other tools to analyze large, disparate data sources. Term Group Project for COMP119 Tufts F'19
Language: Jupyter Notebook - Size: 17.3 MB - Last synced: 2 months ago - Pushed: over 4 years ago - Stars: 1 - Forks: 0
LucasKleaL/Big-Data
My practical assignments from Big Data's college class.
Language: Java - Size: 2.35 MB - Last synced: 2 months ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0
Lakshmiaddepalli/BigDataProject
CSCI-GA.3033-005 - Big Data Application Development
Language: Python - Size: 41.4 MB - Last synced: 2 months ago - Pushed: almost 2 years ago - Stars: 1 - Forks: 0
IcarusSO/Sparksql-UnitTest
Simple utilities for testing Spark SQL queries, functions, and applications
Size: 12.7 KB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 0 - Forks: 0
MyXOF/SparkNotes
Spark 2.0 study notes
Size: 1.59 MB - Last synced: 2 months ago - Pushed: over 5 years ago - Stars: 5 - Forks: 1
jkanclerz/data-science-workshop-2022
The repository contains notebook templates for the purposes of the data science course at the Cracow University of Economics.
Language: Jupyter Notebook - Size: 2.13 MB - Last synced: 2 months ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0
wesslen/Code-Tutorials-for-SOPHI
Tutorials and templates for running Spark on UNCC's SOPHI platform
Size: 17.6 KB - Last synced: 2 months ago - Pushed: over 7 years ago - Stars: 1 - Forks: 2
jaceklaskowski/spark-workshop
Apache Spark™ and Scala Workshops
Language: HTML - Size: 57 MB - Last synced: 19 days ago - Pushed: over 1 year ago - Stars: 253 - Forks: 143
abulbasar/SparkJavaExamples
Example code for working with Apache Spark using Java
Language: Java - Size: 399 KB - Last synced: 17 days ago - Pushed: about 1 year ago - Stars: 4 - Forks: 8
JBris/time-series-airflow-kafka-spark
A simple demonstration of an Airflow-Kafka-Spark (AKS) stack for online time series forecasting.
Language: Python - Size: 699 KB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 1 - Forks: 0
zy969/film-genre-insights
DataTalksClub Data Engineering Zoomcamp Project
Language: Python - Size: 32.8 MB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 0 - Forks: 0
Safaa-p/Machine-Failure-Prediction
Predicting Machine failure using Machine learning on a synthetic dataset of an existing milling machine consisting of 10,000 data points
Language: Jupyter Notebook - Size: 4.7 MB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 0 - Forks: 0
huangyueranbbc/SparkDemo
Complete Spark example code (Java, Scala)
Language: Java - Size: 2.33 MB - Last synced: 2 months ago - Pushed: about 4 years ago - Stars: 79 - Forks: 67
zsvoboda/ngods-stocks
New Generation Opensource Data Stack Demo
Language: Jupyter Notebook - Size: 22.1 MB - Last synced: 2 months ago - Pushed: over 1 year ago - Stars: 365 - Forks: 86
amitnema/spark-coach
This project contains learning exercises and experiments with Apache Spark.
Language: Scala - Size: 46.9 KB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 1 - Forks: 0
AbdelmajidLh/Spark_ML_Weather
Scala and Spark learning project: predicting tomorrow's rain from historical data
Language: Scala - Size: 13.7 KB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 0 - Forks: 0
Ashbyt/SCALA-Spark
Ashley Bythell - Spark/Scala code
Language: Scala - Size: 53.7 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 1 - Forks: 0
LakshMundhada/Real-Time-Fraudulent-Transaction-Analytics-Pipeline
A Big Data project leveraging AWS services and Apache frameworks to identify and visualize fraudulent credit card transaction patterns, providing actionable insights to mitigate financial fraud.
Language: Python - Size: 33.5 MB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 0 - Forks: 0
camilesing/Hive-Spark-SQL-Helper-VSCode
Hive & Spark SQL extension for Visual Studio Code
Language: Java - Size: 7.53 MB - Last synced: 23 days ago - Pushed: 4 months ago - Stars: 3 - Forks: 1
bhanu-kanamarlapudi/EarthquakeAnalysis-PySpark
Language: Python - Size: 18.6 KB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 0 - Forks: 0
Robyn2024/Home_Sales
I'll use my knowledge of SparkSQL to determine key metrics about home sales data. Then I'll use Spark to create temporary views, partition the data, cache and uncache a temporary table, and verify that the table has been uncached.
Language: Jupyter Notebook - Size: 9.77 KB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 0 - Forks: 0
adnanrahin/NFL-Big-Data-Bowl-2022
The 2022 Big Data Bowl data contains Next Gen Stats player tracking, play, game, player, and PFF scouting data for all 2018-2020 Special Teams play. Here, you'll find a summary of each data set in the 2022 Data Bowl, a list of key variables to join on, and a description of each variable.
Language: Scala - Size: 1.02 MB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 1 - Forks: 0
apache/kyuubi-docker
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
Language: Dockerfile - Size: 20.5 KB - Last synced: 11 days ago - Pushed: 24 days ago - Stars: 10 - Forks: 6
izhangzhihao/Real-time-Data-Warehouse
Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi
Language: Dockerfile - Size: 106 KB - Last synced: 3 months ago - Pushed: 5 months ago - Stars: 95 - Forks: 40
DalyaLami/Home_Sales
Determine key metrics about home sales data using SparkSQL and then use Spark to create temporary views, partition the data, cache and uncache a temporary table, and verify that the table has been uncached.
Language: Jupyter Notebook - Size: 1.25 MB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 0 - Forks: 0