Topic: "spark-sql"
getredash/redash
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
Language: Python - Size: 27.2 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 27,327 - Forks: 4,471

apache/kyuubi
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
Language: Scala - Size: 60.1 MB - Last synced at: about 18 hours ago - Pushed at: about 19 hours ago - Stars: 2,207 - Forks: 947

dotnet/spark
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Language: C# - Size: 4.87 MB - Last synced at: 22 days ago - Pushed at: about 1 month ago - Stars: 2,058 - Forks: 327

almond-sh/almond
A Scala kernel for Jupyter
Language: Scala - Size: 12.8 MB - Last synced at: 3 days ago - Pushed at: about 1 month ago - Stars: 1,615 - Forks: 251

apache/incubator-gluten
Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
Language: Scala - Size: 199 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 1,373 - Forks: 552

databricks/LearningSparkV2
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
Language: Scala - Size: 75.2 MB - Last synced at: 2 days ago - Pushed at: 5 months ago - Stars: 1,306 - Forks: 775

oeljeklaus-you/UserActionAnalyzePlatform
E-commerce user behavior analysis big data platform
Language: Java - Size: 1.26 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 1,029 - Forks: 386

ploomber/jupysql Fork of catherinedevlin/ipython-sql
Better SQL in Jupyter. 📊
Language: Python - Size: 12.9 MB - Last synced at: 25 days ago - Pushed at: 3 months ago - Stars: 780 - Forks: 78

qubole/sparklens
Qubole Sparklens tool for performance tuning Apache Spark
Language: Scala - Size: 175 KB - Last synced at: about 1 month ago - Pushed at: 12 months ago - Stars: 575 - Forks: 142

kevinschaich/pyspark-cheatsheet
🐍 Quick reference guide to common patterns & functions in PySpark.
Size: 49.8 KB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 519 - Forks: 167

japila-books/spark-sql-internals
The Internals of Spark SQL
Size: 1.46 GB - Last synced at: about 16 hours ago - Pushed at: about 17 hours ago - Stars: 468 - Forks: 132

zsvoboda/ngods-stocks
New Generation Opensource Data Stack Demo
Language: Jupyter Notebook - Size: 22.1 MB - Last synced at: 30 days ago - Pushed at: over 2 years ago - Stars: 432 - Forks: 101

microsoft/data-accelerator
Data Accelerator for Apache Spark simplifies onboarding to Streaming of Big Data. It offers a rich, easy-to-use experience for creating, editing, and managing Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.
Language: C# - Size: 401 MB - Last synced at: 5 days ago - Pushed at: 3 months ago - Stars: 302 - Forks: 90

cuebook/cuelake
Use SQL to build ELT pipelines on a data lakehouse.
Language: JavaScript - Size: 28 MB - Last synced at: about 1 month ago - Pushed at: about 3 years ago - Stars: 288 - Forks: 28

jaceklaskowski/spark-workshop
Apache Spark™ and Scala Workshops
Language: HTML - Size: 57 MB - Last synced at: 30 days ago - Pushed at: 11 months ago - Stars: 264 - Forks: 148

Qbeast-io/qbeast-spark
Qbeast-spark: DataSource enabling multi-dimensional indexing and efficient data sampling. Big Data, free from the unnecessary!
Language: Scala - Size: 37.3 MB - Last synced at: 16 days ago - Pushed at: 5 months ago - Stars: 228 - Forks: 24

Chabane/bigdata-playground
A complete example of a big data application using: Kubernetes (kops/AWS), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, Node.js, Angular, GraphQL
Language: TypeScript - Size: 3.08 MB - Last synced at: about 2 months ago - Pushed at: over 6 years ago - Stars: 209 - Forks: 74

microsoft/MCW-Big-data-analytics-and-visualization 📦
MCW Big data analytics and visualization
Language: JavaScript - Size: 148 MB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 189 - Forks: 186

bluishglc/bdp
A prototype big data platform project: the source code for the book Big Data Platform Architecture and Prototype
Language: Java - Size: 403 KB - Last synced at: over 1 year ago - Pushed at: almost 5 years ago - Stars: 184 - Forks: 135

polomarcus/Spark-Structured-Streaming-Examples
Spark Structured Streaming / Kafka / Cassandra / Elastic
Language: Scala - Size: 16.5 MB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 183 - Forks: 78

mc2-project/opaque-sql
An encrypted data analytics platform
Language: Scala - Size: 18 MB - Last synced at: 3 months ago - Pushed at: about 2 years ago - Stars: 182 - Forks: 73

LearningJournal/Spark-Streaming-In-Python
Apache Spark 3 - Structured Streaming Course Material
Language: Python - Size: 19.4 MB - Last synced at: 2 months ago - Pushed at: almost 2 years ago - Stars: 121 - Forks: 159

xiaogp/recsys_spark
ItemCF, UserCF, and Swing implemented in Spark SQL (recommender systems, recommendation algorithms, collaborative filtering)
Language: Scala - Size: 10.7 KB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 121 - Forks: 47

streamnative/pulsar-spark
Spark Connector to read and write with Pulsar
Language: Scala - Size: 726 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 113 - Forks: 51

izhangzhihao/Real-time-Data-Warehouse
Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi
Language: Dockerfile - Size: 106 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 113 - Forks: 44

sjrusso8/spark-connect-rs
Apache Spark Connect Client for Rust
Language: Rust - Size: 3.88 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 109 - Forks: 18

wangj1106/recommendMoteur
Movie recommendation system and movie recommendation engine, built with Spark
Language: Scala - Size: 10.4 MB - Last synced at: over 1 year ago - Pushed at: almost 7 years ago - Stars: 106 - Forks: 37

minio/spark-select
A library for Spark DataFrame using MinIO Select API
Language: Scala - Size: 65.4 KB - Last synced at: 4 days ago - Pushed at: over 5 years ago - Stars: 98 - Forks: 19

LearningJournal/SparkProgrammingInScala
Apache Spark Course Material
Language: Scala - Size: 50.9 MB - Last synced at: 3 months ago - Pushed at: about 2 years ago - Stars: 88 - Forks: 159

huangyueranbbc/SparkDemo
Complete Spark example code demos (Java, Scala)
Language: Java - Size: 2.33 MB - Last synced at: 3 months ago - Pushed at: about 5 years ago - Stars: 83 - Forks: 70

streamnative/awesome-pulsar
A curated list of Pulsar tools, integrations and resources.
Size: 11.7 KB - Last synced at: about 2 months ago - Pushed at: over 4 years ago - Stars: 81 - Forks: 9

groda/big_data
Big Data essentials: Hadoop, MapReduce, Spark. Explore tutorials and demos in Jupyter notebooks—most are self-contained and live, ready to run with a click.
Language: Jupyter Notebook - Size: 51.9 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 78 - Forks: 27

martandsingh/ApacheSpark
This repository helps you learn Databricks concepts through examples. It covers the important topics a data engineer needs in real-world work. We use PySpark and Spark SQL for development. At the end of the course, we also cover a few case studies.
Language: Python - Size: 141 KB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 71 - Forks: 47

zsvoboda/ngods
New generation opensource data stack
Language: Dockerfile - Size: 1.62 MB - Last synced at: 2 months ago - Pushed at: about 3 years ago - Stars: 66 - Forks: 9

Thomas-George-T/Movies-Analytics-in-Spark-and-Scala
Data cleaning, pre-processing, and Analytics on a million movies using Spark and Scala.
Language: Scala - Size: 11.3 MB - Last synced at: over 2 years ago - Pushed at: about 4 years ago - Stars: 63 - Forks: 46

harryprince/geospark
bring sf to spark in production
Language: R - Size: 15.9 MB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 57 - Forks: 17

spider-123-eng/Spark
Apache Spark is a fast, in-memory data processing engine with elegant and expressive development APIs that allow data workers to efficiently execute streaming, machine learning, or SQL workloads that require fast iterative access to datasets. This project contains sample Spark programs in Scala.
Language: Scala - Size: 6.59 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 55 - Forks: 42

vim89/datapipelines-essentials-python
Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake, plus SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Language: Python - Size: 1.76 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 53 - Forks: 34

sjyttkl/spark_learning
Atguigu (尚硅谷) Big Data Spark course, latest 2019 edition: learning Spark
Language: Scala - Size: 6.38 MB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 47 - Forks: 54

hablapps/sparkOptics
Optics for Spark DataFrames
Language: Scala - Size: 58.6 KB - Last synced at: 18 days ago - Pushed at: over 4 years ago - Stars: 47 - Forks: 6

spirom/spark-data-sources
Developing Spark External Data Sources using the V2 API
Language: Java - Size: 114 KB - Last synced at: 3 months ago - Pushed at: about 7 years ago - Stars: 46 - Forks: 18

LearningJournal/Spark-Streaming-In-Scala
Apache Spark 3 - Structured Streaming Course Material
Language: Scala - Size: 19.4 MB - Last synced at: about 1 month ago - Pushed at: almost 5 years ago - Stars: 45 - Forks: 77

airbnb/airbnb-spark-thrift
A library for loading Thrift data into Spark SQL
Language: Scala - Size: 50.8 KB - Last synced at: about 1 hour ago - Pushed at: over 2 years ago - Stars: 43 - Forks: 16

harryprince/awesome-sparklyr
An awesome sparklyr related package collection
Size: 47.9 KB - Last synced at: 10 days ago - Pushed at: over 5 years ago - Stars: 42 - Forks: 7

mayur2810/sope
Apache Spark ETL Utilities
Language: Scala - Size: 1.08 MB - Last synced at: 11 days ago - Pushed at: 8 months ago - Stars: 40 - Forks: 16

dbiir/paraflow
A real-time analytical system for ID-associated data
Language: Java - Size: 19.1 MB - Last synced at: 3 months ago - Pushed at: about 3 years ago - Stars: 38 - Forks: 24

Wh1isper/sparglim
Sparglim✨ makes PySpark apps configurable and makes deploying a Spark Connect server easier!
Language: Python - Size: 151 KB - Last synced at: 1 day ago - Pushed at: 4 months ago - Stars: 37 - Forks: 4

SharpData/SharpETL
Write ETL using your favorite SQL dialects
Language: Scala - Size: 3.37 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 36 - Forks: 5

sunwu51/bigdatatutorial
bigdatatutorial
Language: Shell - Size: 23.3 MB - Last synced at: 6 months ago - Pushed at: almost 7 years ago - Stars: 35 - Forks: 6

salimt/Finance-and-Risk-Management-Algorithms
applications for risk management through computational portfolio construction methods
Language: Jupyter Notebook - Size: 13.4 MB - Last synced at: about 1 year ago - Pushed at: almost 5 years ago - Stars: 32 - Forks: 10

Yifan122/RecommendSystem
Movie recommendation system
Language: Scala - Size: 4.31 MB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 31 - Forks: 11

wushengyeyouya/Hive-JDBC-Proxy
Hive-JDBC-Proxy is a high-performance proxy service for HiveServer2 and Spark ThriftServer. It provides load balancing and rule-based forwarding of Hive JDBC client requests to HiveServer2 and Spark ThriftServer.
Language: Scala - Size: 74.2 KB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 30 - Forks: 15

roshankoirala/pySpark_tutorial
Implementation of Spark code in Jupyter notebook. Topics include: RDDs and DataFrame, exploratory data analysis (EDA), handling multiple DataFrames, visualization, Machine Learning
Language: Jupyter Notebook - Size: 202 KB - Last synced at: 3 months ago - Pushed at: almost 5 years ago - Stars: 29 - Forks: 26

indix/sparkplug
Spark package to "plug" holes in data using SQL based rules ⚡️ 🔌
Language: Scala - Size: 503 KB - Last synced at: 19 days ago - Pushed at: about 5 years ago - Stars: 29 - Forks: 2

Thanaraklee/Real-Time-PySpark
This project introduces PySpark, a powerful open-source framework for distributed data processing. We explore its architecture, components, and applications for real-time data analysis.
Language: Python - Size: 329 MB - Last synced at: 3 months ago - Pushed at: 9 months ago - Stars: 27 - Forks: 13

Wathon/data_engineering_with_python-track-datacamp
Data Engineer with Python lecture notes from #datacamp.
Language: Jupyter Notebook - Size: 59.7 MB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 26 - Forks: 22

AndrewKuzmin/spark-structured-streaming-examples
Spark Structured Streaming examples using version 3.4.0
Language: Scala - Size: 1.06 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 25 - Forks: 14

anish749/spark2-etl-examples
A project with examples of a few commonly used data manipulation/processing/transformation APIs in Apache Spark 2.0.0
Language: Scala - Size: 14.5 MB - Last synced at: over 2 years ago - Pushed at: almost 4 years ago - Stars: 25 - Forks: 29

fabiogouw/spark-aws-messaging
A custom sink provider for Apache Spark that sends the content of a dataframe to an AWS SQS
Language: Java - Size: 881 KB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 23 - Forks: 5

haozhang-x/shenzhen-metro-transport-card-data-analysis
Shenzhen Tong (Shenzhen transit card) swipe data analysis
Language: Scala - Size: 27.1 MB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 22 - Forks: 9

syedhassaanahmed/databricks-notebooks
Collection of Databricks and Jupyter Notebooks
Language: Jupyter Notebook - Size: 742 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 21 - Forks: 15

inferrinizzard/prettier-sql Fork of sql-formatter-org/sql-formatter 📦
[ARCHIVED] Please use https://github.com/sql-formatter-org/sql-formatter
Language: TypeScript - Size: 3.08 MB - Last synced at: about 1 month ago - Pushed at: almost 3 years ago - Stars: 21 - Forks: 5

zrlio/albis
Albis: High-Performance File Format for Big Data Systems
Size: 1.02 MB - Last synced at: almost 2 years ago - Pushed at: almost 7 years ago - Stars: 21 - Forks: 3

astrolabsoftware/spark-fits
FITS data source for Spark SQL and DataFrames
Language: Scala - Size: 8.97 MB - Last synced at: 3 months ago - Pushed at: about 2 years ago - Stars: 20 - Forks: 7

jamesbyars/apache-spark-etl-pipeline-example
Demonstration of using Apache Spark to build robust ETL pipelines while taking advantage of open source, general purpose cluster computing.
Language: Python - Size: 53.9 MB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 19 - Forks: 27

nsphung/pyspark-template
A Python PySpark Project with Poetry
Language: Jupyter Notebook - Size: 81.1 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 18 - Forks: 2

sev7e0/wow-spark
🔆 A Spark self-study handbook covering Spark Core, Spark SQL, Spark Streaming, spark-kafka, and Delta Lake, plus basic Scala exercises and some source code analyses (e.g., master, shuffle), summaries, and translations.
Language: Scala - Size: 1.96 MB - Last synced at: 3 months ago - Pushed at: almost 2 years ago - Stars: 18 - Forks: 7

harshkavdikar1/Tweet-Analysis-With-Kafka-and-Spark
A real time analytics dashboard to analyze the trending hashtags and @ mentions at any location using kafka and spark streaming.
Language: Python - Size: 238 KB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 17 - Forks: 8

tejasjbansal/HELTHCARE-SYSTEM
Data cleaning, pre-processing, and analytics on healthcare data using Spark and Python.
Language: Jupyter Notebook - Size: 3 MB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 17 - Forks: 14

zekeriyyaa/PySpark-Structured-Streaming-ROS-Kafka-ApacheSpark-Cassandra
Structured Streaming was applied to robot data from a ROS-Gazebo simulation environment using Apache Spark. Data is collected in Kafka, analyzed by Apache Spark, and stored in Cassandra.
Language: Python - Size: 652 KB - Last synced at: 2 months ago - Pushed at: over 3 years ago - Stars: 16 - Forks: 6

jgperrin/net.jgp.books.spark.ch11
Spark in Action, 2nd edition - chapter 11 - Working with SQL
Language: Java - Size: 108 KB - Last synced at: about 2 months ago - Pushed at: about 2 years ago - Stars: 15 - Forks: 11

JunjianS/spark-streaming-kafka-demo
Spark Streaming reads messages from Kafka and writes offsets to Redis. Spark computes word frequencies, and the results are finally written to a Hive table.
Language: Java - Size: 14.6 KB - Last synced at: over 1 year ago - Pushed at: almost 6 years ago - Stars: 15 - Forks: 7

LuckyZXL2016/Spark-Example
Examples for Spark 1.6 and Spark 2.2, covering Kafka, Flume, Structured Streaming, Jedis, Elasticsearch, MySQL, and DataFrame
Language: Scala - Size: 2.06 MB - Last synced at: 3 months ago - Pushed at: over 7 years ago - Stars: 15 - Forks: 6

asuiu/SparkORM
ORM for Apache Spark and DataFrames schema manager
Language: Python - Size: 482 KB - Last synced at: 15 days ago - Pushed at: 12 months ago - Stars: 14 - Forks: 3

lifeomic/spark-vcf
Spark VCF data source implementation for Dataframes
Language: Scala - Size: 314 KB - Last synced at: about 2 months ago - Pushed at: almost 3 years ago - Stars: 14 - Forks: 2

anaregdesign/openaivec
Pandas extension, Tabular calculation with LLM, Spark UDF Builder
Language: Python - Size: 977 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 13 - Forks: 1

apache/kyuubi-docker
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
Language: Dockerfile - Size: 26.4 KB - Last synced at: about 14 hours ago - Pushed at: 11 days ago - Stars: 13 - Forks: 8

mohankrishna02/interview-scenerios-spark-sql
This repository focuses on providing interview scenario questions that I have encountered during interviews. The questions are designed to simulate real-world scenarios and test your problem-solving and technical skills. By exploring these scenarios, you can gain insights into common interview topics and prepare yourself for similar challenges.
Language: Scala - Size: 353 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 13 - Forks: 20

Tirth27/Real-time-analytics-with-spark-streaming
This project aims to build a streaming application to perform real-time analytics of Covid-19 related tweets and deploy an ML model for real-time sentiment predictions.
Language: Jupyter Notebook - Size: 16.5 MB - Last synced at: 3 months ago - Pushed at: almost 4 years ago - Stars: 13 - Forks: 3

lqdev/RestaurantInspectionsSparkMLNET
ETL & Data Enrichment with Spark.NET and ML.NET Automated (Auto) ML
Language: C# - Size: 25.4 KB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 13 - Forks: 6

waltyou/spark-sql-online-editor
spark sql online editor
Language: JavaScript - Size: 320 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 12 - Forks: 2

EnableAsync/cloud-movie-recommend-system
Spark-based microservice recommendation system
Language: Java - Size: 1.31 MB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 12 - Forks: 1

EnableAsync/MovieRecommendSystem
Movie recommendation system based on Spark Streaming
Language: Java - Size: 3.11 MB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 12 - Forks: 1

qwshen/spark-etl-framework
A generic ETL framework with Spark_SQL for transforming data by constructing pipelines with Yaml/Json/Xml
Language: Scala - Size: 890 KB - Last synced at: 4 months ago - Pushed at: 6 months ago - Stars: 11 - Forks: 9

HuemulSolutions/huemul-bigdatagovernance
Huemul BigDataGovernance is a framework that runs on Spark, Hive, and HDFS. It enables a corporate single-source-of-truth data strategy based on data governance best practices. When data is inserted or updated through the library, it enforces Primary Key and Foreign Key constraints and validates nulls, text lengths, numeric and date minimums/maximums, unique values, and default values. It also lets you classify fields by the applicability of ARCO rights, to ease the implementation of GDPR-style data protection laws, and identify security levels and whether any encryption is applied. Additionally, it lets you add more complex validation rules on the same table.
Language: Scala - Size: 1.27 MB - Last synced at: 27 days ago - Pushed at: about 2 years ago - Stars: 11 - Forks: 7

Dirkster99/PyNotes
My notebook on using Python with Jupyter Notebook, PySpark, etc.
Language: Jupyter Notebook - Size: 84.6 MB - Last synced at: 29 days ago - Pushed at: almost 4 years ago - Stars: 11 - Forks: 7

SelimHorri/spark-application
Java Application, uses Apache Spark, handles batch as well as streaming processing
Language: Java - Size: 8.79 KB - Last synced at: 2 months ago - Pushed at: over 4 years ago - Stars: 11 - Forks: 0

yennanliu/spark-etl-pipeline
Various data stream/batch process demo with Apache Scala Spark 🚀
Language: Scala - Size: 5.06 MB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 11 - Forks: 8

sahilbhange/spark-slowly-changing-dimension
Spark implementation of Slowly Changing Dimension type 2
Language: Scala - Size: 351 KB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 11 - Forks: 13

chaokunyang/bigdata-examples
bigdata examples about spark and flink
Language: Scala - Size: 50.8 KB - Last synced at: about 1 month ago - Pushed at: almost 7 years ago - Stars: 11 - Forks: 5

tlepple/iceberg-intro-workshop
Hands-on workshop with Apache Iceberg
Language: Shell - Size: 2.31 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 10 - Forks: 0

samerelhousseini/Geospatial-Analysis-With-Spark
This is a data processing pipeline that implements an End-to-End Real-Time Geospatial Analytics and Visualization multi-component full-stack solution, using Apache Spark Structured Streaming, Apache Kafka, MongoDB Change Streams, Node.js, React, Uber's Deck.gl and React-Vis, and using the Massachusetts Bay Transportation Authority's (MBTA) APIs as a data source
Language: Python - Size: 12.4 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 10 - Forks: 4

dhiraa/spark-tpcds
Apache Spark TPC-DS benchmark setup with EMR launch setup
Language: Smarty - Size: 1.3 MB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 10 - Forks: 4

sotowang/log-analysis-system
Spark-based behavior log analysis system
Language: Java - Size: 1.01 MB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 10 - Forks: 6

komprenilo/liga
Liga: Let Data Dance with ML Models
Language: Python - Size: 17.9 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 9 - Forks: 5

invent-analytics/metaframe
Spark DataFrame with metadata
Language: Python - Size: 14.6 KB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 9 - Forks: 1

aroch/protobuf-dataframe
A package that lets you run PySpark SQL on your Protobuf data
Language: Python - Size: 8.79 KB - Last synced at: 27 days ago - Pushed at: 8 months ago - Stars: 8 - Forks: 3

taboola/ScORe
ScORe - Programmatic Schema On Read for Spark SQL, powered by Taboola
Language: Java - Size: 51.8 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 8 - Forks: 2

CloudComputingProject-2022/Data_visualization_and_analysis_tool_for_telemetry_data
A naive anomaly detection and data visualization tool for F1 onboard telemetry data.
Language: Python - Size: 1.4 MB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 8 - Forks: 1

jksinghpro/spark-jms
Spark JMS connector for batch and streaming mode
Language: Scala - Size: 16.6 KB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 8 - Forks: 1
