An open API service providing repository metadata for many open source software ecosystems.

Topic: "spark-sql"

getredash/redash

Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.

Language: Python - Size: 27.2 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 27,327 - Forks: 4,471

apache/kyuubi

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.

Language: Scala - Size: 60.1 MB - Last synced at: about 18 hours ago - Pushed at: about 19 hours ago - Stars: 2,207 - Forks: 947

dotnet/spark

.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.

Language: C# - Size: 4.87 MB - Last synced at: 22 days ago - Pushed at: about 1 month ago - Stars: 2,058 - Forks: 327

almond-sh/almond

A Scala kernel for Jupyter

Language: Scala - Size: 12.8 MB - Last synced at: 3 days ago - Pushed at: about 1 month ago - Stars: 1,615 - Forks: 251

apache/incubator-gluten

Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.

Language: Scala - Size: 199 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 1,373 - Forks: 552

databricks/LearningSparkV2

This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]

Language: Scala - Size: 75.2 MB - Last synced at: 2 days ago - Pushed at: 5 months ago - Stars: 1,306 - Forks: 775

oeljeklaus-you/UserActionAnalyzePlatform

Big data platform for e-commerce user behavior analysis

Language: Java - Size: 1.26 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 1,029 - Forks: 386

ploomber/jupysql Fork of catherinedevlin/ipython-sql

Better SQL in Jupyter. 📊

Language: Python - Size: 12.9 MB - Last synced at: 25 days ago - Pushed at: 3 months ago - Stars: 780 - Forks: 78

qubole/sparklens

Qubole Sparklens tool for performance tuning Apache Spark

Language: Scala - Size: 175 KB - Last synced at: about 1 month ago - Pushed at: 12 months ago - Stars: 575 - Forks: 142

kevinschaich/pyspark-cheatsheet

🐍 Quick reference guide to common patterns & functions in PySpark.

Size: 49.8 KB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 519 - Forks: 167

japila-books/spark-sql-internals

The Internals of Spark SQL

Size: 1.46 GB - Last synced at: about 16 hours ago - Pushed at: about 17 hours ago - Stars: 468 - Forks: 132

zsvoboda/ngods-stocks

New Generation Opensource Data Stack Demo

Language: Jupyter Notebook - Size: 22.1 MB - Last synced at: 30 days ago - Pushed at: over 2 years ago - Stars: 432 - Forks: 101

microsoft/data-accelerator

Data Accelerator for Apache Spark simplifies onboarding to Streaming of Big Data. It offers a rich, easy-to-use experience to help with creation, editing and management of Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.

Language: C# - Size: 401 MB - Last synced at: 5 days ago - Pushed at: 3 months ago - Stars: 302 - Forks: 90

cuebook/cuelake

Use SQL to build ELT pipelines on a data lakehouse.

Language: JavaScript - Size: 28 MB - Last synced at: about 1 month ago - Pushed at: about 3 years ago - Stars: 288 - Forks: 28

jaceklaskowski/spark-workshop

Apache Spark™ and Scala Workshops

Language: HTML - Size: 57 MB - Last synced at: 30 days ago - Pushed at: 11 months ago - Stars: 264 - Forks: 148

Qbeast-io/qbeast-spark

Qbeast-spark: DataSource enabling multi-dimensional indexing and efficient data sampling. Big Data, free from the unnecessary!

Language: Scala - Size: 37.3 MB - Last synced at: 16 days ago - Pushed at: 5 months ago - Stars: 228 - Forks: 24

Chabane/bigdata-playground

A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache Hbase, Apache Parquet, Apache Avro, Apache Storm, Twitter Api, MongoDB, NodeJS, Angular, GraphQL

Language: TypeScript - Size: 3.08 MB - Last synced at: about 2 months ago - Pushed at: over 6 years ago - Stars: 209 - Forks: 74

microsoft/MCW-Big-data-analytics-and-visualization 📦

MCW Big data analytics and visualization

Language: JavaScript - Size: 148 MB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 189 - Forks: 186

bluishglc/bdp

A prototype big data platform project; the source code for the book Big Data Platform Architecture and Prototype

Language: Java - Size: 403 KB - Last synced at: over 1 year ago - Pushed at: almost 5 years ago - Stars: 184 - Forks: 135

polomarcus/Spark-Structured-Streaming-Examples

Spark Structured Streaming / Kafka / Cassandra / Elastic

Language: Scala - Size: 16.5 MB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 183 - Forks: 78

mc2-project/opaque-sql

An encrypted data analytics platform

Language: Scala - Size: 18 MB - Last synced at: 3 months ago - Pushed at: about 2 years ago - Stars: 182 - Forks: 73

LearningJournal/Spark-Streaming-In-Python

Apache Spark 3 - Structured Streaming Course Material

Language: Python - Size: 19.4 MB - Last synced at: 2 months ago - Pushed at: almost 2 years ago - Stars: 121 - Forks: 159

xiaogp/recsys_spark

Spark SQL implementations of ItemCF, UserCF, and Swing; recommender systems, recommendation algorithms, collaborative filtering

Language: Scala - Size: 10.7 KB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 121 - Forks: 47

streamnative/pulsar-spark

Spark Connector to read and write with Pulsar

Language: Scala - Size: 726 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 113 - Forks: 51

izhangzhihao/Real-time-Data-Warehouse

Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi

Language: Dockerfile - Size: 106 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 113 - Forks: 44

sjrusso8/spark-connect-rs

Apache Spark Connect Client for Rust

Language: Rust - Size: 3.88 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 109 - Forks: 18

wangj1106/recommendMoteur

Movie recommendation system, movie recommendation engine; a movie recommendation engine built with Spark

Language: Scala - Size: 10.4 MB - Last synced at: over 1 year ago - Pushed at: almost 7 years ago - Stars: 106 - Forks: 37

minio/spark-select

A library for Spark DataFrame using MinIO Select API

Language: Scala - Size: 65.4 KB - Last synced at: 4 days ago - Pushed at: over 5 years ago - Stars: 98 - Forks: 19

LearningJournal/SparkProgrammingInScala

Apache Spark Course Material

Language: Scala - Size: 50.9 MB - Last synced at: 3 months ago - Pushed at: about 2 years ago - Stars: 88 - Forks: 159

huangyueranbbc/SparkDemo

Comprehensive Spark example code demos (Java, Scala)

Language: Java - Size: 2.33 MB - Last synced at: 3 months ago - Pushed at: about 5 years ago - Stars: 83 - Forks: 70

streamnative/awesome-pulsar

A curated list of Pulsar tools, integrations and resources.

Size: 11.7 KB - Last synced at: about 2 months ago - Pushed at: over 4 years ago - Stars: 81 - Forks: 9

groda/big_data

Big Data essentials: Hadoop, MapReduce, Spark. Explore tutorials and demos in Jupyter notebooks—most are self-contained and live, ready to run with a click.

Language: Jupyter Notebook - Size: 51.9 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 78 - Forks: 27

martandsingh/ApacheSpark

This repository helps you learn Databricks concepts through examples. It covers the important topics a data engineer needs in real-world work, using PySpark and Spark SQL for development. The end of the course also covers a few case studies.

Language: Python - Size: 141 KB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 71 - Forks: 47

zsvoboda/ngods

New generation opensource data stack

Language: Dockerfile - Size: 1.62 MB - Last synced at: 2 months ago - Pushed at: about 3 years ago - Stars: 66 - Forks: 9

Thomas-George-T/Movies-Analytics-in-Spark-and-Scala

Data cleaning, pre-processing, and Analytics on a million movies using Spark and Scala.

Language: Scala - Size: 11.3 MB - Last synced at: over 2 years ago - Pushed at: about 4 years ago - Stars: 63 - Forks: 46

harryprince/geospark

bring sf to spark in production

Language: R - Size: 15.9 MB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 57 - Forks: 17

spider-123-eng/Spark

Apache Spark is a fast, in-memory data processing engine with elegant and expressive development APIs that let data workers efficiently execute streaming, machine learning, or SQL workloads requiring fast iterative access to datasets. This project contains sample Spark programs in Scala.

Language: Scala - Size: 6.59 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 55 - Forks: 42

vim89/datapipelines-essentials-python

Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake: SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations

Language: Python - Size: 1.76 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 53 - Forks: 34

sjyttkl/spark_learning

尚硅谷 (Atguigu) big data Spark course, latest 2019 edition; Spark learning

Language: Scala - Size: 6.38 MB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 47 - Forks: 54

hablapps/sparkOptics

Optics for Spark DataFrames

Language: Scala - Size: 58.6 KB - Last synced at: 18 days ago - Pushed at: over 4 years ago - Stars: 47 - Forks: 6

spirom/spark-data-sources

Developing Spark External Data Sources using the V2 API

Language: Java - Size: 114 KB - Last synced at: 3 months ago - Pushed at: about 7 years ago - Stars: 46 - Forks: 18

LearningJournal/Spark-Streaming-In-Scala

Apache Spark 3 - Structured Streaming Course Material

Language: Scala - Size: 19.4 MB - Last synced at: about 1 month ago - Pushed at: almost 5 years ago - Stars: 45 - Forks: 77

airbnb/airbnb-spark-thrift

A library for loading Thrift data into Spark SQL

Language: Scala - Size: 50.8 KB - Last synced at: about 1 hour ago - Pushed at: over 2 years ago - Stars: 43 - Forks: 16

harryprince/awesome-sparklyr

An awesome sparklyr related package collection

Size: 47.9 KB - Last synced at: 10 days ago - Pushed at: over 5 years ago - Stars: 42 - Forks: 7

mayur2810/sope

Apache Spark ETL Utilities

Language: Scala - Size: 1.08 MB - Last synced at: 11 days ago - Pushed at: 8 months ago - Stars: 40 - Forks: 16

dbiir/paraflow

A real-time analytical system for ID-associated data

Language: Java - Size: 19.1 MB - Last synced at: 3 months ago - Pushed at: about 3 years ago - Stars: 38 - Forks: 24

Wh1isper/sparglim

Sparglim✨ makes PySpark apps configurable and Spark Connect Server deployment easier!

Language: Python - Size: 151 KB - Last synced at: 1 day ago - Pushed at: 4 months ago - Stars: 37 - Forks: 4

SharpData/SharpETL

Write ETL using your favorite SQL dialects

Language: Scala - Size: 3.37 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 36 - Forks: 5

sunwu51/bigdatatutorial

bigdatatutorial

Language: Shell - Size: 23.3 MB - Last synced at: 6 months ago - Pushed at: almost 7 years ago - Stars: 35 - Forks: 6

salimt/Finance-and-Risk-Management-Algorithms

applications for risk management through computational portfolio construction methods

Language: Jupyter Notebook - Size: 13.4 MB - Last synced at: about 1 year ago - Pushed at: almost 5 years ago - Stars: 32 - Forks: 10

Yifan122/RecommendSystem

Movie recommendation system

Language: Scala - Size: 4.31 MB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 31 - Forks: 11

wushengyeyouya/Hive-JDBC-Proxy

Hive-JDBC-Proxy is a high-performance proxy service for HiveServer2 and Spark ThriftServer, providing load balancing and rule-based forwarding of Hive JDBC Client requests to HiveServer2 and Spark ThriftServer.

Language: Scala - Size: 74.2 KB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 30 - Forks: 15

roshankoirala/pySpark_tutorial

Implementation of Spark code in Jupyter notebook. Topics include: RDDs and DataFrame, exploratory data analysis (EDA), handling multiple DataFrames, visualization, Machine Learning

Language: Jupyter Notebook - Size: 202 KB - Last synced at: 3 months ago - Pushed at: almost 5 years ago - Stars: 29 - Forks: 26

indix/sparkplug

Spark package to "plug" holes in data using SQL based rules ⚡️ 🔌

Language: Scala - Size: 503 KB - Last synced at: 19 days ago - Pushed at: about 5 years ago - Stars: 29 - Forks: 2

Thanaraklee/Real-Time-PySpark

This project introduces PySpark, a powerful open-source framework for distributed data processing. We explore its architecture, components, and applications for real-time data analysis.

Language: Python - Size: 329 MB - Last synced at: 3 months ago - Pushed at: 9 months ago - Stars: 27 - Forks: 13

Wathon/data_engineering_with_python-track-datacamp

Data Engineer with Python lecture notes from #datacamp.

Language: Jupyter Notebook - Size: 59.7 MB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 26 - Forks: 22

AndrewKuzmin/spark-structured-streaming-examples

Spark Structured Streaming examples using version 3.4.0

Language: Scala - Size: 1.06 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 25 - Forks: 14

anish749/spark2-etl-examples

A project with examples of using a few commonly used data manipulation/processing/transformation APIs in Apache Spark 2.0.0

Language: Scala - Size: 14.5 MB - Last synced at: over 2 years ago - Pushed at: almost 4 years ago - Stars: 25 - Forks: 29

fabiogouw/spark-aws-messaging

A custom sink provider for Apache Spark that sends the content of a DataFrame to AWS SQS

Language: Java - Size: 881 KB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 23 - Forks: 5

haozhang-x/shenzhen-metro-transport-card-data-analysis

Analysis of Shenzhen Tong transit card swipe data

Language: Scala - Size: 27.1 MB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 22 - Forks: 9

syedhassaanahmed/databricks-notebooks

Collection of Databricks and Jupyter Notebooks

Language: Jupyter Notebook - Size: 742 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 21 - Forks: 15

inferrinizzard/prettier-sql Fork of sql-formatter-org/sql-formatter 📦

[ARCHIVED] Please use https://github.com/sql-formatter-org/sql-formatter

Language: TypeScript - Size: 3.08 MB - Last synced at: about 1 month ago - Pushed at: almost 3 years ago - Stars: 21 - Forks: 5

zrlio/albis

Albis: High-Performance File Format for Big Data Systems

Size: 1.02 MB - Last synced at: almost 2 years ago - Pushed at: almost 7 years ago - Stars: 21 - Forks: 3

astrolabsoftware/spark-fits

FITS data source for Spark SQL and DataFrames

Language: Scala - Size: 8.97 MB - Last synced at: 3 months ago - Pushed at: about 2 years ago - Stars: 20 - Forks: 7

jamesbyars/apache-spark-etl-pipeline-example

Demonstration of using Apache Spark to build robust ETL pipelines while taking advantage of open source, general purpose cluster computing.

Language: Python - Size: 53.9 MB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 19 - Forks: 27

nsphung/pyspark-template

A Python PySpark Project with Poetry

Language: Jupyter Notebook - Size: 81.1 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 18 - Forks: 2

sev7e0/wow-spark

:high_brightness: Spark self-study handbook covering Spark Core, Spark SQL, Spark Streaming, spark-kafka, and delta-lake, plus basic Scala exercises and source-code analyses (such as master and shuffle), summaries, and translations.

Language: Scala - Size: 1.96 MB - Last synced at: 3 months ago - Pushed at: almost 2 years ago - Stars: 18 - Forks: 7

harshkavdikar1/Tweet-Analysis-With-Kafka-and-Spark

A real time analytics dashboard to analyze the trending hashtags and @ mentions at any location using kafka and spark streaming.

Language: Python - Size: 238 KB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 17 - Forks: 8

tejasjbansal/HELTHCARE-SYSTEM

Data cleaning, pre-processing, and analytics on health care data using Spark and Python.

Language: Jupyter Notebook - Size: 3 MB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 17 - Forks: 14

zekeriyyaa/PySpark-Structured-Streaming-ROS-Kafka-ApacheSpark-Cassandra

Structured streaming is applied to robot data from a ROS-Gazebo simulation environment using Apache Spark. Data is collected in Kafka, analyzed by Apache Spark, and stored in Cassandra.

Language: Python - Size: 652 KB - Last synced at: 2 months ago - Pushed at: over 3 years ago - Stars: 16 - Forks: 6

jgperrin/net.jgp.books.spark.ch11

Spark in Action, 2nd edition - chapter 11 - Working with SQL

Language: Java - Size: 108 KB - Last synced at: about 2 months ago - Pushed at: about 2 years ago - Stars: 15 - Forks: 11

JunjianS/spark-streaming-kafka-demo

Spark Streaming reads messages from Kafka, writes offsets to Redis, computes word frequencies with Spark, and finally writes the results to a Hive table

Language: Java - Size: 14.6 KB - Last synced at: over 1 year ago - Pushed at: almost 6 years ago - Stars: 15 - Forks: 7

LuckyZXL2016/Spark-Example

Examples for Spark 1.6 and Spark 2.2, covering kafka, flume, structuredstreaming, jedis, elasticsearch, mysql, dataframe

Language: Scala - Size: 2.06 MB - Last synced at: 3 months ago - Pushed at: over 7 years ago - Stars: 15 - Forks: 6

asuiu/SparkORM

ORM for Apache Spark and DataFrames schema manager

Language: Python - Size: 482 KB - Last synced at: 15 days ago - Pushed at: 12 months ago - Stars: 14 - Forks: 3

lifeomic/spark-vcf

Spark VCF data source implementation for Dataframes

Language: Scala - Size: 314 KB - Last synced at: about 2 months ago - Pushed at: almost 3 years ago - Stars: 14 - Forks: 2

anaregdesign/openaivec

Pandas extension, Tabular calculation with LLM, Spark UDF Builder

Language: Python - Size: 977 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 13 - Forks: 1

apache/kyuubi-docker

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.

Language: Dockerfile - Size: 26.4 KB - Last synced at: about 14 hours ago - Pushed at: 11 days ago - Stars: 13 - Forks: 8

mohankrishna02/interview-scenerios-spark-sql

This repository focuses on providing interview scenario questions that I have encountered during interviews. The questions are designed to simulate real-world scenarios and test your problem-solving and technical skills. By exploring these scenarios, you can gain insights into common interview topics and prepare yourself for similar challenges.

Language: Scala - Size: 353 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 13 - Forks: 20

Tirth27/Real-time-analytics-with-spark-streaming

This project aims to build a streaming application to perform real-time analytics of Covid-19 related tweets and deploy an ML model for real-time sentiment predictions.

Language: Jupyter Notebook - Size: 16.5 MB - Last synced at: 3 months ago - Pushed at: almost 4 years ago - Stars: 13 - Forks: 3

lqdev/RestaurantInspectionsSparkMLNET

ETL & Data Enrichment with Spark.NET and ML.NET Automated (Auto) ML

Language: C# - Size: 25.4 KB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 13 - Forks: 6

waltyou/spark-sql-online-editor

spark sql online editor

Language: JavaScript - Size: 320 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 12 - Forks: 2

EnableAsync/cloud-movie-recommend-system

Spark-based microservice recommendation system

Language: Java - Size: 1.31 MB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 12 - Forks: 1

EnableAsync/MovieRecommendSystem

Movie recommendation system based on Spark Streaming

Language: Java - Size: 3.11 MB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 12 - Forks: 1

qwshen/spark-etl-framework

A generic ETL framework with Spark_SQL for transforming data by constructing pipelines with Yaml/Json/Xml

Language: Scala - Size: 890 KB - Last synced at: 4 months ago - Pushed at: 6 months ago - Stars: 11 - Forks: 9

HuemulSolutions/huemul-bigdatagovernance

Huemul BigDataGovernance is a framework that runs on Spark, Hive, and HDFS. It enables a corporate single-source-of-data strategy based on Data Governance best practices. It lets you implement tables with Primary Key and Foreign Key control when inserting and updating data through the library, along with validation of nulls, text lengths, minimum/maximum values for numbers and dates, unique values, and default values. It also lets you classify fields by the applicability of ARCO rights to ease compliance with GDPR-style data protection laws, and identify security levels and whether any kind of encryption is being applied. Additionally, it lets you add more complex validation rules on the same table.

Language: Scala - Size: 1.27 MB - Last synced at: 27 days ago - Pushed at: about 2 years ago - Stars: 11 - Forks: 7

Dirkster99/PyNotes

My notebook on using Python with Jupyter Notebook, PySpark etc

Language: Jupyter Notebook - Size: 84.6 MB - Last synced at: 29 days ago - Pushed at: almost 4 years ago - Stars: 11 - Forks: 7

SelimHorri/spark-application

Java Application, uses Apache Spark, handles batch as well as streaming processing

Language: Java - Size: 8.79 KB - Last synced at: 2 months ago - Pushed at: over 4 years ago - Stars: 11 - Forks: 0

yennanliu/spark-etl-pipeline

Various data stream/batch processing demos with Apache Spark in Scala 🚀

Language: Scala - Size: 5.06 MB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 11 - Forks: 8

sahilbhange/spark-slowly-changing-dimension

Spark implementation of Slowly Changing Dimension type 2

Language: Scala - Size: 351 KB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 11 - Forks: 13

chaokunyang/bigdata-examples

bigdata examples about spark and flink

Language: Scala - Size: 50.8 KB - Last synced at: about 1 month ago - Pushed at: almost 7 years ago - Stars: 11 - Forks: 5

tlepple/iceberg-intro-workshop

Hands-on workshop with Apache Iceberg

Language: Shell - Size: 2.31 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 10 - Forks: 0

samerelhousseini/Geospatial-Analysis-With-Spark

This is a data processing pipeline that implements an End-to-End Real-Time Geospatial Analytics and Visualization multi-component full-stack solution, using Apache Spark Structured Streaming, Apache Kafka, MongoDB Change Streams, Node.js, React, Uber's Deck.gl and React-Vis, and using the Massachusetts Bay Transportation Authority's (MBTA) APIs as a data source

Language: Python - Size: 12.4 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 10 - Forks: 4

dhiraa/spark-tpcds

Apache Spark TPC-DS benchmark setup with EMR launch setup

Language: Smarty - Size: 1.3 MB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 10 - Forks: 4

sotowang/log-analysis-system

Spark-based behavior log analysis system

Language: Java - Size: 1.01 MB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 10 - Forks: 6

komprenilo/liga

Liga: Let Data Dance with ML Models

Language: Python - Size: 17.9 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 9 - Forks: 5

invent-analytics/metaframe

Spark DataFrame with metadata

Language: Python - Size: 14.6 KB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 9 - Forks: 1

aroch/protobuf-dataframe

A package that lets you run PySpark SQL on your Protobuf data

Language: Python - Size: 8.79 KB - Last synced at: 27 days ago - Pushed at: 8 months ago - Stars: 8 - Forks: 3

taboola/ScORe

ScORe - Programmatic Schema On Read for Spark SQL, powered by Taboola

Language: Java - Size: 51.8 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 8 - Forks: 2

CloudComputingProject-2022/Data_visualization_and_analysis_tool_for_telemetry_data

A naive anomaly detection and data visualization tool for F1 on-board telemetry data.

Language: Python - Size: 1.4 MB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 8 - Forks: 1

jksinghpro/spark-jms

Spark JMS connector for batch and streaming mode

Language: Scala - Size: 16.6 KB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 8 - Forks: 1