An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: apache-sparksql

anqorithm/RealTime-StockStream

RealTime StockStream is a streamlined simulation system for processing live stock market data. It uses Apache Kafka for data ingestion, Apache Spark for data processing, and Apache Cassandra for data storage, making it a powerful yet easy-to-use tool for financial data analysis.

Language: Python - Size: 5.36 MB - Last synced at: 3 days ago - Pushed at: 3 months ago - Stars: 26 - Forks: 3

treeverse/lakeFS

lakeFS - Data version control for your data lake | Git for data

Language: Go - Size: 149 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 4,660 - Forks: 373

umbertogriffo/apache-spark-best-practices-and-tuning

https://umbertogriffo.gitbook.io/apache-spark-best-practices-and-tuning/

Size: 1.78 MB - Last synced at: about 22 hours ago - Pushed at: over 4 years ago - Stars: 5 - Forks: 2

JKA098/Pokemon-Feistiness-Apache-Spark-Job

This README assumes that, before running the Spark analytics job, you have already installed the correct versions of **Java**, **Hadoop**, and **Spark**, and that you are working in **Ubuntu**.

Language: Python - Size: 184 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

ajaymahadeven/Apache-Spark-Programs

This repository contains Apache Spark programs implemented in Python. These programs are part of my learning process for Apache Spark and are intended to serve as examples for anyone who is also learning or working with Apache Spark.

Language: Python - Size: 3.94 MB - Last synced at: about 2 months ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

tspannhw/table-ddl

DDL for Kudu, Impala, Phoenix, HBase, Hive, MySQL, PostgreSQL, Calcite, ... Tables. SQL.

Language: TSQL - Size: 34.2 KB - Last synced at: about 2 months ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 1