An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: sparksql

commoncrawl/cc-pyspark

Process Common Crawl data with Python and Spark

Language: Python - Size: 157 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 430 - Forks: 89

SathyaV99/hadoop-spark-traffic-predictor-toronto

🚦 Toronto Traffic Prediction with Apache Spark, Hadoop and SparkML. Used Random Forest as the model for prediction

Language: Jupyter Notebook - Size: 31.9 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 0 - Forks: 0

locationtech/rasterframes

Geospatial Raster support for Spark DataFrames

Language: Jupyter Notebook - Size: 102 MB - Last synced at: 8 days ago - Pushed at: about 1 year ago - Stars: 252 - Forks: 45

zio/zio-quill

Compile-time Language Integrated Queries for Scala

Language: Scala - Size: 12.5 MB - Last synced at: 12 days ago - Pushed at: 16 days ago - Stars: 2,156 - Forks: 352

zio/zio-protoquill

Quill for Scala 3

Language: Scala - Size: 30 MB - Last synced at: 10 days ago - Pushed at: 16 days ago - Stars: 217 - Forks: 52

DarrenDavy12/Databricks-Certification

topic-specific projects and end-to-end project

Size: 210 KB - Last synced at: 3 days ago - Pushed at: 21 days ago - Stars: 0 - Forks: 0

Stratio/sparta

Real Time Analytics and Data Pipelines based on Spark Streaming

Language: Scala - Size: 123 MB - Last synced at: 23 days ago - Pushed at: over 5 years ago - Stars: 526 - Forks: 196

maddieemihle/Home_Sales

A PySpark-powered analysis of real estate trends using home sales data. This project explores average prices by year, room configuration, and property features, while demonstrating SparkSQL, caching, and partitioning techniques in a scalable data pipeline—all within Google Colab

Language: Jupyter Notebook - Size: 11.7 KB - Last synced at: 20 days ago - Pushed at: 26 days ago - Stars: 0 - Forks: 0

CybercentreCanada/jupyterlab-sql-editor

A JupyterLab extension providing a SQL formatter, auto-completion, and syntax highlighting for Spark SQL and Trino

Language: Jupyter Notebook - Size: 90.5 MB - Last synced at: 9 days ago - Pushed at: about 2 months ago - Stars: 88 - Forks: 14

udao-moo/udao-spark-optimizer

A Spark Optimizer for Adaptive, Fine-Grained Parameter Tuning

Language: Python - Size: 4.19 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 9 - Forks: 0

Tavo17s/PySpark-Tutorial

A hands-on, beginner-friendly walkthrough of PySpark concepts, practical examples, and real-life scenarios using the Databricks platform.

Language: Jupyter Notebook - Size: 526 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

microsoft/data-accelerator

Data Accelerator for Apache Spark simplifies onboarding to streaming of big data. It offers a rich, easy-to-use experience to help with the creation, editing, and management of Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.

Language: C# - Size: 401 MB - Last synced at: 11 days ago - Pushed at: 2 months ago - Stars: 302 - Forks: 90

zsvoboda/ngods

New generation open source data stack

Language: Dockerfile - Size: 1.62 MB - Last synced at: about 1 month ago - Pushed at: about 3 years ago - Stars: 66 - Forks: 9

teeyog/IQL

An ad hoc query service based on the Spark SQL engine.

Language: JavaScript - Size: 12.7 MB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 383 - Forks: 180

harsha2010/magellan

Geo Spatial Data Analytics on Spark

Language: Scala - Size: 13 MB - Last synced at: 27 days ago - Pushed at: almost 4 years ago - Stars: 532 - Forks: 149

Galaxy092/Samsung-Innovation-Campus-Big-Data-Capstone-Project

Samsung Innovation Campus Big Data Capstone Project - Weather Prediction

Language: Jupyter Notebook - Size: 10.5 MB - Last synced at: 2 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

alisonpezzott/fabric-medallion-cotacoes-bcb

Language: Jupyter Notebook - Size: 56.6 KB - Last synced at: 2 months ago - Pushed at: 4 months ago - Stars: 2 - Forks: 2

RenanBjj/Databricks-SQL-Optical-Campaign

Databricks Optical Campaign for Hoya Products

Language: Jupyter Notebook - Size: 163 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

SonicDMG/SonicDMG.github.io

A fun place for me to blog about distributed databases, aerial arts, and life in general

Language: HTML - Size: 19.1 MB - Last synced at: 3 months ago - Pushed at: over 7 years ago - Stars: 1 - Forks: 0

stonezhong/DataManager

Better organize data in data lake and build ETL pipeline with Web UI tool.

Language: JavaScript - Size: 2.33 MB - Last synced at: 19 days ago - Pushed at: about 4 years ago - Stars: 9 - Forks: 2

microsoft/A-TALE-OF-THREE-CITIES

Analyzing the safety (311) dataset published by Azure Open Datasets for Chicago, Boston, and New York City using SparkR, SparkSQL, and Azure Databricks, with visualization using ggplot2 and leaflet. Focus is on descriptive analytics, visualization, clustering, time series forecasting, and anomaly detection.

Language: R - Size: 21.8 MB - Last synced at: 4 days ago - Pushed at: about 4 years ago - Stars: 86 - Forks: 34

dazheng/SparkETL

Implement a complete data warehouse ETL using Spark SQL

Language: Java - Size: 132 KB - Last synced at: 27 days ago - Pushed at: over 2 years ago - Stars: 14 - Forks: 7

yjshen/spark-connector-test

A tutorial on how to use pulsar-spark-connector

Language: Scala - Size: 12.7 KB - Last synced at: about 2 months ago - Pushed at: over 4 years ago - Stars: 11 - Forks: 3

ravishankar324/Washington-state-electric-vehicles-ETL-pipeline

ETL data pipeline to process Washington's EV data using Apache Spark, Docker, Snowflake, Airflow, and AWS services, and to visualize the transformed Parquet data in Tableau dashboards.

Language: Python - Size: 1.85 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

Suryan5h/Apache-Spark

RDDs | Spark SQL | Catalyst Optimizer | Spark Streaming | ALS Algorithm

Language: Jupyter Notebook - Size: 972 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

Ren294/Covid-Data-Process

This project integrates real-time data processing and analytics using Apache NiFi, Kafka, Spark, Hive, and AWS services for comprehensive COVID-19 data insights.

Language: Shell - Size: 6.22 MB - Last synced at: about 2 months ago - Pushed at: 9 months ago - Stars: 6 - Forks: 0

armahdavi/big_data_spark_building_IOT_sensor_ieq_analytics_ML_thermal_comfort_MURB_retrofit_social_housing

This repository summarizes my analytics, big data, and ML code work from a Multi-Unit Residential Building (MURB) retrofit project run back during my Ph.D.,

Language: Stata - Size: 1.34 MB - Last synced at: about 2 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

largecats/sparksql-formatter

A SparkSQL formatter based on https://github.com/zeroturnaround/sql-formatter, with customizations and extra features.

Language: Python - Size: 346 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 14 - Forks: 3

BrooksIan/DS_GTDB

KMeans Clustering on Global Terrorism Database

Size: 1.14 MB - Last synced at: about 2 months ago - Pushed at: about 7 years ago - Stars: 2 - Forks: 3

spirom/LearningSpark

Scala examples for learning to use Spark

Language: Scala - Size: 482 KB - Last synced at: 3 months ago - Pushed at: over 4 years ago - Stars: 444 - Forks: 290

saurfang/sparksql-protobuf

Read SparkSQL parquet file as RDD[Protobuf]

Language: Scala - Size: 61.5 KB - Last synced at: about 2 months ago - Pushed at: over 6 years ago - Stars: 93 - Forks: 37

burhanahmed1/Big-Data-Analytics

Practice tasks in Python programming language using Hadoop, MRJob, PySpark for Big Data Analytics.

Language: Jupyter Notebook - Size: 40 KB - Last synced at: 3 months ago - Pushed at: 11 months ago - Stars: 2 - Forks: 0

potix2/spark-google-spreadsheets

Google Spreadsheets datasource for SparkSQL and DataFrames

Language: Scala - Size: 72.3 KB - Last synced at: about 2 months ago - Pushed at: almost 2 years ago - Stars: 57 - Forks: 47

Kmohamedalie/ApacheSpark-Data_Analytics

Data Analytics with Apache Spark ⭐

Language: Jupyter Notebook - Size: 487 KB - Last synced at: 3 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

shanukatiyar111/Pyspark-Project-1

DATABRICKS PROJECT - END TO END SALES ANALYSIS

Language: Python - Size: 19.5 KB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

shanukatiyar111/Pyspark-Project-2

DATABRICKS PROJECT - END TO END GOOGLE PLAYSTORE DATA ANALYSIS

Language: Jupyter Notebook - Size: 4.08 MB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

BrooksIan/SBIR_TFIDF_KMeans

Document clustering using KMeans on TF/IDF features on Small Business Innovation Research (SBIR) data

Language: Jupyter Notebook - Size: 2.41 MB - Last synced at: 19 days ago - Pushed at: about 4 years ago - Stars: 5 - Forks: 2

yaooqinn/spark-postgres

PostgreSQL and GreenPlum Data Source for Apache Spark

Language: Scala - Size: 78.1 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 35 - Forks: 13

Pirata-Codex/Sentiment-Analysis-SparkML

Using SparkML to build different machine learning models for simulating a small scale of big data management

Language: Jupyter Notebook - Size: 172 KB - Last synced at: 12 months ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 1

yaooqinn/spark-ranger 📦

Already merged into apache/incubator-kyuubi. ACL Management for Apache Spark SQL with Apache Ranger.

Language: Scala - Size: 116 KB - Last synced at: 4 months ago - Pushed at: over 3 years ago - Stars: 54 - Forks: 56

WinThitiwat/Data_Lake_with_Spark

ETL process to S3 Data Lake through EMR, Spark, Hadoop, Schema-on-Read

Language: Jupyter Notebook - Size: 536 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

darule0/sparkdiff

A rudimentary command line utility for contrasting Apache Spark event logs.

Language: Shell - Size: 703 KB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

sana1410/NYPD-Arrest-Data-Year-to-Date

This repository is used to perform data analysis using Databricks and Tableau on NYC crime datasets

Language: HTML - Size: 1.77 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

morfious902002/impala-spark-jdbc-kerberos 📦

Language: Java - Size: 4.88 KB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 7 - Forks: 5

austinlmcconnell/home-sales-data-evaluation

Contains an analysis of key home sales metrics using SparkSQL and Python to manage large amounts of data.

Language: Jupyter Notebook - Size: 1.39 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

rodrigoorf/SparkStudies

Repo with some Spark and SparkSQL exercises

Language: Java - Size: 41.1 MB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

hyunjoonbok/PySpark

PySpark functions and utilities with examples. Assists ETL process of data modeling

Language: Jupyter Notebook - Size: 3.79 MB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 89 - Forks: 73

Amarilli/Home_Sales

In pursuit of significant metrics for home sales data, Google Colab and SparkSQL were employed to extract essential insights.

Language: Jupyter Notebook - Size: 68.4 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

SteveTuttle/home-sales-sparkSQL-metrics

Use SparkSQL to determine key metrics of the data. Use Spark to create temporary views, partition the data, cache and uncache a temporary table, and verify that the table has been uncached.

Language: Jupyter Notebook - Size: 27.3 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

bulutenesemre/OsmAnalysis

OpenStreetMap Data Analysis with Python programming language.

Language: Jupyter Notebook - Size: 1.03 MB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 2 - Forks: 0

MoustafaAMahmoud/spark-sandbox

Spark Sandbox project

Language: Scala - Size: 8.79 KB - Last synced at: about 1 year ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 0

zydusss/Spark

Data Analytics using Spark

Language: Jupyter Notebook - Size: 33.2 KB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 2 - Forks: 0

luisadrianml/nyctaxi2017

Implementing Big Data Methods to Analyze 2017 NYC Yellow Taxi

Language: Scala - Size: 3.16 MB - Last synced at: about 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 1

200413-java-spark/Project-2-Group-1

Apache Spark Life Expectancy - Daniel, Sutter, and John

Language: Java - Size: 877 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 1

longtng/SparkSQL

The laboratory from CLOUDS Course at EURECOM

Language: HTML - Size: 710 KB - Last synced at: about 1 year ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 0

NavyaTrilok/Advanced-Big-Data-ML-Project

Weather Data Analysis using Python, Pandas, SparkSQL, AutoRegression Model

Size: 1000 Bytes - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

amitnema/spark-coach

This project contains the learning and experiments with the Apache Spark.

Language: Scala - Size: 46.9 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

giucris/yasp

Yet Another SPark Framework

Language: Scala - Size: 228 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 10 - Forks: 1

cyofeiyue/SparkSQLProject

SparkSQL Training

Language: Scala - Size: 10.7 KB - Last synced at: over 1 year ago - Pushed at: about 7 years ago - Stars: 0 - Forks: 0

mehrdadalmasi2020/ApacheSpark_ApacheZeppelin_SQL_Shell

Run your first analysis project on Apache Zeppelin using Scala (Spark), Shell, and SQL

Language: Scala - Size: 1.4 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

jubins/Spark-And-MLlib-Projects

This repository contains Spark, MLlib, PySpark and Dataframes projects

Language: Jupyter Notebook - Size: 101 KB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 39 - Forks: 97

gcdev373/example-spark-datasourcev2

A very simple Java implementation of the Spark DataSourceV2 API.

Language: Java - Size: 9.77 KB - Last synced at: over 1 year ago - Pushed at: about 4 years ago - Stars: 4 - Forks: 0

aravinthsci/Miscellaneous1

Language: Jupyter Notebook - Size: 42.2 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 1

open-datastudio/spark-thriftserver

Spark thriftserver itself deploys Spark cluster on Kubernetes and allows JDBC/ODBC clients to execute SQL

Size: 8.79 KB - Last synced at: over 1 year ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 1

RJBarker/home_sales

Use PySpark and SparkSQL to execute SQL queries through a temporary view of the DataFrame created. Conduct additional queries on cached and partitioned data to determine runtime comparisons.

Language: Jupyter Notebook - Size: 146 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

andreiramani/Machine-Learning-with-Apache-Spark

Coursera IBM Data Engineering (Course 12 from 13)

Language: Jupyter Notebook - Size: 2.55 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

omar-elaqqad/BigData-Spark-tps-

Spark & Big Data fundamentals lab exercises (TPs)

Language: Java - Size: 113 KB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

RahulGupta16/Pyspark-Theory-and-Code-Basics

PySpark serves as a Python interface to Apache Spark, enabling the execution of Python and SQL-like instructions for the manipulation and analysis of data within a distributed processing framework.

Language: Jupyter Notebook - Size: 1.07 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

shubhammirajkar/superstore_azure_de_project

Copying data from an Amazon S3 bucket to an Azure Blob container using an Azure Data Factory pipeline. The data is mounted to Databricks and further analysis is done using Spark SQL.

Language: Python - Size: 2.61 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

bluishglc/bdp

A prototype project of a big data platform, containing the source code of the book Big Data Platform Architecture and Prototype

Language: Java - Size: 403 KB - Last synced at: over 1 year ago - Pushed at: almost 5 years ago - Stars: 184 - Forks: 135

Heisenberghj7/Retail-Store-BigData

📊 📑 This project provides step-by-step big data analytics applied to the retail industry using a variety of big data technologies, such as HDFS, Hive, and Spark.

Language: Python - Size: 2.11 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

MouhtaramSoufiane/SparkSQL

Language: Java - Size: 20.5 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

el-moudni-hicham/bigdata-spark-sql

This repository includes a brief but informative and simple explanation of Apache Spark and Spark SQL terms with a Java implementation.

Language: Java - Size: 39.1 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

carmhuo/carmhuo.github.io

Study notes and reflections

Language: HTML - Size: 5.08 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 1

Heli515/San-Francisco-Crime-Analysis

Language: HTML - Size: 539 KB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

hbutani/spark-druid-olap 📦

Sparkline BI Accelerator provides fast ad-hoc query capability over Logical Cubes. This has been folded into our SNAP Platform (http://bit.ly/2oBJSpP), an Integrated BI platform on Apache Spark.

Language: Scala - Size: 127 MB - Last synced at: over 1 year ago - Pushed at: almost 7 years ago - Stars: 286 - Forks: 96

vks2106/spark-custom

Spark examples.

Language: Java - Size: 18.6 KB - Last synced at: over 1 year ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 0

futurikidis21/Spark-text-analysis-predicting-exchange-rate-shifts

Language: Jupyter Notebook - Size: 156 MB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

nrothchicago/SparkCode

Spark code used for my Master's Thesis. Run on AWS EMR clusters

Language: Scala - Size: 22.5 KB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 1

JunjianS/spark-streaming-kafka-demo

Spark Streaming reads messages from Kafka, writes offsets to Redis, computes word frequencies with Spark, and finally writes the results to a Hive table

Language: Java - Size: 14.6 KB - Last synced at: over 1 year ago - Pushed at: almost 6 years ago - Stars: 15 - Forks: 7

StianPedersen/TDDE31_Big_Data

Advanced Big Data course taught at Linköping University. Topics included parallelisation, machine learning with Big Data and querying on distributed systems.

Language: Python - Size: 2.71 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

liumingmusic/HadoopLearning

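A complete set of basic big data tutorials, including the most basic CentOS and Maven. The big data topics mainly cover HDFS, MR, YARN, HBase, Kafka, Scala, Spark Core, Spark Streaming, and Spark SQL. The tutorials include all source code demos and online documentation.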

Language: Scala - Size: 5.95 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 52 - Forks: 24

Mrying0823/Bigdata_group_project

Flight dataset analysis group project

Language: JavaScript - Size: 73.4 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Pratikdomadiya/Databricks_workspace

Data exploration, Preprocessing, Analysis, and visualization using PySpark, SparkSQL, Pandas, and Python.

Language: Jupyter Notebook - Size: 4.22 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

ramapilli16/CCA175-PySpark-Practice-with-solutions

CCA175-PySpark-Practice-with-solutions

Size: 20.5 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 2

Charityosai/Analysing-clinical-trial-dataset-to-uncover-insights

This project involved the analysis of two historical datasets (a clinical trial dataset and a dataset listing pharmaceutical violations) to extract valuable insights that can aid medical research and strategic decision making in the medical field. The steps were carried out in 3 phases on Databricks using PySpark (RDD and DataFrame) and Spark SQL.

Size: 32.2 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

e-petrachi/MisinformationAnalysis

A small Spark and SparkSQL project for running analyses in a distributed setting on data extracted from Twitter and stored in MongoDB.

Language: Java - Size: 253 KB - Last synced at: almost 2 years ago - Pushed at: almost 7 years ago - Stars: 1 - Forks: 0

ritamghoshgds/DnA-F1-POC

The project harnessed an ETL multi-hop architecture, ingesting data from the Ergast API into a storage backed by Azure Data Lake. The process involved weekly ingestion of bronze layer data as cutover and delta files. Raw data, in varied formats, was transformed using Azure Databricks PySpark notebooks into enriched Silver and Gold layers.

Language: Python - Size: 5.67 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

jay-chauhan/Spark-TPCH-Benchmark

Course project for University of Washington CSE 599: Big Data Systems on benchmarking big data tools

Language: Jupyter Notebook - Size: 9.77 KB - Last synced at: almost 2 years ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

Heisenberg0203/Apache_Spark

Apache Spark projects: from beginner to advanced level

Language: Java - Size: 64.5 KB - Last synced at: almost 2 years ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 0

spg-47/Big-Data-Analytics-on-Stocks-Data

Enhanced profitability and research of stocks historical data using distributed system analytics.

Language: Jupyter Notebook - Size: 7.18 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

PeterSchuld/UCSanDiego_MicroMasters_DataScience-BigDataAnalyticsUsingSpark

The University of California, San Diego, course DSE230x "Big Data Analytics Using Spark" (Summer 2019): Learn how to analyze large datasets using Jupyter notebooks, MapReduce and Spark as a platform. Part 4 of the »Data Science« MicroMasters® Program on edX. Instructor: Yoav Freund, Professor of CS and Engineering, University of California San Diego.

Size: 6.12 MB - Last synced at: almost 2 years ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 1

ZhuXS/Spring-Shiro-Spark

Spring-Shiro-Spark is an integration attempt of Spring-Boot, Hibernate, Spark, Spark-SQL, Shiro, iView, VueJs... ...

Language: Java - Size: 15.8 MB - Last synced at: 29 days ago - Pushed at: over 7 years ago - Stars: 114 - Forks: 35

smartlin5228/CCA175

Language: Java - Size: 107 KB - Last synced at: almost 2 years ago - Pushed at: over 7 years ago - Stars: 7 - Forks: 10

FredrikBakken/propertypartner-transactions

🏡 Application for calculating tax information from Property Partner investments.

Language: Scala - Size: 410 KB - Last synced at: almost 2 years ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

SharpRay/spark-druid-connector

A library for querying Druid data sources with Apache Spark

Language: Scala - Size: 248 KB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 22 - Forks: 14

Akhileshkumarkc/Scala-Programs

Language: Scala - Size: 1.8 MB - Last synced at: almost 2 years ago - Pushed at: almost 7 years ago - Stars: 0 - Forks: 0

MhmdSyd/Wuzzuf_Jobs_DataAnalysis

Wuzzuf data analysis in Java using SparkSQL, Spring, XChart, and Spark ML

Language: Java - Size: 333 KB - Last synced at: almost 2 years ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 1

margaretkhendre/Home-Sales-vs-Big-Data

In this repository, Google Colab is paired with SparkSQL to determine key metrics about home sales data. Spark is also used to create temporary views, partition data, and cache/uncache a temporary table in the process.

Language: Jupyter Notebook - Size: 14.6 KB - Last synced at: 3 months ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

cami5326/Home_Sales

Big Data Analysis of a fictional home sales dataset using SparkSQL

Language: Jupyter Notebook - Size: 75.2 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0