An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: spark-sql

LearningJournal/Spark-Streaming-In-Python

Apache Spark 3 - Structured Streaming Course Material

Language: Python - Size: 19.4 MB - Last synced at: 3 months ago - Pushed at: almost 2 years ago - Stars: 121 - Forks: 159

s-yazhini/PySpark-and-SparkSQL

In Azure DataBricks

Language: Jupyter Notebook - Size: 13.7 KB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

airbnb/airbnb-spark-thrift

A library for loading Thrift data into Spark SQL

Language: Scala - Size: 50.8 KB - Last synced at: 2 days ago - Pushed at: over 2 years ago - Stars: 43 - Forks: 16

SayamAlt/Amazon-Products-API-ETL-and-ML-pipeline

In this project, I've created an end-to-end ETL pipeline and subsequently developed a machine learning model to predict the price of Amazon products based on several product-related features.

Language: Python - Size: 2.95 MB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

mervat-khaled/ETL-Apache-Spark-NYC-Taxi-Data

The goal of this project is to perform ETL (Extract, Transform, and Load) on NYC Taxi data and its geographical information using Apache Spark, applying various transformations with Spark's Python API (PySpark) and SQL, and finally saving the processed data to CSV files partitioned by the number of executors in the Spark session.

Language: Jupyter Notebook - Size: 7.44 MB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

windi-wulandari/Credit-Scoring-Data-Pipeline

This project implements an end-to-end data pipeline designed to manage and analyze large-scale credit scoring data. Using AWS S3 as a scalable storage solution and Databricks for processing, the pipeline leverages the power of Apache Spark through PySpark and Spark SQL to handle data transformation and analysis efficiently.

Language: Python - Size: 1.21 MB - Last synced at: about 2 months ago - Pushed at: 8 months ago - Stars: 2 - Forks: 0

RiccardoRevalor/Spark

Spark exercises

Language: Jupyter Notebook - Size: 302 KB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

Ren294/Covid-Data-Process

This project integrates real-time data processing and analytics using Apache NiFi, Kafka, Spark, Hive, and AWS services for comprehensive COVID-19 data insights.

Language: Shell - Size: 6.22 MB - Last synced at: 3 months ago - Pushed at: 9 months ago - Stars: 6 - Forks: 0

pathak-ashutosh/spark-movie-recommendation

A movie recommendation system on MovieLens 25M dataset using Python and Apache Spark

Language: Python - Size: 19.5 KB - Last synced at: 4 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 0

aravind2060/spark-sql-on-flight-data Fork of Cloud-Computing-Fall2024/assignment-4-advanced-spark-sql-on-flight-data

Work with a flight dataset and use Spark SQL to analyze flight delays, airport traffic, and other key metrics

Language: Python - Size: 309 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

apache/kyuubi-docker

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.

Language: Dockerfile - Size: 26.4 KB - Last synced at: 3 days ago - Pushed at: 13 days ago - Stars: 13 - Forks: 8

tomkat-cr/data_lakehouse_local_stack

Data Lakehouse local stack with PySpark, Trino, and Minio. Includes an example to process Raygun error data and the IP address occurrence.

Language: Python - Size: 1.37 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

Cris-Neumann/Big-Data-with-Spark-MLlib-and-Databricks

Credit default prediction with the Spark MLlib Gradient Boosting Trees algorithm, using a Databricks processing cluster.

Language: Jupyter Notebook - Size: 580 KB - Last synced at: 26 days ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

lorenzobloise/motion_insights

Application for real-time big data analysis from a Body Sensor Network, developed using Spark in Scala and Kafka

Language: Scala - Size: 47 MB - Last synced at: 4 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

kalebers/Spark_Training

SparkSQL exercises in Java

Language: Java - Size: 42 KB - Last synced at: 3 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

SA01/spark-data-stats-tutorial

Contains the code and examples for my article on Medium, which explains how to optimize computing data statistics in Apache Spark jobs using the Observations feature.

Language: Python - Size: 4.88 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

jeffreywijaya100/movies_DMO

data management using verulam blue vm spark sql and hadoop course

Size: 3.33 MB - Last synced at: 3 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

BayoAdejare/lightning-streams

Batch/stream ETL pipeline of NOAA GLM dataset, using Python frameworks: Dagster, PySpark and Parquet storage.

Language: Python - Size: 63.4 MB - Last synced at: 7 days ago - Pushed at: almost 2 years ago - Stars: 4 - Forks: 0

Ren294/Log-Analysis-Project

This project builds a scalable log analytics pipeline using Lambda architecture for real-time and batch processing of NASA server logs.

Language: Python - Size: 2.88 MB - Last synced at: 3 months ago - Pushed at: 9 months ago - Stars: 5 - Forks: 1

mayur2810/sope

Apache Spark ETL Utilities

Language: Scala - Size: 1.08 MB - Last synced at: 13 days ago - Pushed at: 8 months ago - Stars: 40 - Forks: 16

DebanjanSarkar/pyspark-maestro

This repo contains implementations of PySpark for real-world use cases for batch data processing, streaming data processing sourced from Kafka, sockets, etc., spark optimizations, business specific bigdata processing scenario solutions, and machine learning use cases.

Language: Jupyter Notebook - Size: 66.1 MB - Last synced at: 3 months ago - Pushed at: 11 months ago - Stars: 2 - Forks: 1

vishalgattani/quixotic-kafka

Python Stream Processing for Apache Kafka, Spark, Cassandra.

Language: Python - Size: 39.1 KB - Last synced at: 5 days ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

aroch/protobuf-dataframe

A package that lets you run PySpark SQL on your Protobuf data

Language: Python - Size: 8.79 KB - Last synced at: 29 days ago - Pushed at: 8 months ago - Stars: 8 - Forks: 3

HarshOza36/MovieLens_PySpark

MovieLens Dataset analysis using Hadoop and Pyspark

Language: Jupyter Notebook - Size: 6.11 MB - Last synced at: 4 months ago - Pushed at: about 4 years ago - Stars: 3 - Forks: 0

storytellingengineer/Introduction_to_Pyspark

PySpark Implementation and methods

Language: Jupyter Notebook - Size: 10.7 KB - Last synced at: 6 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

nsphung/pyspark-template

A Python PySpark Project with Poetry

Language: Jupyter Notebook - Size: 81.1 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 18 - Forks: 2

Salma-Mamdoh/Real-Time-E-commerce-Data-Pipeline-with-Spark-ETL

My Second Mini Project At Samsung Innovation Campus

Language: Jupyter Notebook - Size: 20.8 MB - Last synced at: 28 days ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

SakinaJaffri/Home_Sales_Analysis_with_SparkSQL

This project focuses on analyzing home sales data using SparkSQL. It involves creating temporary views, partitioning data, caching tables for optimization, and evaluating query performance using PySpark SQL. The goal is to derive insights into home sales trends based on various metrics and criteria.

Language: Jupyter Notebook - Size: 11.7 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

zikzakjack/spark-demos

Apache Spark Demos

Language: Jupyter Notebook - Size: 103 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0

Shankar-Anumula/data-engineer

Language: Scala - Size: 2.06 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0

aryak0512/spark

Apache Spark Capstone project

Language: Java - Size: 15.6 MB - Last synced at: 4 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

burhanahmed1/Big-Data-Analytics

Practice tasks in Python programming language using Hadoop, MRJob, PySpark for Big Data Analytics.

Language: Jupyter Notebook - Size: 40 KB - Last synced at: 4 months ago - Pushed at: 12 months ago - Stars: 2 - Forks: 0

aronmarcus/Pyspark_QuarentenaGlobal_table_Databricks

Data engineering to implement a customer suppression/quarantine table using PySpark, Spark SQL, Pandas, and APIs on Databricks.

Language: Jupyter Notebook - Size: 1.17 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0

LeftCoastNerdGirl/Big_Data

This project uses PySpark and SQL to analyze Big Data.

Language: Jupyter Notebook - Size: 44.9 KB - Last synced at: 4 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

Non-NeutralZero/spark-feature-engineering-toolkit Fork of AshtonIzmev/spark-feature-engineering-toolkit

Snippets of spark/scala code used to do some handy feature engineering

Language: Scala - Size: 62.5 KB - Last synced at: 5 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

hablapps/sparkOptics

Optics for Spark DataFrames

Language: Scala - Size: 58.6 KB - Last synced at: 20 days ago - Pushed at: over 4 years ago - Stars: 47 - Forks: 6

samwong0127/stock-market

A work sample for the role of a Data Engineer

Language: Jupyter Notebook - Size: 2.74 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

talegari/tidier

dplyr friendly spark style window aggregation for R dataframes and remote dbplyr tbls

Language: R - Size: 438 KB - Last synced at: 21 days ago - Pushed at: almost 2 years ago - Stars: 3 - Forks: 0

Kidaha12/Home_Sales

Language: Jupyter Notebook - Size: 7.81 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

AVI-1213/Home_Sales

This project leverages SparkSQL to analyze home sales data. The goal is to determine key metrics such as average home prices based on various criteria. The tasks include creating temporary views, partitioning data, caching and uncaching tables, and verifying these operations & optimization using Spark.

Language: Jupyter Notebook - Size: 6.84 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

pathak-ashutosh/sentiment-analysis-yelp-reviews

Perform sentiment analysis on Yelp dataset with Apache Spark

Language: Python - Size: 133 KB - Last synced at: 4 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

w7negreiros/Home-Sales---Spark-SQL

Use SparkSQL to determine key metrics about home sales data. Then use Spark to create temporary views, partition the data, cache and uncache a temporary table, and verify that the table has been uncached. Using Google Colab to work on Big Data queries with PySpark SQL, parquet, and cache partitions. - UofT Data Analytics - Bootcamp

Language: Jupyter Notebook - Size: 271 KB - Last synced at: 4 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

Ashbyt/SCALA-Spark

Ashley Bythell - Spark/Scala code

Language: Scala - Size: 38.1 KB - Last synced at: 8 days ago - Pushed at: 11 months ago - Stars: 1 - Forks: 0

subhanjandas/Worldwide-Sales-Data-Analysis-and-Exploration-using-Zeppelin-HDFS-and-Spark

This project aimed to analyze and understand worldwide sales data through the use of Zeppelin and HDFS. The primary objective was to utilize Spark's basic Scala commands and SQL to query and manipulate the data, providing valuable insights and findings for the customer.

Language: Python - Size: 1.29 MB - Last synced at: 7 days ago - Pushed at: 11 months ago - Stars: 0 - Forks: 1

DEVANSHUK97/spark-cookbook

Spark, PySpark snippets

Language: Python - Size: 1000 Bytes - Last synced at: 4 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

Rindhujatreesa/Big_Data_Processing_Projects

This repository contains the Big Data coursework completed as part of the Master's in Data Science program at UMBC.

Language: Jupyter Notebook - Size: 20.2 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

Wadaboa/production-line-performance

Scala/Spark project, for Languages and Algorithms for Artificial Intelligence class at UNIBO

Language: Scala - Size: 31 MB - Last synced at: 3 months ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

simondelarue/Gdelt-AWS-NoSQL-from-scratch

Cassandra architecture for GDELT Database 🌍

Size: 4.43 MB - Last synced at: 11 months ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

sumukhahe/Click-Event-Analysis

The project is to capture, monitor, and analyze user click events on the e-commerce website, specifically focusing on instances where users explore product pages but do not complete purchases.

Language: JavaScript - Size: 212 KB - Last synced at: 3 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

IBM/db2-event-store-akka-streams 📦

Use Akka to implement a WebSockets endpoint and stream data to Db2 Event Store

Language: Jupyter Notebook - Size: 2.39 MB - Last synced at: 17 days ago - Pushed at: about 6 years ago - Stars: 8 - Forks: 11

adnanrahin/Spark-Flights-Data-Analysis

The U.S. Department of Transportation's (DOT) Bureau of Transportation Statistics tracks the on-time performance of domestic flights operated by large air carriers. Summary information on the number of on-time, delayed, canceled, and diverted flights is published in DOT's monthly Air Travel Consumer Report and in this dataset of 2015 flight delays and cancellations.

Language: Scala - Size: 43.9 KB - Last synced at: 4 days ago - Pushed at: 11 months ago - Stars: 1 - Forks: 1

polaternez/Introduction-to-Big-Data

Big Data projects for beginners

Language: Java - Size: 4.63 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

spirom/spark-data-sources

Developing Spark External Data Sources using the V2 API

Language: Java - Size: 114 KB - Last synced at: 3 months ago - Pushed at: about 7 years ago - Stars: 46 - Forks: 18

Brinthat/World-Development-Indicators

Exploring World Development Indicators: Identifying relationship between Health Indicators using Linear Regression & Classification of Income Group based on Health Indicators using Logistic Regression.

Language: HTML - Size: 276 KB - Last synced at: 4 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

michael-pupulin/Scala_Spark_and_SQL

I do some basic statistics and machine learning work on a dataset of tornado events across the United States. The dataset is nowhere near big enough to warrant using Spark over something like R, but I was looking for practice. I do some basic SQL to find out which years and states saw the most tornadoes and the most F5 tornadoes. Then I use Spark's MLlib to do linear regression of time and tornado counts.

Language: Scala - Size: 30.3 KB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

michael-pupulin/BigTaxi

Using Spark and Scala on a very big dataset for analysis

Language: Scala - Size: 34.2 KB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

DecioXXIV/BD-StockAnalysis

Repository for the Second Project of the "Big Data" Course (2023/24)

Language: Python - Size: 36.1 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

OKDP/spark-images

Collection of Apache Spark docker images for OKDP

Language: Dockerfile - Size: 84 KB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

flaviostutz/spark-scala-jupyter

Jupyter notebook server prepared for running Spark with Scala kernels on a remote Spark master

Language: Jupyter Notebook - Size: 1.17 MB - Last synced at: 3 months ago - Pushed at: about 5 years ago - Stars: 5 - Forks: 1

masalinas/poc-minio-spark

PoC Minio Spark in Kubernetes

Language: Python - Size: 304 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

oguzaktas/big-data-assignments 📦

Some of my homework assignments for Introduction to Big Data Analysis (BLM442) course at Kocaeli University in Spring 2019

Language: Jupyter Notebook - Size: 13.7 MB - Last synced at: about 1 year ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 0

astrolabsoftware/spark-fits

FITS data source for Spark SQL and DataFrames

Language: Scala - Size: 8.97 MB - Last synced at: 3 months ago - Pushed at: about 2 years ago - Stars: 20 - Forks: 7

SEED-VT/DeSQL

DeSQL is an interactive step-through debugging technique for DISC-backed SQL queries. This approach allows users to inspect constituent parts of a query and their corresponding intermediate data interactively, similar to watchpoints in gdb-like debuggers.

Language: Scala - Size: 515 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

xiaoa6435/spark-abtest

A Spark extension to help analyze A/B test experiments based on raw data

Language: Scala - Size: 58.6 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

muhammad-ahsan/spark-toolbox

Spark based applications to perform big data analytics

Language: Python - Size: 40 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

XUranus/jianshu-bigdata

Big data analysis of Jianshu users with Spark

Language: JavaScript - Size: 2.07 MB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

mtumilowicz/big-data-scala-spark-batch-workshop

Introduction to Spark Batch processing.

Language: Scala - Size: 385 KB - Last synced at: 2 months ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 1

Coursal/Spark-Examples

Some simple, kinda introductory projects based on Apache Spark to be used as guides in order to make the whole DataFrame data management look less weird or complex.

Language: Scala - Size: 708 KB - Last synced at: about 1 year ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 0

masalinas/doc-spark-minikube Fork of testdrivenio/spark-kubernetes

DoC Spark on minikube from Mac with Docker Desktop

Language: Shell - Size: 636 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

dongma/spark-graphx

Spark GraphX, which is designed for distributed graph computation, including spark-sql, spark-streaming, and RDD operations

Language: Scala - Size: 15.4 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 4

kaladabrio2020/pyspark-ml-analysis-data

Data analysis and machine learning with PySpark

Language: Jupyter Notebook - Size: 1.81 MB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

microsoft/MCW-Big-data-analytics-and-visualization 📦

MCW Big data analytics and visualization

Language: JavaScript - Size: 148 MB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 189 - Forks: 186

essraahmed/Data-Lake-with-Spark

Data Lake with Spark

Language: Python - Size: 37.1 KB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

darule0/sparkdiff

A rudimentary command line utility for contrasting Apache Spark event logs.

Language: Shell - Size: 703 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

abulbasar/zeppelin-notebooks

Size: 3.91 KB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 1 - Forks: 1

Ashutosh27ind/pySparkNYCParkingTickets

Attempt to scientifically analyze the phenomenon of increased traffic violation tickets issued by the NYC Police Department.

Language: Jupyter Notebook - Size: 11.7 KB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

izhangzhihao/spark-security

Language: Scala - Size: 143 KB - Last synced at: 3 months ago - Pushed at: over 3 years ago - Stars: 4 - Forks: 3

deepjyotiroy079/big-data-stack

Codes created while learning Big Data Stack.

Language: Jupyter Notebook - Size: 949 KB - Last synced at: about 1 year ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

sev7e0/wow-spark

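Spark self-study handbook, covering topics such as spark core, spark sql, spark streaming, spark-kafka, and delta-lake, plus basic Scala exercises and source-code analyses of components such as master and shuffle, with summaries and translations.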

Language: Scala - Size: 1.96 MB - Last synced at: 3 months ago - Pushed at: almost 2 years ago - Stars: 18 - Forks: 7

morfious902002/impala-spark-jdbc-kerberos 📦

Language: Java - Size: 4.88 KB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 7 - Forks: 5

thomasDoukas/NTUA_ATDS

Advanced Topics in Database Systems course of ECE National Technical University of Athens.

Language: Python - Size: 2.2 MB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

cavallon/Home_Sales

This SparkSQL project analyzes home sales data, optimizing queries and calculating average prices. Results are saved in a Jupyter Notebook and uploaded to a GitHub repository named "Home_Sales."

Language: Jupyter Notebook - Size: 187 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

ramkumarpj/Home_Sales

Home sales data is analyzed using SparkSQL. Spark is also used to create temporary views, partition the data, cache and uncache a temporary table, and verify that the table has been uncached.

Language: Jupyter Notebook - Size: 10.7 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

508lab/Spark-Java

Learning the Spark Java API

Language: Java - Size: 12.7 KB - Last synced at: about 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

rodrigoorf/SparkStudies

Repo with some Spark and SparkSQL exercises

Language: Java - Size: 41.1 MB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

amita-shukla/time-usage

Analysis on how people distribute their time between primary needs, work and leisure activities.

Language: Scala - Size: 22.5 KB - Last synced at: about 1 year ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 0

aabdel-kader/Apache-Spark

A repository for my practices and projects using pyspark

Language: Jupyter Notebook - Size: 11.6 MB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

sakethmukkanti/Machinery-Moniter-Iot-Streaming-With-Azure

An application developed to give real-time insights on machine health using IoT sensors by tracking and monitoring parameters such as temperature, pressure, current, and humidity.

Language: Jupyter Notebook - Size: 210 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

EnableAsync/cloud-movie-recommend-system

A Spark-based microservice recommendation system

Language: Java - Size: 1.31 MB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 12 - Forks: 1

buoyant-data/spark-connect-rust

Spark Connect client library in Rust

Language: Scala - Size: 34.9 MB - Last synced at: 7 days ago - Pushed at: almost 2 years ago - Stars: 2 - Forks: 0

saurabhg27/dps-project

Spatial Data analysis using Spark SQL

Language: Scala - Size: 4.4 MB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

kevin-lee/fuse Fork of charleso/fuse

Some utilities for interfacing with Spark without blowing a fuse

Language: Scala - Size: 45.9 KB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

salimt/Finance-and-Risk-Management-Algorithms

applications for risk management through computational portfolio construction methods

Language: Jupyter Notebook - Size: 13.4 MB - Last synced at: about 1 year ago - Pushed at: almost 5 years ago - Stars: 32 - Forks: 10

sarthak25/Smart-City-YVR

Smart City YVR is an innovative project leveraging data-driven methodologies to analyze and address critical aspects of urban living. Focusing on housing affordability, energy consumption, and transportation, this initiative utilizes advanced data analytics to derive actionable insights.

Language: Jupyter Notebook - Size: 109 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

aessing/demo-azuresynapse

This repository includes the demos and code I use to play around with Azure Synapse Analytics

Size: 80 MB - Last synced at: 21 days ago - Pushed at: almost 3 years ago - Stars: 5 - Forks: 5

MM24J/Home_Sales_Analysis

Using SparkSQL, I analyzed home sales data to identify key metrics.

Language: Jupyter Notebook - Size: 7.81 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

vim89/datapipelines-essentials-python

Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations

Language: Python - Size: 1.76 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 53 - Forks: 34

amy-panda/NY_Taxi_Data_Analysis_and_Modelling

Analysing the taxi trips in New York City and predicting total fare amount of taxi trips

Language: Jupyter Notebook - Size: 1.84 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

sakethmukkanti/Demand-Navigator-Real-Time-Streaming-with-Azure

A real-time application to guide cab drivers looking for rides toward the areas of the city experiencing higher demand

Language: Jupyter Notebook - Size: 156 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

xiaruolei/SparkSQLProject

Language: Scala - Size: 865 KB - Last synced at: about 1 year ago - Pushed at: almost 7 years ago - Stars: 0 - Forks: 0