GitHub topics: spark-sql
LearningJournal/Spark-Streaming-In-Python
Apache Spark 3 - Structured Streaming Course Material
Language: Python - Size: 19.4 MB - Last synced at: 3 months ago - Pushed at: almost 2 years ago - Stars: 121 - Forks: 159

s-yazhini/PySpark-and-SparkSQL
In Azure DataBricks
Language: Jupyter Notebook - Size: 13.7 KB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

airbnb/airbnb-spark-thrift
A library for loading Thrift data into Spark SQL
Language: Scala - Size: 50.8 KB - Last synced at: 2 days ago - Pushed at: over 2 years ago - Stars: 43 - Forks: 16

SayamAlt/Amazon-Products-API-ETL-and-ML-pipeline
In this project, I've created an end-to-end ETL pipeline and subsequently developed a machine learning model to predict the price of Amazon products based on several product-related features.
Language: Python - Size: 2.95 MB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

mervat-khaled/ETL-Apache-Spark-NYC-Taxi-Data
The goal of this project is to perform ETL (Extract, Transform, and Load) on NYC Taxi data and its geographical information using Apache Spark, applying various transformations with Spark's Python API, PySpark, and SQL. The processed data is then saved as CSV files, partitioned by the number of executors in the Spark session.
Language: Jupyter Notebook - Size: 7.44 MB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

windi-wulandari/Credit-Scoring-Data-Pipeline
This project implements an end-to-end data pipeline designed to manage and analyze large-scale credit scoring data. Using AWS S3 as a scalable storage solution and Databricks for processing, the pipeline uses Apache Spark through PySpark and Spark SQL to handle data transformation and analysis efficiently.
Language: Python - Size: 1.21 MB - Last synced at: about 2 months ago - Pushed at: 8 months ago - Stars: 2 - Forks: 0

RiccardoRevalor/Spark
Spark exercises
Language: Jupyter Notebook - Size: 302 KB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

Ren294/Covid-Data-Process
This project integrates real-time data processing and analytics using Apache NiFi, Kafka, Spark, Hive, and AWS services for comprehensive COVID-19 data insights.
Language: Shell - Size: 6.22 MB - Last synced at: 3 months ago - Pushed at: 9 months ago - Stars: 6 - Forks: 0

pathak-ashutosh/spark-movie-recommendation
A movie recommendation system on MovieLens 25M dataset using Python and Apache Spark
Language: Python - Size: 19.5 KB - Last synced at: 4 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 0

aravind2060/spark-sql-on-flight-data Fork of Cloud-Computing-Fall2024/assignment-4-advanced-spark-sql-on-flight-data
work with a flight dataset and use Spark SQL to analyze flight delays, airport traffic, and other key metrics
Language: Python - Size: 309 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

apache/kyuubi-docker
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
Language: Dockerfile - Size: 26.4 KB - Last synced at: 3 days ago - Pushed at: 13 days ago - Stars: 13 - Forks: 8

tomkat-cr/data_lakehouse_local_stack
Data Lakehouse local stack with PySpark, Trino, and Minio. Includes an example to process Raygun error data and the IP address occurrence.
Language: Python - Size: 1.37 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

Cris-Neumann/Big-Data-with-Spark-MLlib-and-Databricks
Credit default prediction with the Spark MLlib Gradient Boosting Trees algorithm, using a Databricks processing cluster.
Language: Jupyter Notebook - Size: 580 KB - Last synced at: 26 days ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

lorenzobloise/motion_insights
Application for real-time big data analysis from a Body Sensor Network, developed using Spark in Scala and Kafka
Language: Scala - Size: 47 MB - Last synced at: 4 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

kalebers/Spark_Training
SparkSQL exercises in Java
Language: Java - Size: 42 KB - Last synced at: 3 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

SA01/spark-data-stats-tutorial
Contains the code and examples for my article on Medium, which explains how to optimize computing data statistics in Apache Spark jobs using the Observations feature.
Language: Python - Size: 4.88 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

jeffreywijaya100/movies_DMO
data management using verulam blue vm spark sql and hadoop course
Size: 3.33 MB - Last synced at: 3 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

BayoAdejare/lightning-streams
Batch/stream ETL pipeline of NOAA GLM dataset, using Python frameworks: Dagster, PySpark and Parquet storage.
Language: Python - Size: 63.4 MB - Last synced at: 7 days ago - Pushed at: almost 2 years ago - Stars: 4 - Forks: 0

Ren294/Log-Analysis-Project
This project builds a scalable log analytics pipeline using the Lambda architecture for real-time and batch processing of NASA server logs.
Language: Python - Size: 2.88 MB - Last synced at: 3 months ago - Pushed at: 9 months ago - Stars: 5 - Forks: 1

mayur2810/sope
Apache Spark ETL Utilities
Language: Scala - Size: 1.08 MB - Last synced at: 13 days ago - Pushed at: 8 months ago - Stars: 40 - Forks: 16

DebanjanSarkar/pyspark-maestro
This repo contains implementations of PySpark for real-world use cases for batch data processing, streaming data processing sourced from Kafka, sockets, etc., spark optimizations, business specific bigdata processing scenario solutions, and machine learning use cases.
Language: Jupyter Notebook - Size: 66.1 MB - Last synced at: 3 months ago - Pushed at: 11 months ago - Stars: 2 - Forks: 1

vishalgattani/quixotic-kafka
Python Stream Processing for Apache Kafka, Spark, Cassandra.
Language: Python - Size: 39.1 KB - Last synced at: 5 days ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

aroch/protobuf-dataframe
A package that lets you run PySpark SQL on your Protobuf data
Language: Python - Size: 8.79 KB - Last synced at: 29 days ago - Pushed at: 8 months ago - Stars: 8 - Forks: 3

HarshOza36/MovieLens_PySpark
MovieLens Dataset analysis using Hadoop and Pyspark
Language: Jupyter Notebook - Size: 6.11 MB - Last synced at: 4 months ago - Pushed at: about 4 years ago - Stars: 3 - Forks: 0

storytellingengineer/Introduction_to_Pyspark
PySpark Implementation and methods
Language: Jupyter Notebook - Size: 10.7 KB - Last synced at: 6 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

nsphung/pyspark-template
A Python PySpark Project with Poetry
Language: Jupyter Notebook - Size: 81.1 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 18 - Forks: 2

Salma-Mamdoh/Real-Time-E-commerce-Data-Pipeline-with-Spark-ETL
My Second Mini Project At Samsung Innovation Campus
Language: Jupyter Notebook - Size: 20.8 MB - Last synced at: 28 days ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

SakinaJaffri/Home_Sales_Analysis_with_SparkSQL
This project focuses on analyzing home sales data using SparkSQL. It involves creating temporary views, partitioning data, caching tables for optimization, and evaluating query performance using PySpark SQL. The goal is to derive insights into home sales trends based on various metrics and criteria.
Language: Jupyter Notebook - Size: 11.7 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

zikzakjack/spark-demos
Apache Spark Demos
Language: Jupyter Notebook - Size: 103 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0

Shankar-Anumula/data-engineer
Language: Scala - Size: 2.06 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0

aryak0512/spark
Apache Spark Capstone project
Language: Java - Size: 15.6 MB - Last synced at: 4 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

burhanahmed1/Big-Data-Analytics
Practice tasks in Python programming language using Hadoop, MRJob, PySpark for Big Data Analytics.
Language: Jupyter Notebook - Size: 40 KB - Last synced at: 4 months ago - Pushed at: 12 months ago - Stars: 2 - Forks: 0

aronmarcus/Pyspark_QuarentenaGlobal_table_Databricks
Data engineering to implement a customer suppression/quarantine table using PySpark, Spark SQL, Pandas, and APIs on Databricks.
Language: Jupyter Notebook - Size: 1.17 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0

LeftCoastNerdGirl/Big_Data
This project uses PySpark and SQL to analyze Big Data.
Language: Jupyter Notebook - Size: 44.9 KB - Last synced at: 4 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

Non-NeutralZero/spark-feature-engineering-toolkit Fork of AshtonIzmev/spark-feature-engineering-toolkit
Snippets of spark/scala code used to do some handy feature engineering
Language: Scala - Size: 62.5 KB - Last synced at: 5 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

hablapps/sparkOptics
Optics for Spark DataFrames
Language: Scala - Size: 58.6 KB - Last synced at: 20 days ago - Pushed at: over 4 years ago - Stars: 47 - Forks: 6

samwong0127/stock-market
A work sample for the role of a Data Engineer
Language: Jupyter Notebook - Size: 2.74 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

talegari/tidier
dplyr friendly spark style window aggregation for R dataframes and remote dbplyr tbls
Language: R - Size: 438 KB - Last synced at: 21 days ago - Pushed at: almost 2 years ago - Stars: 3 - Forks: 0

Kidaha12/Home_Sales
Language: Jupyter Notebook - Size: 7.81 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

AVI-1213/Home_Sales
This project leverages SparkSQL to analyze home sales data. The goal is to determine key metrics such as average home prices based on various criteria. The tasks include creating temporary views, partitioning data, caching and uncaching tables, and verifying these operations & optimization using Spark.
Language: Jupyter Notebook - Size: 6.84 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

pathak-ashutosh/sentiment-analysis-yelp-reviews
Perform sentiment analysis on Yelp dataset with Apache Spark
Language: Python - Size: 133 KB - Last synced at: 4 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

w7negreiros/Home-Sales---Spark-SQL
Use SparkSQL to determine key metrics about home sales data. Then use Spark to create temporary views, partition the data, cache and uncache a temporary table, and verify that the table has been uncached. Using Google Colab to work on Big Data queries with PySpark SQL, parquet, and cache partitions. - UofT Data Analytics - Bootcamp
Language: Jupyter Notebook - Size: 271 KB - Last synced at: 4 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

Ashbyt/SCALA-Spark
Ashley Bythell - Spark/Scala code
Language: Scala - Size: 38.1 KB - Last synced at: 8 days ago - Pushed at: 11 months ago - Stars: 1 - Forks: 0

subhanjandas/Worldwide-Sales-Data-Analysis-and-Exploration-using-Zeppelin-HDFS-and-Spark
This project aimed to analyze and understand worldwide sales data through the use of Zeppelin and HDFS. The primary objective was to utilize Spark's basic Scala commands and SQL to query and manipulate the data, providing valuable insights and findings for the customer.
Language: Python - Size: 1.29 MB - Last synced at: 7 days ago - Pushed at: 11 months ago - Stars: 0 - Forks: 1

DEVANSHUK97/spark-cookbook
Spark, PySpark snippets
Language: Python - Size: 1000 Bytes - Last synced at: 4 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

Rindhujatreesa/Big_Data_Processing_Projects
This repository contains the course work for the Big Data as a part of Master's in Data Science program at UMBC.
Language: Jupyter Notebook - Size: 20.2 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

Wadaboa/production-line-performance
Scala/Spark project, for Languages and Algorithms for Artificial Intelligence class at UNIBO
Language: Scala - Size: 31 MB - Last synced at: 3 months ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

simondelarue/Gdelt-AWS-NoSQL-from-scratch
Cassandra architecture for GDELT Database 🌍
Size: 4.43 MB - Last synced at: 11 months ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

sumukhahe/Click-Event-Analysis
The project is to capture, monitor, and analyze user click events on the e-commerce website, specifically focusing on instances where users explore product pages but do not complete purchases.
Language: JavaScript - Size: 212 KB - Last synced at: 3 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

IBM/db2-event-store-akka-streams 📦
Use Akka to implement a WebSockets endpoint and stream data to Db2 Event Store
Language: Jupyter Notebook - Size: 2.39 MB - Last synced at: 17 days ago - Pushed at: about 6 years ago - Stars: 8 - Forks: 11

adnanrahin/Spark-Flights-Data-Analysis
The U.S. Department of Transportation's (DOT) Bureau of Transportation Statistics tracks the on-time performance of domestic flights operated by large air carriers. Summary information on the number of on-time, delayed, canceled, and diverted flights is published in DOT's monthly Air Travel Consumer Report and in this dataset of 2015 flight delays and cancellations.
Language: Scala - Size: 43.9 KB - Last synced at: 4 days ago - Pushed at: 11 months ago - Stars: 1 - Forks: 1

polaternez/Introduction-to-Big-Data
Big Data projects for beginners
Language: Java - Size: 4.63 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

spirom/spark-data-sources
Developing Spark External Data Sources using the V2 API
Language: Java - Size: 114 KB - Last synced at: 3 months ago - Pushed at: about 7 years ago - Stars: 46 - Forks: 18

Brinthat/World-Development-Indicators
Exploring World Development Indicators: Identifying relationship between Health Indicators using Linear Regression & Classification of Income Group based on Health Indicators using Logistic Regression.
Language: HTML - Size: 276 KB - Last synced at: 4 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

michael-pupulin/Scala_Spark_and_SQL
I do some basic statistics and machine learning work on a dataset of tornado events across the United States. The dataset is nowhere near big enough to warrant using Spark over something like R, but I was looking for practice. I do some basic SQL to find out which years and states saw the most tornadoes and the most F5 tornadoes. Then I use Spark's MLlib to do linear regression of time and tornado counts.
Language: Scala - Size: 30.3 KB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

michael-pupulin/BigTaxi
Using Spark and Scala on a very big dataset for analysis
Language: Scala - Size: 34.2 KB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

DecioXXIV/BD-StockAnalysis
Repository for the Second Project of the "Big Data" course (2023/24)
Language: Python - Size: 36.1 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

OKDP/spark-images
Collection of Apache Spark docker images for OKDP
Language: Dockerfile - Size: 84 KB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

flaviostutz/spark-scala-jupyter
Jupyter notebook server prepared for running Spark with Scala kernels on a remote Spark master
Language: Jupyter Notebook - Size: 1.17 MB - Last synced at: 3 months ago - Pushed at: about 5 years ago - Stars: 5 - Forks: 1

masalinas/poc-minio-spark
PoC Minio Spark in Kubernetes
Language: Python - Size: 304 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

oguzaktas/big-data-assignments 📦
Some of my homework assignments for Introduction to Big Data Analysis (BLM442) course at Kocaeli University in Spring 2019
Language: Jupyter Notebook - Size: 13.7 MB - Last synced at: about 1 year ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 0

astrolabsoftware/spark-fits
FITS data source for Spark SQL and DataFrames
Language: Scala - Size: 8.97 MB - Last synced at: 3 months ago - Pushed at: about 2 years ago - Stars: 20 - Forks: 7

SEED-VT/DeSQL
DeSQL is an interactive step-through debugging technique for DISC-backed SQL queries. This approach allows users to inspect constituent parts of a query and their corresponding intermediate data interactively, similar to watchpoints in gdb-like debuggers.
Language: Scala - Size: 515 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

xiaoa6435/spark-abtest
A Spark extension to help analyze A/B test experiments based on raw data
Language: Scala - Size: 58.6 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

muhammad-ahsan/spark-toolbox
Spark based applications to perform big data analytics
Language: Python - Size: 40 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

XUranus/jianshu-bigdata
Big data analysis of Jianshu users with Spark
Language: JavaScript - Size: 2.07 MB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

mtumilowicz/big-data-scala-spark-batch-workshop
Introduction to Spark Batch processing.
Language: Scala - Size: 385 KB - Last synced at: 2 months ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 1

Coursal/Spark-Examples
Some simple, kinda introductory projects based on Apache Spark to be used as guides in order to make the whole DataFrame data management look less weird or complex.
Language: Scala - Size: 708 KB - Last synced at: about 1 year ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 0

masalinas/doc-spark-minikube Fork of testdrivenio/spark-kubernetes
DoC Spark on minikube from Mac with Docker Desktop
Language: Shell - Size: 636 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

dongma/spark-graphx
Spark GraphX, designed for distributed graph computation, including Spark SQL, Spark Streaming, and RDD operations
Language: Scala - Size: 15.4 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 4

kaladabrio2020/pyspark-ml-analysis-data
Data analysis and machine learning with PySpark
Language: Jupyter Notebook - Size: 1.81 MB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

microsoft/MCW-Big-data-analytics-and-visualization 📦
MCW Big data analytics and visualization
Language: JavaScript - Size: 148 MB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 189 - Forks: 186

essraahmed/Data-Lake-with-Spark
Data Lake with Spark
Language: Python - Size: 37.1 KB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

darule0/sparkdiff
A rudimentary command line utility for contrasting Apache Spark event logs.
Language: Shell - Size: 703 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

abulbasar/zeppelin-notebooks
Size: 3.91 KB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 1 - Forks: 1

Ashutosh27ind/pySparkNYCParkingTickets
Attempt to scientifically analyze the phenomenon of increased traffic violation tickets issued by the NYC Police Department.
Language: Jupyter Notebook - Size: 11.7 KB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

izhangzhihao/spark-security
Language: Scala - Size: 143 KB - Last synced at: 3 months ago - Pushed at: over 3 years ago - Stars: 4 - Forks: 3

deepjyotiroy079/big-data-stack
Codes created while learning Big Data Stack.
Language: Jupyter Notebook - Size: 949 KB - Last synced at: about 1 year ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

sev7e0/wow-spark
🔆 Spark self-study handbook covering Spark Core, Spark SQL, Spark Streaming, spark-kafka, and delta-lake, along with basic Scala exercises and source-code analyses (e.g. master, shuffle), summaries, and translations.
Language: Scala - Size: 1.96 MB - Last synced at: 3 months ago - Pushed at: almost 2 years ago - Stars: 18 - Forks: 7

morfious902002/impala-spark-jdbc-kerberos 📦
Language: Java - Size: 4.88 KB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 7 - Forks: 5

thomasDoukas/NTUA_ATDS
Advanced Topics in Database Systems course of ECE National Technical University of Athens.
Language: Python - Size: 2.2 MB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

cavallon/Home_Sales
This SparkSQL project analyzes home sales data, optimizing queries and calculating average prices. Results are saved in a Jupyter Notebook and uploaded to a GitHub repository named "Home_Sales."
Language: Jupyter Notebook - Size: 187 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

ramkumarpj/Home_Sales
Home sales data is analyzed using SparkSQL. Spark is also used to create temporary views, partition the data, cache and uncache a temporary table, and verify that the table has been uncached.
Language: Jupyter Notebook - Size: 10.7 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

508lab/Spark-Java
Learning the Spark Java API
Language: Java - Size: 12.7 KB - Last synced at: about 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

rodrigoorf/SparkStudies
Repo with some Spark and SparkSQL exercises
Language: Java - Size: 41.1 MB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

amita-shukla/time-usage
Analysis on how people distribute their time between primary needs, work and leisure activities.
Language: Scala - Size: 22.5 KB - Last synced at: about 1 year ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 0

aabdel-kader/Apache-Spark
A repository for my practices and projects using pyspark
Language: Jupyter Notebook - Size: 11.6 MB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

sakethmukkanti/Machinery-Moniter-Iot-Streaming-With-Azure
An application that gives real-time insights into machine health using IoT sensors, tracking and monitoring parameters such as temperature, pressure, current, and humidity.
Language: Jupyter Notebook - Size: 210 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

EnableAsync/cloud-movie-recommend-system
A Spark-based microservice recommendation system
Language: Java - Size: 1.31 MB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 12 - Forks: 1

buoyant-data/spark-connect-rust
Spark Connect client library in Rust
Language: Scala - Size: 34.9 MB - Last synced at: 7 days ago - Pushed at: almost 2 years ago - Stars: 2 - Forks: 0

saurabhg27/dps-project
Spatial Data analysis using Spark SQL
Language: Scala - Size: 4.4 MB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

kevin-lee/fuse Fork of charleso/fuse
Some utilities for interfacing with Spark without blowing a fuse
Language: Scala - Size: 45.9 KB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

salimt/Finance-and-Risk-Management-Algorithms
applications for risk management through computational portfolio construction methods
Language: Jupyter Notebook - Size: 13.4 MB - Last synced at: about 1 year ago - Pushed at: almost 5 years ago - Stars: 32 - Forks: 10

sarthak25/Smart-City-YVR
Smart City YVR is an innovative project leveraging data-driven methodologies to analyze and address critical aspects of urban living. Focusing on housing affordability, energy consumption, and transportation, this initiative utilizes advanced data analytics to derive actionable insights.
Language: Jupyter Notebook - Size: 109 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

aessing/demo-azuresynapse
This repository includes the demos and code I use to play around with Azure Synapse Analytics
Size: 80 MB - Last synced at: 21 days ago - Pushed at: almost 3 years ago - Stars: 5 - Forks: 5

MM24J/Home_Sales_Analysis
Using SparkSQL, I analyzed home sales data to identify key metrics.
Language: Jupyter Notebook - Size: 7.81 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

vim89/datapipelines-essentials-python
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Language: Python - Size: 1.76 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 53 - Forks: 34

amy-panda/NY_Taxi_Data_Analysis_and_Modelling
Analysing the taxi trips in New York City and predicting total fare amount of taxi trips
Language: Jupyter Notebook - Size: 1.84 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

sakethmukkanti/Demand-Navigator-Real-Time-Streaming-with-Azure
A real-time application that guides cab drivers looking for rides toward the areas of a city experiencing higher demand
Language: Jupyter Notebook - Size: 156 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

xiaruolei/SparkSQLProject
Language: Scala - Size: 865 KB - Last synced at: about 1 year ago - Pushed at: almost 7 years ago - Stars: 0 - Forks: 0
