GitHub topics: spark-sql
aessing/demo-azuresynapse
This repository includes the demos and code I use to play around with Azure Synapse Analytics
Size: 80 MB - Last synced at: about 1 month ago - Pushed at: almost 3 years ago - Stars: 5 - Forks: 5

MM24J/Home_Sales_Analysis
Using SparkSQL, I analyzed home sales data to identify key metrics.
Language: Jupyter Notebook - Size: 7.81 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

vim89/datapipelines-essentials-python
Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake: SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Language: Python - Size: 1.76 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 53 - Forks: 34

amy-panda/NY_Taxi_Data_Analysis_and_Modelling
Analysing the taxi trips in New York City and predicting total fare amount of taxi trips
Language: Jupyter Notebook - Size: 1.84 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

sakethmukkanti/Demand-Navigator-Real-Time-Streaming-with-Azure
A real-time application that guides cab drivers looking for rides toward the areas of a city experiencing higher demand
Language: Jupyter Notebook - Size: 156 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

xiaruolei/SparkSQLProject
Language: Scala - Size: 865 KB - Last synced at: over 1 year ago - Pushed at: almost 7 years ago - Stars: 0 - Forks: 0

nelsonssjunior/Python_Spark
Studies of data streaming with Python and Spark
Language: Jupyter Notebook - Size: 4.88 KB - Last synced at: over 1 year ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 0

aliabbasi2000/Spark
Solving Big Data Problems using Spark framework in Java. Running the Project on HDFS clusters (BigData@Polito) to get the results.
Language: Java - Size: 143 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

sakethmukkanti/Movielens-Dataset-Analysis-Azure-Data-Engineering-Project
Created a movie recommendation system on Azure utilizing Spark SQL for analyzing the MovieLens dataset.
Language: Jupyter Notebook - Size: 1.6 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

TiagoCebola/BigData-GooglePlayStore
This project was developed to solidify the use of Scala for manipulating files and DataFrames to generate metrics.
Language: Scala - Size: 3.97 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

techmonad/spark-datasets
This example gives a quick overview of the Spark DataFrame API.
Language: Scala - Size: 88.9 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 1

rohitkulkarni08/Azure-ETL-AmazonSalesAnalysis
A comprehensive ETL pipeline and sales analysis project leveraging Microsoft Azure and PySpark, designed to optimize e-commerce sales by providing actionable insights through detailed data analysis.
Language: Jupyter Notebook - Size: 8.04 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

assamese/spark-python
Spark Python examples
Language: Python - Size: 83 KB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 0

MoustafaAMahmoud/spark-sandbox
Spark Sandbox project
Language: Scala - Size: 8.79 KB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 0

mrogove/NewHampshireOpioidDeepDive
Using Spark and other tools to analyze large, disparate data sources. Term Group Project for COMP119 Tufts F'19
Language: Jupyter Notebook - Size: 17.3 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 0

Lakshmiaddepalli/BigDataProject
CSCI-GA.3033-005 - Big Data Application Development
Language: Python - Size: 41.4 MB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 0

IcarusSO/Sparksql-UnitTest
Simple utilities for testing Spark SQL queries, functions, and applications
Size: 12.7 KB - Last synced at: 11 days ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

jkanclerz/data-science-workshop-2022
The repository contains notebook templates for the purposes of the data science course at the Cracow University of Economics.
Language: Jupyter Notebook - Size: 2.13 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

abulbasar/SparkJavaExamples
Example code for working with Apache Spark using Java
Language: Java - Size: 399 KB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 4 - Forks: 8

JBris/time-series-airflow-kafka-spark
A simple demonstration of an Airflow-Kafka-Spark (AKS) stack for online time series forecasting.
Language: Python - Size: 699 KB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

zy969/film-genre-insights
DataTalksClub Data Engineering Zoomcamp Project
Language: Python - Size: 32.8 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Safaa-p/Machine-Failure-Prediction
Predicting Machine failure using Machine learning on a synthetic dataset of an existing milling machine consisting of 10,000 data points
Language: Jupyter Notebook - Size: 4.7 MB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

amitnema/spark-coach
This project contains learning notes and experiments with Apache Spark.
Language: Scala - Size: 46.9 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

AbdelmajidLh/Spark_ML_Weather
Scala and Spark learning project: predicting tomorrow's rain from historical data
Language: Scala - Size: 13.7 KB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

mliarakos/spark-typed-ops
Lightweight type-safe operations for Spark
Language: Scala - Size: 64.5 KB - Last synced at: 3 months ago - Pushed at: over 4 years ago - Stars: 5 - Forks: 0

LakshMundhada/Real-Time-Fraudulent-Transaction-Analytics-Pipeline
A Big Data project leveraging AWS services and Apache frameworks to identify and visualize fraudulent credit card transaction patterns, providing actionable insights to mitigate financial fraud.
Language: Python - Size: 33.5 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

bhanu-kanamarlapudi/EarthquakeAnalysis-PySpark
Language: Python - Size: 18.6 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Robyn2024/Home_Sales
I'll use my knowledge of SparkSQL to determine key metrics about home sales data. Then I'll use Spark to create temporary views, partition the data, cache and uncache a temporary table, and verify that the table has been uncached.
Language: Jupyter Notebook - Size: 9.77 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

DalyaLami/Home_Sales
Determine key metrics about home sales data using SparkSQL and then use Spark to create temporary views, partition the data, cache and uncache a temporary table, and verify that the table has been uncached.
Language: Jupyter Notebook - Size: 1.25 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Mr-Mens/Analyzing-Wikipedia-Clickstreams-with-PySpark-Project
This project focuses on analyzing Wikipedia's clickstream data to uncover patterns in how users navigate from one article to another. Utilizing Apache Spark and PySpark for data manipulation and analysis, the project aims to provide insights into user behavior on Wikipedia, including the most popular pathways to specific articles.
Language: Jupyter Notebook - Size: 7.81 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

adnanrahin/NFL-Big-Data-Bowl-2022
The 2022 Big Data Bowl data contains Next Gen Stats player tracking, play, game, player, and PFF scouting data for all 2018-2020 Special Teams play. Here, you'll find a summary of each data set in the 2022 Data Bowl, a list of key variables to join on, and a description of each variable.
Language: Scala - Size: 1.02 MB - Last synced at: 18 days ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

bobxwang/predict-stock-in-spark
Using Spark to predict stocks; the data comes from Sina
Language: Scala - Size: 143 KB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

bmazzarol/SparkTest.NET 📦
Support for testing 🧪 Spark dotnet applications
Language: C# - Size: 284 KB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 2 - Forks: 0

CaioBrainer/Hadoop_Projects
Small projects using tools from the Apache Hadoop ecosystem
Language: Python - Size: 18.6 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

lydia-ath/SparkLinux
Assignment for Big Data course of MSc
Language: Python - Size: 20.5 KB - Last synced at: 3 days ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

aravinthsci/Spark-DB-Connector
Sharing Examples for Apache Spark
Language: Scala - Size: 20.5 KB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 3 - Forks: 2

aravinthsci/Miscellaneous1
Language: Jupyter Notebook - Size: 42.2 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 1

mohammad-safari/spark-hadoop-exercise
spark hadoop exercise of cloud computing course - aut 1402-1403 fall
Language: Jupyter Notebook - Size: 33.2 MB - Last synced at: 8 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

kosminus/polyflow
Polyflow is an ETL tool based on Apache Spark.
Language: Scala - Size: 41 KB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 0

tlepple/iceberg-intro-workshop
Hands-on workshop with Apache Iceberg
Language: Shell - Size: 2.31 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 10 - Forks: 0

nardyjh/Home_Sales
Spark Home Sales Analysis utilizes Apache Spark to explore and analyze home sales data, providing insights into average prices based on various criteria. The project employs Spark SQL queries for efficient data processing and is designed for easy setup and usage.
Language: Jupyter Notebook - Size: 1.25 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

dhiraa/spark-tpcds
Apache Spark TPC-DS benchmark setup with EMR launch setup
Language: Smarty - Size: 1.3 MB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 10 - Forks: 4

lifeomic/spark-vcf
Spark VCF data source implementation for Dataframes
Language: Scala - Size: 314 KB - Last synced at: 9 days ago - Pushed at: almost 3 years ago - Stars: 14 - Forks: 2

HuemulSolutions/huemul-bigdatagovernance
Huemul BigDataGovernance is a framework that runs on Spark, Hive, and HDFS. It supports implementing a corporate single-source-of-data strategy based on data governance best practices. It lets you implement tables with Primary Key and Foreign Key control when inserting and updating data through the library, along with validation of nulls, text lengths, minimum/maximum values for numbers and dates, unique values, and default values. It also lets you classify fields by the applicability of ARCO rights to ease the implementation of GDPR-style data protection laws, and identify security levels and whether any kind of encryption is being applied. Additionally, it lets you add more complex validation rules on the same table.
Language: Scala - Size: 1.27 MB - Last synced at: 12 days ago - Pushed at: about 2 years ago - Stars: 11 - Forks: 7

invent-analytics/metaframe
Spark DataFrame with metadata
Language: Python - Size: 14.6 KB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 9 - Forks: 1

annagracia12/MassiveDataProcessing
Projects of the subject Massive Data Processing Engineering at Universidad Internacional de La Rioja.
Language: Jupyter Notebook - Size: 3.73 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

sangwanamit621/sql-solutions-in-pyspark-dataframe-api-and-spark-sql
This repository contains my solutions to various SQL problems from LeetCode, implemented using PySpark DataFrame API and Spark SQL. The goal is to provide alternative solutions and insights for SQL enthusiasts who want to explore the power of PySpark and Spark SQL.
Language: Jupyter Notebook - Size: 71.3 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

SharpData/SharpETL
Write ETL using your favorite SQL dialects
Language: Scala - Size: 3.37 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 36 - Forks: 5

marcocolangelo/Big-Data-processing-and-Analytics
The current repository contains all the code developed during the Big Data processing and Analytics laboratories. Data are processed and analyzed using Hadoop and Spark
Language: Java - Size: 6.1 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

spshah1701/World-Development-Indicators
Analysis of World Development Indicators (WDI) using big data technologies, specifically Databricks, Apache Spark, and Scala.
Size: 107 KB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

arpendu11/graph-based-data-lake
An ETL application written with Quarkus, Spark SQL Streaming, Neo4j, and various types of databases and stores. It also covers DevOps tooling such as Jenkins CI/CD, Docker, and Kubernetes.
Language: Java - Size: 56.6 KB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 7 - Forks: 2

Adarsh-Hota/ETL_spark-on-dataproc
A PySpark project that performs ETL on a Dataproc cluster and writes data to Google Cloud Storage/BigQuery.
Language: Jupyter Notebook - Size: 46.3 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

mikerly131/serveUpRecos
Build a mock EMR app and integrate an AI/ML prediction into an encounter workflow
Language: CSS - Size: 54.9 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

aysekonus/movie_recommendation_system
Movie Recommendation System using PySpark, ALS, SQLite (Movielens Dataset)
Language: Jupyter Notebook - Size: 3.36 MB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

WazirRohiman/Apache_Spark_Basics
This series explores the basics of Apache Spark with the application of some practical elements of Spark, PySpark & SparkSQL
Language: Jupyter Notebook - Size: 29.3 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

polaroidz/sales_prediction
A Production Machine Learning Pipeline for Predicting Future Sales with Spark
Language: Jupyter Notebook - Size: 90.8 KB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 3 - Forks: 0

Jeanette22/pipelines_datos_vuelos-ETL
ETL process: ingesting, transforming, and loading data into the data warehouse. This is a personal guide to the steps I took to carry out the requested project; any suggestions or error reports are welcome so I can keep learning and improving. Any contribution is welcome!
Language: Python - Size: 1.85 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

anshul1004/MutualFriends
Implementation of Hadoop and Spark
Language: Java - Size: 23 MB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 0

sunwu51/bigdatatutorial
bigdatatutorial
Language: Shell - Size: 23.3 MB - Last synced at: 7 months ago - Pushed at: almost 7 years ago - Stars: 35 - Forks: 6

miguelangel43/Prediction-Flight-Arrivals-Delays-Spark
Application that trains a classifier and predicts flight arrival delays based on past information. Uses the libraries pyspark.ml and pyspark.sql, performs feature engineering, cross-validation and tests various ML algorithms.
Language: Python - Size: 41 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

ThomasByr/lichess-analysis-of-chess-games
♟️ Analysis of Stockfish-annotated Lichess games
Language: Jupyter Notebook - Size: 1.88 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 1

seyfal/SparkMitMAttackSim
Scalable simulation of MitM attacks using parallel random walks and graph analytics on Spark.
Language: Scala - Size: 76.2 KB - Last synced at: 8 days ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

AjmalSarwary/IoT---assignment-IBM-Data-Science-Specialization
This assignment was part of an IoT motion sensor app running on a watch, predicting the actions of the individual wearing the watch based on their arm movements. This IoT Analytics assignment is one of a series of data pipeline coding challenges in the IBM course Scalable Data Science.
Language: Jupyter Notebook - Size: 11.7 KB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

bluishglc/bdp
A prototype big data platform project; the source code for the book Big Data Platform Architecture and Prototype
Language: Java - Size: 403 KB - Last synced at: over 1 year ago - Pushed at: almost 5 years ago - Stars: 184 - Forks: 135

microsoft/Functional-Validation-Testing-Spark-SQL
Business Validation Testing in Spark SQL
Language: Scala - Size: 43.9 KB - Last synced at: 5 days ago - Pushed at: over 3 years ago - Stars: 3 - Forks: 4

SamaSamrin/Amazon_Product_Analysis_with_PySpark
We are utilizing Big Data technologies and the platform of PySpark to perform an analysis of the Amazon Products with Python.
Language: Jupyter Notebook - Size: 16.7 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

yennanliu/spark-etl-pipeline
Various data stream/batch processing demos with Apache Spark in Scala 🚀
Language: Scala - Size: 5.06 MB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 11 - Forks: 8

sahilbhange/spark-slowly-changing-dimension
Spark implementation of Slowly Changing Dimension type 2
Language: Scala - Size: 351 KB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 11 - Forks: 13

youheekil/udacity-data-streaming
Projects completed in the Udacity Data Streaming Nanodegree program. Tech used: Apache Kafka, Kafka Connect, KSQL, Faust Stream Processing, Spark Structured Streaming
Language: Python - Size: 1.01 MB - Last synced at: 5 months ago - Pushed at: about 3 years ago - Stars: 4 - Forks: 4

LarryLoveIV/PySpark-NBA
Work-in-progress NBA Game Predictor using Spark
Language: Jupyter Notebook - Size: 404 KB - Last synced at: 7 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

javieraespinosa/lifranum
Discovering French Digital Literature (LIFRANUM ANR project)
Language: Jupyter Notebook - Size: 871 KB - Last synced at: 16 days ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

ShubhamJagtap2000/Spark-Python
🐍💥Python and Spark for Big Data
Language: Jupyter Notebook - Size: 73.2 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

arpendu11/stocks-data-pipelining
Data pipeline using Spring Boot to consume Kafka stream data, process it, and forward it to multiple databases such as MySQL and PostgreSQL
Language: Java - Size: 18.6 KB - Last synced at: about 1 year ago - Pushed at: about 5 years ago - Stars: 3 - Forks: 1

chiomauche/Home_Sales
The purpose of the study was to use the knowledge of SparkSQL to determine key metrics about home sales data
Language: Jupyter Notebook - Size: 230 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

xiaogp/recsys_spark
Spark SQL implementations of ItemCF, UserCF, and Swing; recommender systems, recommendation algorithms, collaborative filtering
Language: Scala - Size: 10.7 KB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 121 - Forks: 47

SamuelBarbosaDev/Justweb_Technical_Test
This is a technical test for a mid-level Python Developer position.
Language: Jupyter Notebook - Size: 3.65 MB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

tiatsou/SparkDistributedComputing
Distributed Computing with Spark SQL
Size: 193 KB - Last synced at: over 1 year ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

charalav7/BigData
Explore the technologies of Hadoop, MapReduce, Spark, Spark SQL, Spark Streaming, Kafka, GraphX, HBase, Cassandra
Language: Jupyter Notebook - Size: 2.7 MB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

sohilsshah91/Airline-Stock-Prediction-Using-Google-Trends-Oil-Prices
This project highlights a Spark application built on Scala. It utilizes Spark Core, Spark SQL and Spark ML (Machine Learning libraries) for predicting stock prices of specific airline companies. We have used the Google trending words (searched on internet and relevant to financial domain) and also macro-economic oil prices as alternate data to predict stock prices.
Language: Scala - Size: 1.02 MB - Last synced at: over 1 year ago - Pushed at: about 7 years ago - Stars: 2 - Forks: 1

Mrk-Nguyen/spark-projects
Assignments and personal projects involving Apache Spark using Scala and Python
Language: Scala - Size: 40 KB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 3

JunjianS/spark-streaming-kafka-demo
Spark Streaming reads messages from Kafka and writes offsets to Redis; Spark computes word frequencies and finally writes them to a Hive table
Language: Java - Size: 14.6 KB - Last synced at: over 1 year ago - Pushed at: almost 6 years ago - Stars: 15 - Forks: 7

Dimitrov-S-Dev/PySpark
PySpark
Language: Jupyter Notebook - Size: 62.5 KB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

ketanpurohit0/python
Self-directed Python PoCs etc. / PostgreSQL / Apache Spark / Pandas
Language: Python - Size: 1.18 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

mdrkb/spark-tutorial
A basic Spark project written in Scala
Language: Scala - Size: 7.81 KB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

piyushgoyal1620/Big-Data-Project-3
This project collects and analyzes large volumes of data. It includes live streaming of data from a FOREX trading API and an Electric Vehicle stocks API. The data is fetched and processed using Kafka Streaming and Spark Streaming. Throughout this project, Forex data and data on companies making Electric Vehicle parts were analyzed…
Language: Jupyter Notebook - Size: 34.7 MB - Last synced at: almost 2 years ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

Azimboy/feed-stats
Feed statistics with Spark
Language: Scala - Size: 847 KB - Last synced at: almost 2 years ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 0

sanxore/spark-theta-sketch-udfs
This project aims to expose the Yahoo Theta Sketch API as Spark SQL UDFs
Language: Scala - Size: 9.77 KB - Last synced at: almost 2 years ago - Pushed at: about 6 years ago - Stars: 3 - Forks: 0

jaezak/home_sales_sparkSQL
Determine key metrics about home sales data using SparkSQL.
Language: Jupyter Notebook - Size: 11.7 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

inferrinizzard/prettier-sql Fork of sql-formatter-org/sql-formatter 📦
[ARCHIVED] Please use https://github.com/sql-formatter-org/sql-formatter
Language: TypeScript - Size: 3.08 MB - Last synced at: about 2 months ago - Pushed at: about 3 years ago - Stars: 21 - Forks: 5

wangj1106/recommendMoteur
Movie recommendation system, movie recommendation engine; a movie recommendation engine built with Spark
Language: Scala - Size: 10.4 MB - Last synced at: over 1 year ago - Pushed at: about 7 years ago - Stars: 106 - Forks: 37

mathewsrc/machine-failure-prediction
Predicting machine failure
Language: Jupyter Notebook - Size: 6.34 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

zrlio/spark-nullio-fileformat
Spark Null I/O file format
Language: Scala - Size: 98.6 KB - Last synced at: almost 2 years ago - Pushed at: over 7 years ago - Stars: 1 - Forks: 0

zrlio/fileformat-benchmarks Fork of animeshtrivedi/fileformat-benchmarks
file format specific benchmarks for Parquet, ORC, Avro, JSON, and Arrow
Language: Scala - Size: 27.3 KB - Last synced at: almost 2 years ago - Pushed at: over 7 years ago - Stars: 4 - Forks: 0

bmazzarol/TypedSpark.NET 📦
Typesafe bindings for ⭐ Spark.NET
Language: C# - Size: 497 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 3 - Forks: 0

Sampsonyu/Data_Lake_with_Spark
Data Lake with Spark
Language: Jupyter Notebook - Size: 6.63 MB - Last synced at: almost 2 years ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 0

traunguyentvt/study_big_data_technology
Kafka, Spark Streaming, Spark SQL, Hive, Tableau
Language: Java - Size: 8.03 MB - Last synced at: 4 months ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

martins-jean/Formula-1-Cloud-Data-Platform-in-Azure
Data engineering project using Data Lake Gen2, Data Factory, Databricks (PySpark, Spark SQL) and Power BI.
Size: 25.4 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

komprenilo/liga
Liga: Let Data Dance with ML Models
Language: Python - Size: 17.9 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 9 - Forks: 5

norbertolimonjr/KMeans-Clustering-Segmentation-Analysis
Online Retail Classification for Marketing Segmentation Project using KMeans Clustering, Elbow Method and Silhouette Method for Validation
Language: Jupyter Notebook - Size: 53.4 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

Thapep/ApacheSpark
Apache Spark project for Advanced Topics on Databases course
Language: Python - Size: 438 KB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 1
