An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: spark-sql

norbertolimonjr/KMeans-Clustering-Segmentation-Analysis

Online Retail Classification for Marketing Segmentation Project using KMeans Clustering, Elbow Method and Silhouette Method for Validation

Language: Jupyter Notebook - Size: 53.4 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

Thapep/ApacheSpark

Apache Spark project for Advanced Topics on Databases course

Language: Python - Size: 438 KB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 1

brianmslee/Home-Sales-Big-Data-with-PySpark-SQL

Home sales data analysis utilizing PySpark and Spark SQL

Language: Jupyter Notebook - Size: 3.91 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

ramapilli16/CCA175-PySpark-Practice-with-solutions

CCA175-PySpark-Practice-with-solutions

Size: 20.5 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 3 - Forks: 2

AH-Yussef/Health-Monitor-Big-Data-System

A Health Monitor to simulate receiving and processing large amounts of health metrics from many clients with the goal of efficiently finding aggregate statistics

Language: Java - Size: 319 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

emrekutlug/getting-started-with-pyspark

A tutorial covering SparkContext with map and filter methods and Python lambda functions; creating RDDs from objects and external files; transformations and actions on RDDs and pair RDDs; building PySpark DataFrames from RDDs and external files; running SQL queries on DataFrames with Spark SQL; and machine learning with PySpark MLlib.

Language: Jupyter Notebook - Size: 8.58 MB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 5 - Forks: 6

the-timoye/spark-examples

Language: Python - Size: 14.6 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 2

Melchizedek13/spark-sql-parser

Parsing SQL queries to get table names by using the CatalystSqlParser.

Language: Java - Size: 13.7 KB - Last synced at: almost 2 years ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 2

beyhangl/StructuredSparkStreaming

Spark Streaming with Kafka using Scala

Language: Scala - Size: 182 KB - Last synced at: almost 2 years ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

IgorBerman/spark-bucketing

technique to optimise or remove shuffles

Size: 223 KB - Last synced at: almost 2 years ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 0

mihir09/BigData-Analysis

Scraped and analyzed Twitter data using Spark. Ran Spark queries on millions of tweets and trained models for sentiment analysis.

Language: JavaScript - Size: 52.5 MB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

Heisenberg0203/Apache_Spark

Apache Spark projects: from beginner to advanced level

Language: Java - Size: 64.5 KB - Last synced at: almost 2 years ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 0

akarsh3007/MusicMillionData

HDF5 files in spark, MusicMillionData

Language: Scala - Size: 14.8 MB - Last synced at: almost 2 years ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

emso-exe/Comercio_eletronico_brasileiro

Data analysis project on Brazilian e-commerce data made available by Olist via the Kaggle platform.

Language: Jupyter Notebook - Size: 41.5 MB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

viennadatasciencegroup/kf-2017-11-09-R-and-spark

Integrating R into the big data ecosystem using sparklyR

Size: 568 KB - Last synced at: almost 2 years ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

phphoebe/Udacity-Data-Engineering-with-AWS

Design data models, build data warehouses, data lakes & lakehouse, automate data pipelines - SQL | NoSQL | AWS | Spark | Airflow

Language: Jupyter Notebook - Size: 26.5 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 1

jwsmai/ScalaTools

This project provides Apache Spark SQL and Flink DataStream API examples in Scala

Language: Scala - Size: 3.19 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

PeterSchuld/Sparkify

Capstone Project in the Udacity Data Scientist Nanodegree program. We manipulate large and realistic datasets with Spark to engineer relevant features for predicting churn. We'll learn how to use Spark MLlib to build machine learning models with large datasets, far beyond what could be done with non-distributed technologies like scikit-learn.

Language: HTML - Size: 2.44 MB - Last synced at: almost 2 years ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 0

zrlio/albis

Albis: High-Performance File Format for Big Data Systems

Size: 1.02 MB - Last synced at: almost 2 years ago - Pushed at: about 7 years ago - Stars: 21 - Forks: 3

karttiik/Big-Data-with-Spark-and-Scala

Big Data with Scala and Spark, building Spark session to capture useful insights.

Language: Scala - Size: 3.91 KB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

sanjosh/scala

Language: Scala - Size: 75.2 KB - Last synced at: almost 2 years ago - Pushed at: almost 7 years ago - Stars: 1 - Forks: 2

gdelgador/SparkScalaExamples

Basic Scala and Spark examples

Language: Scala - Size: 8.1 MB - Last synced at: almost 2 years ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

hitesh1292/US-Car-Accidents

Big Data Analytics Project using Apache Spark for Predicting Severity of Car Accidents in the USA

Language: Jupyter Notebook - Size: 1.54 MB - Last synced at: 3 months ago - Pushed at: about 5 years ago - Stars: 2 - Forks: 0

hanydief/Home_Sales

Using knowledge of SparkSQL to determine key metrics about home sales data. Then using Spark to create temporary views, partition the data, cache and uncache a temporary table, and verify that the table has been uncached.

Language: Jupyter Notebook - Size: 9.77 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

xtutran/spark-tutor

Using spark-sql & spark-mllib to tackle Titanic & Movie Recommendation

Language: Scala - Size: 112 KB - Last synced at: almost 2 years ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

morbvel/Ghost

Ghost game built in Scala with Spark-SQL

Language: Scala - Size: 3.91 KB - Last synced at: almost 2 years ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

morbvel/Flight-control

Brief simulation of flight control built in Spark-SQL with Scala

Language: Scala - Size: 1.95 KB - Last synced at: almost 2 years ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

mehroosali/s3-redshift-batch-etl-pipeline

Built functional python ETL script with functions that initialized spark clusters using pyspark library to extract songs stored in S3 bucket. Partitioned songs data by year and artist_id and compressed in parquet output files to increase load performance. Used the overwrite mode in spark to ensure every new run of ELT script is overwritten in the data lake to avoid duplicates. Orchestrated ELT data pipeline that extracts from S3, loads in redshift for transformation and loads output back to S3. Used hooks in airflow to make connection credentials configurable in order to separate access rights from code base for security. Used operators to execute loading and transformation scripts for redshift with airflow DAG.

Language: Python - Size: 944 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 4 - Forks: 3

Dragon1573/Online-Works 📦

Backup of Teddy online internship work

Language: Scala - Size: 2.24 MB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 1

temcavanagh/Spark_ETL_Pipeline

Building an ETL pipeline using Apache Spark (spark-sql) for use in predictive modelling.

Language: Python - Size: 0 Bytes - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

gcdev373/example-spark-datasourcev2

A very simple Java implementation of the Spark DataSourceV2 API.

Language: Java - Size: 9.77 KB - Last synced at: 11 days ago - Pushed at: about 4 years ago - Stars: 4 - Forks: 0

Neuw84/spark-drools Fork of Pkrish15/spark-drools

Spark integration with Drools for ETL purposes

Language: Java - Size: 92.8 KB - Last synced at: almost 2 years ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 0

bruno-uy/IBMCodeMVD2018

Hands-on Data Science at IBM Code Montevideo 2018

Language: Jupyter Notebook - Size: 1.8 MB - Last synced at: almost 2 years ago - Pushed at: about 7 years ago - Stars: 0 - Forks: 0

Madhusudhan985/LendingClub-Big-Data-Analytics

A data transformation solution for LendingClub, encompassing data cleansing, feature engineering, and loan scoring to facilitate data-driven decision-making.

Language: Scala - Size: 47.9 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

anjijava16/Spark_MultiFiles_Insert_Oracle

Multi Files Insert into Oracle using Spark Scala

Language: Scala - Size: 26.4 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

anjijava16/kafka_springboot_cassandra_utils

End-to-end use case: Swagger UI --> Spring MicroServices --> Kafka --> Spark Consumer --> Cassandra DB

Language: Java - Size: 594 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

dashish333/Spark

Language: Jupyter Notebook - Size: 1.34 MB - Last synced at: almost 2 years ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

skp33/pointinpolygon

This library takes GeoJSON (only multipolygon type), checks whether a point (lat, lon) is inside a polygon, and returns the property of the matched polygon.

Language: Scala - Size: 3.73 MB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 2 - Forks: 1

Chuka-J-Uzo/Data-Streaming-ETL-IUBH

Data-Streaming-ETL-IUBH repository is developed as a real-time streaming application that captures data from a python app that simulates streamed data from the movement of a truck as its source and ingests it into a data store for processing, analysis and visualization.

Language: Python - Size: 34 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

bekirduran/credit-card-tracing

detecting not using virtual credit card, much process which time period and analysing.. on Cloud. Then write MongoDB for more profit :)

Language: Java - Size: 401 KB - Last synced at: almost 2 years ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 0

sunnn/TableAligner

Construct source files as per the target files in Spark using the DataFrame API and Spark

Language: Scala - Size: 63.5 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

NashTech-Labs/Sparkathon

A library having Java and Scala examples for Spark 2.x

Language: Java - Size: 113 MB - Last synced at: 3 months ago - Pushed at: over 8 years ago - Stars: 7 - Forks: 9

javiaspiroz/football-predictions Fork of Juanal07/football-predictions

Football shots visualization and player prices prediction. Built with Spark for distributed computation and Streamlit for the frontend

Size: 29.2 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

javiaspiroz/billetajo Fork of Juanal07/billetajo

KPIs for a bank using the 2015 Almeria payments dataset. Built with Spark for distributed computation and Streamlit for the frontend

Size: 10.7 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

jalpan-randeri/spark-notes

Spark Notes

Language: HTML - Size: 181 KB - Last synced at: almost 2 years ago - Pushed at: almost 6 years ago - Stars: 0 - Forks: 0

ziyaocui/732-project

Yelp Business Analysis and Location Recommendation

Language: Python - Size: 60.3 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 2

gavalle94/Songs-Recommender

Recommendation System written in Python, using the pySpark framework and other Data Science libraries

Language: HTML - Size: 5.23 MB - Last synced at: almost 2 years ago - Pushed at: about 7 years ago - Stars: 1 - Forks: 1

gavalle94/Flights-dataset

Analysis of a dataset of flights, using the SparkSQL framework and extra web scraping techniques

Language: HTML - Size: 2.95 MB - Last synced at: almost 2 years ago - Pushed at: about 7 years ago - Stars: 0 - Forks: 0

RSummerSchool/R-for-HPC-and-big-data

Slides and lab material for the talk R for HPC and big data at http://rsummer.data-analysis.at

Size: 3.68 MB - Last synced at: almost 2 years ago - Pushed at: almost 8 years ago - Stars: 7 - Forks: 4

dushibaiyu/kotlin-spark-sql-es-read-example

Use Spark SQL to read data from Elasticsearch with Kotlin

Language: Kotlin - Size: 8.79 KB - Last synced at: almost 2 years ago - Pushed at: almost 8 years ago - Stars: 7 - Forks: 0

dhaval201279/my-oreilly-learning-spark-v2

Important notes and sample code as part of reading 'Learning Spark 2nd Edition'

Language: Scala - Size: 9.95 MB - Last synced at: almost 2 years ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 1

MortalKommit/DE_04_azure_data_lakehouse

Language: Jupyter Notebook - Size: 550 KB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

HeliaHashemipour/Hadoop-Spark

Third homework of CloudComputing - Fall 2022

Language: Jupyter Notebook - Size: 50.3 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

arunteja1203/Aruntejakotla-2119770

Big Data Technologies - Assignment 2 Student ID 2119770

Language: Jupyter Notebook - Size: 7.43 MB - Last synced at: 1 day ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

Wendy-hub/MusicPrediction

Music prediction using PySpark

Language: HTML - Size: 2.83 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

Lucass97/FlightAnalysis

This project implemented a lambda architecture for analyzing domestic flight data in the US from 2009 to 2020. It used Apache Spark for batch processing, Spark Streaming for real-time analysis, and SVM models to predict flight cancellations and delays, with Docker for cluster management and Grafana for real-time visualization.

Language: Jupyter Notebook - Size: 5.66 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 1

globosc/bigdata

Analysis of the GDELT Project with Hadoop-based big data tools on the Microsoft Azure cloud

Language: Python - Size: 3.56 MB - Last synced at: almost 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

Dheeraj2444/spark

Learning PySpark

Language: Jupyter Notebook - Size: 18.6 KB - Last synced at: almost 2 years ago - Pushed at: almost 7 years ago - Stars: 0 - Forks: 1

Leonamrsm/Risk-Analysis-in-Public-Transport

Analysis of a dataset of traffic incidents using Spark SQL and graphs

Language: Jupyter Notebook - Size: 3.53 MB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

shalakasaraogi/apache-spark-pig-hive-work

This repository contains Apache Spark, Apache Hive, Apache Pig work

Language: PigLatin - Size: 813 KB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

Annbelbella/Home_Sales

Use SparkSQL to determine key metrics about home sales data. Then use Spark to create temporary views, partition the data, cache and uncache a temporary table, and verify that the table has been uncached.

Language: Jupyter Notebook - Size: 0 Bytes - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

helioribeiro/Project_04_IOT_RealTime_Industrial_Analytics_with_Spark_and_Kafka

IOT Sensors RealTime Analysis with Apache Spark, Kafka, Python and SQL.

Language: HTML - Size: 6.59 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

kayannr/sportstats

Historical data analysis using SQL, Databricks, Python, PandaSQL, Pandas, and SQL Window functions.

Language: Jupyter Notebook - Size: 34.2 MB - Last synced at: 4 months ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

jwkimani/big-data-insights-scala

personal solutions to big data problem scenarios using scala

Language: Scala - Size: 53.7 KB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 8 - Forks: 2

mdarm/map-reduce-project

Project on MapReduce for the Μ111 - Big Data Management course, NKUA, Spring 2023.

Language: TeX - Size: 3.7 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

xghan99/bigdata-assignments

This repository consists of code I wrote for CS4225 - Big Data Systems for Data Science

Language: Jupyter Notebook - Size: 3.64 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

Hamim-Hussain/Home_Sales

Use SparkSQL to determine key metrics about home sales data. Then use Spark to create temporary views, partition the data, cache and uncache a temporary table, and verify that the table has been uncached.

Language: Jupyter Notebook - Size: 263 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

wrm244/depression_sparksql

Backup of the Spark portion of the code for the bigdata_depression project

Language: Scala - Size: 2.51 MB - Last synced at: 3 months ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

keramiozsoy/apache-spark-101

Size: 134 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

IcarusDB/SFaker

SFaker is a data generator.

Language: Java - Size: 178 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 1

Siddharth1989/ProspectiveTopUpCustomerPrediction

Developed a model/Spark ML pipeline stream to identify potential customers that may purchase top up services in the future.

Language: Jupyter Notebook - Size: 6.17 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

hadiezatpanah/Spark-Scala-Data-Pipeline

An End to End solution to read XML data from FTP server and process and import them into postgres relational database

Language: Scala - Size: 7.66 MB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 1 - Forks: 0

BhagiaSheri/apache-spark-SQL

Big Data Pipeline | Querying Data from Hive Table Phase

Language: Java - Size: 262 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

AndrewKuzmin/spark-structured-streaming-examples

Spark Structured Streaming examples using version 3.4.0

Language: Scala - Size: 1.06 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 25 - Forks: 14

Amir79Naziri/TwitterSentimentAnalysisWithSpark_Project

A sentiment analyzer using Spark ML library for Twitter Dataset

Language: Jupyter Notebook - Size: 13.7 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

emersonrafaels/pyspark

Spark: Examples

Language: Python - Size: 27.3 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

JLeigh101/Home-Sales

NU Bootcamp Module 22

Language: Jupyter Notebook - Size: 7.81 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

candidamg/Home_Sales

Determined the key metrics of home_sales data by using SparkSQL in Google Colab

Language: Jupyter Notebook - Size: 18.6 KB - Last synced at: 4 months ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

meriemnour/spark-_sql

Language: Jupyter Notebook - Size: 20.5 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

tmard/Home_Sales

Using SparkSQL to determine key metrics about home sales data.

Language: Jupyter Notebook - Size: 1.55 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

sajithrw/spark

Spark with Java

Language: Java - Size: 18.6 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

FarrukhSultani/Home_Sales

This project demonstrated the usage of SparkSQL to read, query, cache, and analyze home sales data, providing insights into average prices based on various criteria.

Language: Jupyter Notebook - Size: 1.25 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

samerelhousseini/Geospatial-Analysis-With-Spark

This is a data processing pipeline that implements an End-to-End Real-Time Geospatial Analytics and Visualization multi-component full-stack solution, using Apache Spark Structured Streaming, Apache Kafka, MongoDB Change Streams, Node.js, React, Uber's Deck.gl and React-Vis, and using the Massachusetts Bay Transportation Authority's (MBTA) APIs as a data source

Language: Python - Size: 12.4 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 10 - Forks: 4

pmb-7684/IBM-Data-Engineering-Professional-Certificate

Learning materials, assignments, and helpful resources for professional certification. Expected Completion June 2023

Size: 9.77 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

Mathews-Tom/MSc-in-Machine-Learning-and-Artificial-Intelligence

Master of Science in Machine Learning & Artificial Intelligence - Indian Institute of Technology Madras & Liverpool John Moores University

Language: Jupyter Notebook - Size: 2.12 GB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 6 - Forks: 7

mohankrishna02/SparkSQL

This project demonstrates how to use Spark SQL to execute SQL queries on structured data in Spark, and display the results in a tabular format using the show() method.

Language: Scala - Size: 24.4 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

CarolinaNicasio/APACHESPARK-PYSPARK

Size: 1.09 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

SuryakantKumar/Apache-Spark-with-Scala

This repository is a collection of Apache Spark concepts and its implementations in Scala. The repository is intended to serve as a resource for developers who want to learn Spark and Scala, and for those who want to expand their knowledge of these technologies.

Language: Jupyter Notebook - Size: 4.97 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

svngoku/Pyspark-Stream-kafka-TwitterAnalysis

Streaming Data from Twitter for Sentiment Analysis

Language: Jupyter Notebook - Size: 59.6 KB - Last synced at: about 1 year ago - Pushed at: about 3 years ago - Stars: 3 - Forks: 0

quintax96/FromHiveTableToScalaJunitTest

This project aims to read a Hive table from a file (.txt/.csv) and create a text file (.txt) containing the table in JUnit format for Spark projects written in Scala (DataFrame)

Language: Python - Size: 10.3 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

hams71/Salary_Analysis

Pulled data from different sites to perform salary analysis and visualize in Power BI

Language: Jupyter Notebook - Size: 344 KB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

strizhonov/parquet-comparator

A tool for comparing Parquet datasets stored in HDFS.

Language: Scala - Size: 9.77 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

vaibhav-vemula/Machine-Learning-with-Spark-Streaming

UE19CS322 Big Data course project

Language: Jupyter Notebook - Size: 73.2 KB - Last synced at: 12 days ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

taboola/ScORe

ScORe - Programmatic Schema On Read for Spark SQL, powered by Taboola

Language: Java - Size: 51.8 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 8 - Forks: 2

jowilf/big-data-showcase

This repository contains a project showcasing the use of Big Data technologies in processing and visualizing real-time data from an eCommerce electronics store using tools such as Apache Kafka, Spark Streaming, Spark SQL, HBase, and Plotly

Language: Java - Size: 2.7 MB - Last synced at: 4 months ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

Subham2S/BigData-Engineering-Capstone-Project-1

BigData Engineering Capstone Project with tech stack: Linux, MySQL, Sqoop, HDFS, Hive, Impala, SparkSQL, SparkML, Git

Language: Python - Size: 15.2 MB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 6 - Forks: 0

eavilaes/qbeast-spark Fork of Qbeast-io/qbeast-spark

Qbeast-spark: DataSource enabling multi-dimensional indexing and efficient data sampling. Big Data, free from the unnecessary!

Language: Scala - Size: 16.9 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

Wathon/data_engineering_with_python-track-datacamp

Data Engineer with Python lecture notes from #datacamp.

Language: Jupyter Notebook - Size: 59.7 MB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 26 - Forks: 22

victorskl/genomic-bigdata-spark

Genomic BigData Warehousing with Apache Spark and LakeHouse Architecture

Language: Jupyter Notebook - Size: 172 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 6 - Forks: 0

dhruv007patel/Impact-of-Covid-19-on-Aviation-Industry

This project analyzes the correlation between COVID-19 and the US aviation industry. By studying data on passenger/freight traffic and delays alongside COVID-19 trends, it provides insights into airline and passenger responses. The findings help airlines adapt to the pandemic's impact.

Language: Python - Size: 504 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 0