GitHub topics: spark-sql
norbertolimonjr/KMeans-Clustering-Segmentation-Analysis
Online Retail Classification for Marketing Segmentation Project using KMeans Clustering, Elbow Method and Silhouette Method for Validation
Language: Jupyter Notebook - Size: 53.4 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

Thapep/ApacheSpark
Apache Spark project for Advanced Topics on Databases course
Language: Python - Size: 438 KB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 1

brianmslee/Home-Sales-Big-Data-with-PySpark-SQL
Home sales data analysis utilizing PySpark and Spark SQL
Language: Jupyter Notebook - Size: 3.91 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

ramapilli16/CCA175-PySpark-Practice-with-solutions
CCA175-PySpark-Practice-with-solutions
Size: 20.5 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 3 - Forks: 2

AH-Yussef/Health-Monitor-Big-Data-System
A Health Monitor to simulate receiving and processing large amounts of health metrics from many clients with the goal of efficiently finding aggregate statistics
Language: Java - Size: 319 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

emrekutlug/getting-started-with-pyspark
In this tutorial, I explain SparkContext using map and filter methods with lambda functions in Python; create RDDs from objects and external files; apply transformations and actions on RDDs and pair RDDs; create PySpark DataFrames from RDDs and external files; run SQL queries on DataFrames using Spark SQL; and use machine learning with PySpark MLlib.
Language: Jupyter Notebook - Size: 8.58 MB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 5 - Forks: 6

the-timoye/spark-examples
Language: Python - Size: 14.6 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 2

Melchizedek13/spark-sql-parser
Parsing sql queries to get table names by using the CatalystSqlParser.
Language: Java - Size: 13.7 KB - Last synced at: almost 2 years ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 2

beyhangl/StructuredSparkStreaming
Spark Streaming with Kafka using Scala
Language: Scala - Size: 182 KB - Last synced at: almost 2 years ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

IgorBerman/spark-bucketing
technique to optimise or remove shuffles
Size: 223 KB - Last synced at: almost 2 years ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 0

mihir09/BigData-Analysis
Scraped and analyzed Twitter data using Spark. Ran Spark queries on millions of tweets and trained models for sentiment analysis.
Language: JavaScript - Size: 52.5 MB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

Heisenberg0203/Apache_Spark
Apache Spark projects: from beginner to advanced level
Language: Java - Size: 64.5 KB - Last synced at: almost 2 years ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 0

akarsh3007/MusicMillionData
HDF5 files in spark, MusicMillionData
Language: Scala - Size: 14.8 MB - Last synced at: almost 2 years ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

emso-exe/Comercio_eletronico_brasileiro
Data analysis project on Brazilian e-commerce data made available by Olist via the Kaggle platform.
Language: Jupyter Notebook - Size: 41.5 MB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

viennadatasciencegroup/kf-2017-11-09-R-and-spark
Integrating R into the big data ecosystem using sparklyR
Size: 568 KB - Last synced at: almost 2 years ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

phphoebe/Udacity-Data-Engineering-with-AWS
Design data models, build data warehouses, data lakes & lakehouse, automate data pipelines - SQL | NoSQL | AWS | Spark | Airflow
Language: Jupyter Notebook - Size: 26.5 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 1

jwsmai/ScalaTools
This project provides Apache Spark SQL and Flink DataStream API examples in Scala.
Language: Scala - Size: 3.19 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

PeterSchuld/Sparkify
Capstone Project in the Udacity Data Scientist Nanodegree program. We manipulate large and realistic datasets with Spark to engineer relevant features for predicting churn. We'll learn how to use Spark MLlib to build machine learning models with large datasets, far beyond what could be done with non-distributed technologies like scikit-learn.
Language: HTML - Size: 2.44 MB - Last synced at: almost 2 years ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 0

zrlio/albis
Albis: High-Performance File Format for Big Data Systems
Size: 1.02 MB - Last synced at: almost 2 years ago - Pushed at: about 7 years ago - Stars: 21 - Forks: 3

karttiik/Big-Data-with-Spark-and-Scala
Big Data with Scala and Spark, building Spark session to capture useful insights.
Language: Scala - Size: 3.91 KB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

sanjosh/scala
Language: Scala - Size: 75.2 KB - Last synced at: almost 2 years ago - Pushed at: almost 7 years ago - Stars: 1 - Forks: 2

gdelgador/SparkScalaExamples
Examples of basic Scala and Spark
Language: Scala - Size: 8.1 MB - Last synced at: almost 2 years ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

hitesh1292/US-Car-Accidents
Big Data Analytics Project using Apache Spark for Predicting Severity of Car Accidents in the USA
Language: Jupyter Notebook - Size: 1.54 MB - Last synced at: 3 months ago - Pushed at: about 5 years ago - Stars: 2 - Forks: 0

hanydief/Home_Sales
Using knowledge of SparkSQL to determine key metrics about home sales data. Then using Spark to create temporary views, partition the data, cache and uncache a temporary table, and verify that the table has been uncached.
Language: Jupyter Notebook - Size: 9.77 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

xtutran/spark-tutor
Using spark-sql & spark-mllib to tackle Titanic & Movie Recommendation
Language: Scala - Size: 112 KB - Last synced at: almost 2 years ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

morbvel/Ghost
Ghost game built in Scala with Spark-SQL
Language: Scala - Size: 3.91 KB - Last synced at: almost 2 years ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

morbvel/Flight-control
Brief simulation of flight control built in Spark-SQL with Scala
Language: Scala - Size: 1.95 KB - Last synced at: almost 2 years ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

mehroosali/s3-redshift-batch-etl-pipeline
Built a functional Python ETL script with functions that initialize Spark clusters using the pyspark library to extract songs stored in an S3 bucket. Partitioned songs data by year and artist_id and compressed it into Parquet output files to increase load performance. Used Spark's overwrite mode so that every new run of the ELT script overwrites the data lake output, avoiding duplicates. Orchestrated an ELT data pipeline that extracts from S3, loads into Redshift for transformation, and loads output back to S3. Used hooks in Airflow to make connection credentials configurable, separating access rights from the code base for security. Used operators to execute loading and transformation scripts for Redshift with an Airflow DAG.
Language: Python - Size: 944 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 4 - Forks: 3

Dragon1573/Online-Works 📦
Backup of Teddy online internship work
Language: Scala - Size: 2.24 MB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 1

temcavanagh/Spark_ETL_Pipeline
Building an ETL pipeline using Apache Spark (spark-sql) for use in predictive modelling.
Language: Python - Size: 0 Bytes - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

gcdev373/example-spark-datasourcev2
A very simple Java implementation of the Spark DataSourceV2 API.
Language: Java - Size: 9.77 KB - Last synced at: 11 days ago - Pushed at: about 4 years ago - Stars: 4 - Forks: 0

Neuw84/spark-drools Fork of Pkrish15/spark-drools
Spark integration with Drools for ETL purposes
Language: Java - Size: 92.8 KB - Last synced at: almost 2 years ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 0

bruno-uy/IBMCodeMVD2018
Hands-on Data Science at IBM Code Montevideo 2018
Language: Jupyter Notebook - Size: 1.8 MB - Last synced at: almost 2 years ago - Pushed at: about 7 years ago - Stars: 0 - Forks: 0

Madhusudhan985/LendingClub-Big-Data-Analytics
A data transformation solution for LendingClub, encompassing data cleansing, feature engineering, and loan scoring to facilitate data-driven decision-making.
Language: Scala - Size: 47.9 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

anjijava16/Spark_MultiFiles_Insert_Oracle
Multi Files Insert into Oracle using Spark Scala
Language: Scala - Size: 26.4 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

anjijava16/kafka_springboot_cassandra_utils
End-to-end use case: Swagger UI --> Spring MicroServices --> Kafka --> Spark Consumer --> Cassandra DB
Language: Java - Size: 594 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

dashish333/Spark
Language: Jupyter Notebook - Size: 1.34 MB - Last synced at: almost 2 years ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

skp33/pointinpolygon
This library takes GeoJSON (multipolygon type only), checks whether a point (lat, lon) is inside a polygon, and returns the property of the matched polygon.
Language: Scala - Size: 3.73 MB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 2 - Forks: 1

Chuka-J-Uzo/Data-Streaming-ETL-IUBH
Data-Streaming-ETL-IUBH repository is developed as a real-time streaming application that captures data from a python app that simulates streamed data from the movement of a truck as its source and ingests it into a data store for processing, analysis and visualization.
Language: Python - Size: 34 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

bekirduran/credit-card-tracing
Detects when a virtual credit card is not being used and analyses how many transactions occur in which time period, on the cloud. Results are then written to MongoDB for more profit :)
Language: Java - Size: 401 KB - Last synced at: almost 2 years ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 0

sunnn/TableAligner
Construct source files to match the target files in Spark using the DataFrame API
Language: Scala - Size: 63.5 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

NashTech-Labs/Sparkathon
A library of Java and Scala examples for Spark 2.x
Language: Java - Size: 113 MB - Last synced at: 3 months ago - Pushed at: over 8 years ago - Stars: 7 - Forks: 9

javiaspiroz/football-predictions Fork of Juanal07/football-predictions
Football shots visualization and player prices prediction. Built with Spark for distributed computation and Streamlit for the frontend
Size: 29.2 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

javiaspiroz/billetajo Fork of Juanal07/billetajo
KPIs for a bank using the 2015 Almeria payments dataset. Built with Spark for distributed computation and Streamlit for the frontend
Size: 10.7 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

jalpan-randeri/spark-notes
Spark Notes
Language: HTML - Size: 181 KB - Last synced at: almost 2 years ago - Pushed at: almost 6 years ago - Stars: 0 - Forks: 0

ziyaocui/732-project
Yelp Business Analysis and Location Recommendation
Language: Python - Size: 60.3 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 2

gavalle94/Songs-Recommender
Recommendation System written in Python, using the pySpark framework and other Data Science libraries
Language: HTML - Size: 5.23 MB - Last synced at: almost 2 years ago - Pushed at: about 7 years ago - Stars: 1 - Forks: 1

gavalle94/Flights-dataset
Analysis of a dataset of flights, using the SparkSQL framework and extra web scraping techniques
Language: HTML - Size: 2.95 MB - Last synced at: almost 2 years ago - Pushed at: about 7 years ago - Stars: 0 - Forks: 0

RSummerSchool/R-for-HPC-and-big-data
Slides and lab material for the talk R for HPC and big data at http://rsummer.data-analysis.at
Size: 3.68 MB - Last synced at: almost 2 years ago - Pushed at: almost 8 years ago - Stars: 7 - Forks: 4

dushibaiyu/kotlin-spark-sql-es-read-example
Use Spark SQL to read data from Elasticsearch with Kotlin
Language: Kotlin - Size: 8.79 KB - Last synced at: almost 2 years ago - Pushed at: almost 8 years ago - Stars: 7 - Forks: 0

dhaval201279/my-oreilly-learning-spark-v2
Important notes and sample code as part of reading 'Learning Spark 2nd Edition'
Language: Scala - Size: 9.95 MB - Last synced at: almost 2 years ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 1

MortalKommit/DE_04_azure_data_lakehouse
Language: Jupyter Notebook - Size: 550 KB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

HeliaHashemipour/Hadoop-Spark
Third homework of CloudComputing - Fall 2022
Language: Jupyter Notebook - Size: 50.3 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

arunteja1203/Aruntejakotla-2119770
Big Data Technologies - Assignment 2 Student ID 2119770
Language: Jupyter Notebook - Size: 7.43 MB - Last synced at: 1 day ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

Wendy-hub/MusicPrediction
Music prediction using PySpark
Language: HTML - Size: 2.83 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

Lucass97/FlightAnalysis
This project implemented a lambda architecture for analyzing domestic flight data in the US from 2009 to 2020. It used Apache Spark for batch processing, Spark Streaming for real-time analysis, and SVM models to predict flight cancellations and delays, with Docker for cluster management and Grafana for real-time visualization.
Language: Jupyter Notebook - Size: 5.66 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 1

globosc/bigdata
Analysis of the GDELT Project with Hadoop-based big data tools on the Microsoft Azure cloud
Language: Python - Size: 3.56 MB - Last synced at: almost 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

Dheeraj2444/spark
Learning PySpark
Language: Jupyter Notebook - Size: 18.6 KB - Last synced at: almost 2 years ago - Pushed at: almost 7 years ago - Stars: 0 - Forks: 1

Leonamrsm/Risk-Analysis-in-Public-Transport
Analysis of a dataset of traffic incidents using Spark SQL and graphs
Language: Jupyter Notebook - Size: 3.53 MB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

shalakasaraogi/apache-spark-pig-hive-work
This repository contains Apache Spark, Apache Hive, Apache Pig work
Language: PigLatin - Size: 813 KB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

Annbelbella/Home_Sales
Use of SparkSQL to determine key metrics about home sales data. Then you'll use Spark to create temporary views, partition the data, cache and uncache a temporary table, and verify that the table has been uncached.
Language: Jupyter Notebook - Size: 0 Bytes - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

helioribeiro/Project_04_IOT_RealTime_Industrial_Analytics_with_Spark_and_Kafka
IOT Sensors RealTime Analysis with Apache Spark, Kafka, Python and SQL.
Language: HTML - Size: 6.59 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

kayannr/sportstats
Historical data analysis using SQL, Databricks, Python, PandaSQL, Pandas, and SQL Window functions.
Language: Jupyter Notebook - Size: 34.2 MB - Last synced at: 4 months ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

jwkimani/big-data-insights-scala
personal solutions to big data problem scenarios using scala
Language: Scala - Size: 53.7 KB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 8 - Forks: 2

mdarm/map-reduce-project
Project on MapReduce for the Μ111 - Big Data Management course, NKUA, Spring 2023.
Language: TeX - Size: 3.7 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

xghan99/bigdata-assignments
This repository consists of code I wrote for CS4225 - Big Data Systems for Data Science
Language: Jupyter Notebook - Size: 3.64 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

Hamim-Hussain/Home_Sales
Use SparkSQL to determine key metrics about home sales data. Then use Spark to create temporary views, partition the data, cache and uncache a temporary table, and verify that the table has been uncached.
Language: Jupyter Notebook - Size: 263 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

wrm244/depression_sparksql
Backup of the Spark portion of the code for the bigdata_depression project
Language: Scala - Size: 2.51 MB - Last synced at: 3 months ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

keramiozsoy/apache-spark-101
Size: 134 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

IcarusDB/SFaker
SFaker is a data generator.
Language: Java - Size: 178 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 1

Siddharth1989/ProspectiveTopUpCustomerPrediction
Developed a model/Spark ML pipeline stream to identify potential customers that may purchase top up services in the future.
Language: Jupyter Notebook - Size: 6.17 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

hadiezatpanah/Spark-Scala-Data-Pipeline
An end-to-end solution that reads XML data from an FTP server, processes it, and imports it into a PostgreSQL relational database
Language: Scala - Size: 7.66 MB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 1 - Forks: 0

BhagiaSheri/apache-spark-SQL
Big Data Pipeline | Querying Data from Hive Table Phase
Language: Java - Size: 262 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

AndrewKuzmin/spark-structured-streaming-examples
Spark Structured Streaming examples using version 3.4.0
Language: Scala - Size: 1.06 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 25 - Forks: 14

Amir79Naziri/TwitterSentimentAnalysisWithSpark_Project
A sentiment analyzer using Spark ML library for Twitter Dataset
Language: Jupyter Notebook - Size: 13.7 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

emersonrafaels/pyspark
Spark: Examples
Language: Python - Size: 27.3 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

JLeigh101/Home-Sales
NU Bootcamp Module 22
Language: Jupyter Notebook - Size: 7.81 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

candidamg/Home_Sales
Determined the key metrics of home_sales data by using SparkSQL in Google Colab
Language: Jupyter Notebook - Size: 18.6 KB - Last synced at: 4 months ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

meriemnour/spark-_sql
Language: Jupyter Notebook - Size: 20.5 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

tmard/Home_Sales
Using SparkSQL to determine key metrics about home sales data.
Language: Jupyter Notebook - Size: 1.55 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

sajithrw/spark
Spark with Java
Language: Java - Size: 18.6 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

FarrukhSultani/Home_Sales
This project demonstrated the usage of SparkSQL to read, query, cache, and analyze home sales data, providing insights into average prices based on various criteria.
Language: Jupyter Notebook - Size: 1.25 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

samerelhousseini/Geospatial-Analysis-With-Spark
This is a data processing pipeline that implements an End-to-End Real-Time Geospatial Analytics and Visualization multi-component full-stack solution, using Apache Spark Structured Streaming, Apache Kafka, MongoDB Change Streams, Node.js, React, Uber's Deck.gl and React-Vis, and using the Massachusetts Bay Transportation Authority's (MBTA) APIs as a data source
Language: Python - Size: 12.4 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 10 - Forks: 4

pmb-7684/IBM-Data-Engineering-Professional-Certificate
Learning materials, assignments, and helpful resources for professional certification. Expected Completion June 2023
Size: 9.77 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

Mathews-Tom/MSc-in-Machine-Learning-and-Artificial-Intelligence
Master of Science in Machine Learning & Artificial Intelligence - Indian Institute of Technology Madras & Liverpool John Moores University
Language: Jupyter Notebook - Size: 2.12 GB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 6 - Forks: 7

mohankrishna02/SparkSQL
This project demonstrates how to use Spark SQL to execute SQL queries on structured data in Spark, and display the results in a tabular format using the show() method.
Language: Scala - Size: 24.4 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

CarolinaNicasio/APACHESPARK-PYSPARK
Size: 1.09 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

SuryakantKumar/Apache-Spark-with-Scala
This repository is a collection of Apache Spark concepts and their implementations in Scala. The repository is intended to serve as a resource for developers who want to learn Spark and Scala, and for those who want to expand their knowledge of these technologies.
Language: Jupyter Notebook - Size: 4.97 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

svngoku/Pyspark-Stream-kafka-TwitterAnalysis
Streaming Data from Twitter for Sentiment Analysis
Language: Jupyter Notebook - Size: 59.6 KB - Last synced at: about 1 year ago - Pushed at: about 3 years ago - Stars: 3 - Forks: 0

quintax96/FromHiveTableToScalaJunitTest
This project reads a Hive table from a file (.txt/.csv) and creates a text file (.txt) containing the table in JUnit format for Spark projects written in Scala (DataFrame)
Language: Python - Size: 10.3 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

hams71/Salary_Analysis
Pulled data from different sites to perform salary analysis and visualize in Power BI
Language: Jupyter Notebook - Size: 344 KB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

strizhonov/parquet-comparator
A tool for comparing Parquet datasets stored in HDFS.
Language: Scala - Size: 9.77 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

vaibhav-vemula/Machine-Learning-with-Spark-Streaming
UE19CS322 Big Data course project
Language: Jupyter Notebook - Size: 73.2 KB - Last synced at: 12 days ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

taboola/ScORe
ScORe - Programmatic Schema On Read for Spark SQL, powered by Taboola
Language: Java - Size: 51.8 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 8 - Forks: 2

jowilf/big-data-showcase
This repository contains a project showcasing the use of Big Data technologies in processing and visualizing real-time data from an eCommerce electronics store using tools such as Apache Kafka, Spark Streaming, Spark SQL, HBase, and Plotly
Language: Java - Size: 2.7 MB - Last synced at: 4 months ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

Subham2S/BigData-Engineering-Capstone-Project-1
BigData Engineering Capstone Project with Tech-stack : Linux, MySQL, sqoop, HDFS, Hive, Impala, SparkSQL, SparkML, git
Language: Python - Size: 15.2 MB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 6 - Forks: 0

eavilaes/qbeast-spark Fork of Qbeast-io/qbeast-spark
Qbeast-spark: DataSource enabling multi-dimensional indexing and efficient data sampling. Big Data, free from the unnecessary!
Language: Scala - Size: 16.9 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

Wathon/data_engineering_with_python-track-datacamp
Data Engineer with Python lecture notes from #datacamp.
Language: Jupyter Notebook - Size: 59.7 MB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 26 - Forks: 22

victorskl/genomic-bigdata-spark
Genomic BigData Warehousing with Apache Spark and LakeHouse Architecture
Language: Jupyter Notebook - Size: 172 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 6 - Forks: 0

dhruv007patel/Impact-of-Covid-19-on-Aviation-Industry
This project analyzes the correlation between COVID-19 and the US aviation industry. By studying data on passenger/freight traffic and delays alongside COVID-19 trends, it provides insights into airline and passenger responses. The findings help airlines adapt to the pandemic's impact.
Language: Python - Size: 504 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 0
