Ecosyste.ms: Repos
An open API service providing repository metadata for many open source software ecosystems.
GitHub topics: pyspark
baaraban/sparkyShinyIndianaJones
Tracking flow of historical artifacts using open data sources
Language: Jupyter Notebook - Size: 18 MB - Last synced: about 5 hours ago - Pushed: over 4 years ago - Stars: 1 - Forks: 0
Tiago-B-C-Reis/Apache_Spark
Spark with Python, including Spark Streaming, Machine Learning, Spark DataFrames and more.
Language: Jupyter Notebook - Size: 44.2 MB - Last synced: about 6 hours ago - Pushed: about 7 hours ago - Stars: 0 - Forks: 0
groda/big_data
Tutorials on Big Data essentials: Hadoop, MapReduce, Spark.
Language: Jupyter Notebook - Size: 38.1 MB - Last synced: about 7 hours ago - Pushed: about 7 hours ago - Stars: 61 - Forks: 23
ibis-project/ibis
the portable Python dataframe library
Language: Python - Size: 72.3 MB - Last synced: about 7 hours ago - Pushed: about 12 hours ago - Stars: 4,261 - Forks: 530
mikan-senpai/sales-analysis
Python, PySpark, Big-Data
Language: Jupyter Notebook - Size: 4.23 MB - Last synced: about 12 hours ago - Pushed: about 12 hours ago - Stars: 0 - Forks: 0
nguyen-tho/DataAnalytics
Data analytics lecture using python and pyspark
Language: Jupyter Notebook - Size: 48.8 MB - Last synced: about 11 hours ago - Pushed: about 12 hours ago - Stars: 1 - Forks: 0
microsoft/SynapseML
Simple and Distributed Machine Learning
Language: Scala - Size: 139 MB - Last synced: about 6 hours ago - Pushed: 6 days ago - Stars: 4,975 - Forks: 815
pdemeulenaer/my-ds-documentation
Personal repo containing tips and tricks I have gone through so far, majorly on python, pyspark, scikit-learn and some other stuff
Language: Makefile - Size: 41.1 MB - Last synced: about 17 hours ago - Pushed: 1 day ago - Stars: 2 - Forks: 1
javiizz/SparkProjects-Healthcare_Analysis
Language: Jupyter Notebook - Size: 12.8 MB - Last synced: about 4 hours ago - Pushed: about 22 hours ago - Stars: 1 - Forks: 0
javiizz/SparkProjects-EarthQuake_Analysis
Earthquake Analysis using PySpark
Language: Jupyter Notebook - Size: 6.63 MB - Last synced: about 4 hours ago - Pushed: about 23 hours ago - Stars: 1 - Forks: 0
rajatkrishna/nlp-benchspark
Benchmark inference of custom and pre-trained NLP models with Spark NLP.
Language: Python - Size: 16.6 KB - Last synced: about 23 hours ago - Pushed: about 23 hours ago - Stars: 1 - Forks: 0
Digital-Defiance/nlp-metaformer
An ablation study on the transformer network for Natural Language Processing
Language: Rust - Size: 25.3 MB - Last synced: 1 day ago - Pushed: 1 day ago - Stars: 3 - Forks: 0
uber/petastorm
Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as Tensorflow, Pytorch, and PySpark and can be used from pure Python code.
Language: Python - Size: 2.69 MB - Last synced: about 16 hours ago - Pushed: 5 months ago - Stars: 1,754 - Forks: 281
josephmachado/efficient_data_processing_spark
Code for "Efficient Data Processing in Spark" Course
Language: Python - Size: 23.8 MB - Last synced: 1 day ago - Pushed: 1 day ago - Stars: 20 - Forks: 5
javiizz/PySparkDataFrame-QueryExplorations
Language: Jupyter Notebook - Size: 8.79 KB - Last synced: about 4 hours ago - Pushed: 23 days ago - Stars: 0 - Forks: 0
rickyschools/dltflow
A library for authoring DLT pipelines via meta-programming patterns and deploying to Databricks workspaces.
Language: Python - Size: 1.68 MB - Last synced: about 19 hours ago - Pushed: 1 day ago - Stars: 0 - Forks: 0
KevinShindel/MachineLearning
Pandas, Sci-kit, SparkML
Language: Jupyter Notebook - Size: 42.4 MB - Last synced: 1 day ago - Pushed: 1 day ago - Stars: 0 - Forks: 0
databrickslabs/dbldatagen
Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) may be used to generate large simulated / synthetic data sets for test, POCs, and other uses in Databricks environments including in Delta Live Tables pipelines
Language: Python - Size: 10 MB - Last synced: about 1 hour ago - Pushed: about 2 hours ago - Stars: 267 - Forks: 51
GoogleCloudPlatform/dataproc-templates
Dataproc templates and pipelines for solving simple in-cloud data tasks
Language: Python - Size: 18.6 MB - Last synced: 1 day ago - Pushed: 3 days ago - Stars: 111 - Forks: 84
Tytrox/f1-driver-ranking
Ranks f1 drivers by comparing their historical performance in the same machinery
Language: Python - Size: 5.92 MB - Last synced: 1 day ago - Pushed: 1 day ago - Stars: 1 - Forks: 0
mitchelllisle/sparkdantic
✨ A Pydantic to PySpark schema library
Language: Python - Size: 2.05 MB - Last synced: 1 day ago - Pushed: 1 day ago - Stars: 29 - Forks: 6
apache/linkis
Apache Linkis builds a computation middleware layer to facilitate connection, governance and orchestration between the upper applications and the underlying data engines.
Language: Java - Size: 86 MB - Last synced: 1 day ago - Pushed: 12 days ago - Stars: 3,235 - Forks: 1,132
thejungwon/dat500-19-sample
UiS DAT500 sample code
Language: Jupyter Notebook - Size: 631 KB - Last synced: 2 days ago - Pushed: about 5 years ago - Stars: 0 - Forks: 0
kryvokhyzha/spark-code-examples
This repository contains some examples of using spark and pyspark
Language: Jupyter Notebook - Size: 38.2 MB - Last synced: 2 days ago - Pushed: over 2 years ago - Stars: 2 - Forks: 2
kryvokhyzha/databricks-data-engineer
This repository contains materials and resources for the Databricks Data Engineer course
Language: Jupyter Notebook - Size: 9.12 MB - Last synced: 2 days ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0
ev2900/Glue_Aggregate_Small_Files
PySpark script to aggregate small parquet files in a prefix into larger files. Designed to be run on AWS Glue
Language: Python - Size: 116 KB - Last synced: 2 days ago - Pushed: 2 days ago - Stars: 0 - Forks: 0
ev2900/Glue_Examples
PySpark code samples designed for AWS Glue
Language: Python - Size: 34.2 KB - Last synced: 2 days ago - Pushed: 2 days ago - Stars: 1 - Forks: 0
awesome-spark/awesome-spark
A curated list of awesome Apache Spark packages and resources.
Language: Shell - Size: 209 KB - Last synced: about 22 hours ago - Pushed: about 1 month ago - Stars: 1,620 - Forks: 323
astrolabsoftware/spark-kernel-nersc
Create custom kernels for using pyspark notebooks at NERSC
Language: Python - Size: 1.27 MB - Last synced: 2 days ago - Pushed: over 5 years ago - Stars: 2 - Forks: 2
mathewsrc/machine-learning-monitoring-with-evidently
ML Monitoring with EvidentlyAI
Language: Jupyter Notebook - Size: 23.1 MB - Last synced: 2 days ago - Pushed: 4 months ago - Stars: 0 - Forks: 0
commoncrawl/cc-pyspark
Process Common Crawl data with Python and Spark
Language: Python - Size: 127 KB - Last synced: 2 days ago - Pushed: about 1 month ago - Stars: 379 - Forks: 84
Spindle-Health/carduus
PySpark implementation of the Open Privacy Preserving Record Linkage (OPPRL) specification.
Language: Python - Size: 1.74 MB - Last synced: 2 days ago - Pushed: 2 days ago - Stars: 2 - Forks: 0
jademene/International-Students-in-the-US-Analysis
Data Analysis and Visualisation | Databricks
Language: Jupyter Notebook - Size: 1.24 MB - Last synced: 1 day ago - Pushed: 2 days ago - Stars: 0 - Forks: 0
YeonwooSung/DevOpsMisc
Miscellaneous codes and writings for DevOps
Language: Jupyter Notebook - Size: 1.96 MB - Last synced: 2 days ago - Pushed: 2 days ago - Stars: 1 - Forks: 0
jademene/Titanic-Survival-Classification-Model (fork of JacopoBulgarelli/Cloud-Cognitive-Services)
Machine Learning | Classification task
Language: Jupyter Notebook - Size: 1.48 MB - Last synced: 1 day ago - Pushed: 2 days ago - Stars: 0 - Forks: 0
MrPowers/quinn
pyspark methods to enhance developer productivity 📣 👯 🎉
Language: Python - Size: 1.94 MB - Last synced: 3 days ago - Pushed: 3 days ago - Stars: 580 - Forks: 91
JohnSnowLabs/spark-nlp
State of the Art Natural Language Processing
Language: Scala - Size: 1.47 GB - Last synced: 2 days ago - Pushed: 2 days ago - Stars: 3,701 - Forks: 701
jmcurbelo/pyspark-ingenieria-de-datos
This repository contains the material for the Udemy course "Big Data y Spark: ingeniería de datos con Python y pyspark". In this course, you will learn to use the tools and techniques needed to work with large datasets using the pyspark library.
Language: Python - Size: 27.3 KB - Last synced: 3 days ago - Pushed: 3 days ago - Stars: 17 - Forks: 33
capitalone/datacompy
Pandas and Spark DataFrame comparison for humans and more!
Language: Python - Size: 9.11 MB - Last synced: about 7 hours ago - Pushed: about 11 hours ago - Stars: 394 - Forks: 122
HariSekhon/DevOps-Python-tools
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Language: Python - Size: 3 MB - Last synced: 2 days ago - Pushed: 2 days ago - Stars: 731 - Forks: 334
F-Mangini/Data-Driven-Marketing
The objective is to develop a data-driven marketing strategy for the upcoming year to maximize product sales. This involves analysing existing data to identify patterns and trends, thereby informing a plan that aligns with and anticipates market dynamics to enhance sales performance.
Language: Jupyter Notebook - Size: 6.34 MB - Last synced: 3 days ago - Pushed: 3 days ago - Stars: 0 - Forks: 0
ArwaEiad/TMDB-Project
This project focuses on analyzing movie data using Pyspark tailored for efficient data processing on Hadoop Distributed File System (HDFS)
Language: Jupyter Notebook - Size: 9.77 KB - Last synced: 3 days ago - Pushed: 4 days ago - Stars: 0 - Forks: 0
hadarsharon/compars
DataFrame comparison done right, powered by Rust with polars (AKA the bear-agnostic 🐻 🐼 🐨 🐻‍❄️ DataFrame comparison library)
Language: Python - Size: 36.1 KB - Last synced: 4 days ago - Pushed: 4 days ago - Stars: 0 - Forks: 0
josephmachado/docker_for_data_engineers
Code for blog at: https://www.startdataengineering.com/post/docker-for-de/
Language: C - Size: 561 KB - Last synced: 2 days ago - Pushed: 10 days ago - Stars: 15 - Forks: 7
aksh-patel1/Big-Data-Processing_Parallelize-K-means
Implemented a parallelized version of the k-means clustering algorithm in Spark and assessed its efficiency using a real-world dataset.
Language: Jupyter Notebook - Size: 4.37 MB - Last synced: 4 days ago - Pushed: 5 months ago - Stars: 1 - Forks: 0
alejandronotario/LDA-Topic-Modeling
Language: Java - Size: 9.57 MB - Last synced: 4 days ago - Pushed: over 5 years ago - Stars: 6 - Forks: 3
prakash-aryan/MicroDataWarehouse
MicroDataWarehouse is an ETL pipeline built with Python, PySpark, and SQLite to extract, transform, and load data, with Metabase for data exploration and visualization.
Language: Python - Size: 234 KB - Last synced: 4 days ago - Pushed: 4 days ago - Stars: 0 - Forks: 0
canimus/cuallee
Possibly the fastest DataFrame-agnostic quality check library in town.
Language: Python - Size: 1.71 MB - Last synced: 2 days ago - Pushed: 3 days ago - Stars: 109 - Forks: 12
sarathchandrikak/Data-Projects
Collection of data analysis and data engineering projects
Language: Jupyter Notebook - Size: 2.63 MB - Last synced: 5 days ago - Pushed: 5 days ago - Stars: 2 - Forks: 1
kevinndungu-source/Amazon_EMR_Demonstration_Resources
Stores the resources used in this project: EMR on EC2 Cluster.
Language: Jupyter Notebook - Size: 557 KB - Last synced: 5 days ago - Pushed: 5 days ago - Stars: 0 - Forks: 0
josephmachado/data_engineering_best_practices
Sample project to demonstrate data engineering best practices
Language: Python - Size: 644 KB - Last synced: 2 days ago - Pushed: 3 months ago - Stars: 132 - Forks: 17
noobpk/gemini-web-vulnerability-detection
Gemini-Web Vulnerability Detection (G-WVD): detecting web application vulnerabilities with deep learning
Language: Python - Size: 50.8 KB - Last synced: 6 days ago - Pushed: 6 days ago - Stars: 5 - Forks: 0
astrolabsoftware/fink-science
Define your science modules to add values to Fink alerts!
Language: Jupyter Notebook - Size: 660 MB - Last synced: 2 days ago - Pushed: 3 days ago - Stars: 10 - Forks: 14
astrolabsoftware/fink-filters
Define your filters to create your alert stream in Fink!
Language: Python - Size: 38 MB - Last synced: 2 days ago - Pushed: 6 days ago - Stars: 1 - Forks: 5
arunp77/Job-Market-project
InsightfulRecruit: Unveiling the Job Market Landscape through Data Engineering
Language: HTML - Size: 6.4 MB - Last synced: 6 days ago - Pushed: 7 days ago - Stars: 1 - Forks: 0
henriqueoelze/MapReduce-trab-puc
Coursework for the Distributed Processing course in a postgraduate program - PUC MG
Language: Python - Size: 5.86 KB - Last synced: 7 days ago - Pushed: almost 7 years ago - Stars: 0 - Forks: 0
guidok91/spark-movies-etl
Spark data pipeline that processes movie ratings data.
Language: Python - Size: 2.54 MB - Last synced: 6 days ago - Pushed: 7 days ago - Stars: 25 - Forks: 12
mohankrishna02/interview-scenerios-spark-sql
This repository focuses on providing interview scenario questions that I have encountered during interviews. The questions are designed to simulate real-world scenarios and test your problem-solving and technical skills. By exploring these scenarios, you can gain insights into common interview topics and prepare yourself for similar challenges.
Language: Scala - Size: 249 KB - Last synced: 6 days ago - Pushed: 7 days ago - Stars: 0 - Forks: 2
riolaf05/spark-elasticsearch-recommendation
Recommendation system using Alternating Least Squares(ALS) and Cosine Similarity on PySpark and Elasticsearch
Language: Jupyter Notebook - Size: 8.79 KB - Last synced: 7 days ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0
CodeWithKriz/data-engineering
repository to maintain big data tutorials
Language: Python - Size: 16.6 KB - Last synced: 7 days ago - Pushed: 7 days ago - Stars: 0 - Forks: 0
pregismond/data-analysis-using-spark
Final Project Submission: Data Analysis using Spark
Language: Jupyter Notebook - Size: 20.5 KB - Last synced: 7 days ago - Pushed: 7 days ago - Stars: 0 - Forks: 0
jamestiotio/dbsys
SUTD 2021 50.043 Database and Big Data Systems Code Dump
Language: Java - Size: 69.7 MB - Last synced: 8 days ago - Pushed: almost 2 years ago - Stars: 1 - Forks: 3
CrossNox/7506-OD2
Resources for 7506 (FIUBA)
Language: HTML - Size: 13.8 MB - Last synced: 8 days ago - Pushed: about 2 years ago - Stars: 7 - Forks: 8
FranzDiebold/advent-of-code-2021 📦
Solutions for Advent of Code 2021 in (Py)Spark
Language: Jupyter Notebook - Size: 22.5 KB - Last synced: 8 days ago - Pushed: over 2 years ago - Stars: 1 - Forks: 0
bhattbhavesh91/pyspark-basic-tutorial
A short walkthrough on how to use PySpark with Google Colab
Language: Jupyter Notebook - Size: 22.5 KB - Last synced: 8 days ago - Pushed: over 4 years ago - Stars: 8 - Forks: 10
Ashutosh27ind/pySparkAirlinesDataAnalysis
PySpark data analysis of an airlines dataset, for files hosted on HDFS.
Language: Jupyter Notebook - Size: 2.82 MB - Last synced: 8 days ago - Pushed: almost 4 years ago - Stars: 0 - Forks: 0
Ashutosh27ind/pySparkNYCParkingTickets
Attempt to scientifically analyze the phenomenon of increased traffic violation tickets issued by the NYC Police Department.
Language: Jupyter Notebook - Size: 11.7 KB - Last synced: 8 days ago - Pushed: about 4 years ago - Stars: 0 - Forks: 0
Ashutosh27ind/pySparkMLAnalysis
PySpark ML Heart and Advertisement Data Analysis
Language: Jupyter Notebook - Size: 8.79 KB - Last synced: 8 days ago - Pushed: almost 4 years ago - Stars: 0 - Forks: 0
chezou/mecab-on-pyspark 📦
Example code for distributing Python packages on Spark cluster
Language: Python - Size: 6.84 KB - Last synced: 8 days ago - Pushed: almost 7 years ago - Stars: 3 - Forks: 2
chezou/cdsw-serve-docker 📦
REST API server example with Docker for Cloudera Data Science Workbench
Size: 2.93 KB - Last synced: 8 days ago - Pushed: over 6 years ago - Stars: 5 - Forks: 2
naiborhujosua/Machine-Learning-with-pyspark
This repository documents my journey learning PySpark and applying it to machine learning
Size: 10.5 MB - Last synced: 8 days ago - Pushed: over 3 years ago - Stars: 0 - Forks: 0
UMassCDS/IHOP-Reddit
The Center for Data Science repository for the International Hate Observatory Project and analyzing Reddit. This produces the models used in RedditMap.social.
Language: Jupyter Notebook - Size: 113 MB - Last synced: 8 days ago - Pushed: 8 days ago - Stars: 1 - Forks: 1
suryadev99/pyspark_bank_data_pipeline
end-to-end deep learning pipeline that runs on Spark
Language: Jupyter Notebook - Size: 195 KB - Last synced: 7 days ago - Pushed: 8 days ago - Stars: 0 - Forks: 0
raineydavid/big-data-processing
Big Data Processing Notes from Masters in Big Data Science
Size: 13.7 MB - Last synced: 8 days ago - Pushed: about 5 years ago - Stars: 0 - Forks: 0
archivesunleashed/aut
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Language: Scala - Size: 39.5 MB - Last synced: 2 days ago - Pushed: 2 months ago - Stars: 133 - Forks: 33
rupeshtiwari/SparkStreamingInPython
spark python
Language: Python - Size: 218 KB - Last synced: 8 days ago - Pushed: about 2 years ago - Stars: 1 - Forks: 0
awesome-spark/spark-gotchas 📦
Spark Gotchas. A subjective compilation of the Apache Spark tips and tricks
Size: 188 KB - Last synced: 3 days ago - Pushed: almost 7 years ago - Stars: 355 - Forks: 82
saikumarsuvanam/BigData
Hadoop, MachineLearningAlgos, Spark, Pig, Hive
Language: Java - Size: 4.37 MB - Last synced: 9 days ago - Pushed: over 6 years ago - Stars: 0 - Forks: 1
NHSDigital/mps_diagnostics
Interpretable metadata for the results of NHS England record linkage
Language: Python - Size: 537 KB - Last synced: 8 days ago - Pushed: 9 days ago - Stars: 1 - Forks: 0
G-Research/spark-extension
A library that provides useful extensions to Apache Spark and PySpark.
Language: Scala - Size: 796 KB - Last synced: 8 days ago - Pushed: 13 days ago - Stars: 172 - Forks: 28
kevinschaich/pyspark-cheatsheet
🐍 Quick reference guide to common patterns & functions in PySpark.
Size: 49.8 KB - Last synced: 8 days ago - Pushed: about 1 year ago - Stars: 343 - Forks: 115
Shivabajelan/Home_Sales
This project analyses home sales data using PySpark SQL. It involves creating a temporary table, running queries, and performing caching and partitioning. The final step involves uncaching and verifying the temporary table.
Language: Jupyter Notebook - Size: 38.1 KB - Last synced: 9 days ago - Pushed: 10 days ago - Stars: 0 - Forks: 0
OmarNouih/Twitter-Streams
Real-Time Sentiment Analysis on Twitter Streams is a web application that classifies tweets as Negative, Positive, Neutral, or Irrelevant. Built using Apache Kafka, Spark, and PySpark ML models, it offers real-time analysis capabilities.
Language: Jupyter Notebook - Size: 3.3 MB - Last synced: 9 days ago - Pushed: 10 days ago - Stars: 0 - Forks: 0
supersjgk/Marketing_Campaign_Analysis
A Data Science project for Marketing Campaign Analysis
Language: Jupyter Notebook - Size: 2.14 MB - Last synced: 10 days ago - Pushed: 10 days ago - Stars: 0 - Forks: 0
kernel-loophole/distributed-computing
Size: 3.91 KB - Last synced: 10 days ago - Pushed: about 2 years ago - Stars: 0 - Forks: 0
deepjyotiroy079/twitter-streaming
Counting number of hash tags from live stream of tweets from twitter.
Language: Python - Size: 2.93 KB - Last synced: 10 days ago - Pushed: almost 3 years ago - Stars: 0 - Forks: 0
deepjyotiroy079/bike-sharing-demand
Service that combines historical usage patterns with weather data to forecast the bicycle rental demand in real time.
Language: Jupyter Notebook - Size: 3.2 MB - Last synced: 10 days ago - Pushed: almost 3 years ago - Stars: 0 - Forks: 0
deepjyotiroy079/big-data-stack
Code created while learning the Big Data stack.
Language: Jupyter Notebook - Size: 949 KB - Last synced: 10 days ago - Pushed: almost 3 years ago - Stars: 0 - Forks: 0
Firefly55lm/superconductors_critical_temperature_analysis
Academic project for Big Data Laboratory
Language: Jupyter Notebook - Size: 16.7 MB - Last synced: 10 days ago - Pushed: 10 days ago - Stars: 1 - Forks: 0
Betico1928/Talleres-ProcesamientoDeDatosAGranEscala
Exploring the principles of large-scale data processing through Databricks and Spark workshops. Learning tools such as Pandas and PySpark for efficient analysis of large datasets. Taught by John Corredor at the Pontificia Universidad Javeriana.
Language: Jupyter Notebook - Size: 203 MB - Last synced: 10 days ago - Pushed: 10 days ago - Stars: 0 - Forks: 0
MrPowers/mack
Delta Lake helper methods in PySpark
Language: Python - Size: 2.8 MB - Last synced: 10 days ago - Pushed: 3 months ago - Stars: 271 - Forks: 39
Anannya-M/Learning-Data-Engineering
Language: Jupyter Notebook - Size: 4.06 MB - Last synced: 11 days ago - Pushed: 11 days ago - Stars: 0 - Forks: 0
nabojyoti/ELT-IPL
This is an end-to-end data engineering project that uses the IPL dataset.
Language: Jupyter Notebook - Size: 1.67 MB - Last synced: 11 days ago - Pushed: 11 days ago - Stars: 0 - Forks: 0
lamia-datalover/Big_Data
This repository contains big data mini-projects.
Language: Jupyter Notebook - Size: 2.23 MB - Last synced: 11 days ago - Pushed: 11 days ago - Stars: 0 - Forks: 0
Majdi-Akrmi/ELT-IPL
This is an end-to-end data engineering project that uses the IPL dataset.
Language: Jupyter Notebook - Size: 1.67 MB - Last synced: 10 days ago - Pushed: 11 days ago - Stars: 3 - Forks: 4
manoharpalanisamy/PySpark
Introduction to PySpark using the interactive shell prompt and Jupyter Notebook
Language: Python - Size: 8.79 KB - Last synced: 12 days ago - Pushed: almost 6 years ago - Stars: 1 - Forks: 0
vladyslavyaloveha/etl_platform
🚖 ETL Platform: Analyzing NYC Yellow Taxi Trips with Airflow, FastAPI, and Cloud Integration
Language: Python - Size: 137 KB - Last synced: 2 days ago - Pushed: 12 days ago - Stars: 3 - Forks: 0
sudarshan-koirala/spark-practice
Learning spark the right way
Language: Jupyter Notebook - Size: 46.9 KB - Last synced: 12 days ago - Pushed: almost 2 years ago - Stars: 0 - Forks: 0
aakinlalu/Adhoc-Analysis-Queries
Combination of ad hoc queries with Python, Pyspark and SQL
Language: Jupyter Notebook - Size: 92.8 KB - Last synced: 12 days ago - Pushed: about 5 years ago - Stars: 0 - Forks: 0
easonlai/Samples_for_Azure_Databricks_Orientation
Samples for Azure Databricks Orientation
Language: HTML - Size: 6.78 MB - Last synced: 12 days ago - Pushed: over 3 years ago - Stars: 4 - Forks: 2