Ecosyste.ms: Repos

An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: pyspark

baaraban/sparkyShinyIndianaJones

Tracking flow of historical artifacts using open data sources

Language: Jupyter Notebook - Size: 18 MB - Last synced: about 5 hours ago - Pushed: over 4 years ago - Stars: 1 - Forks: 0

Tiago-B-C-Reis/Apache_Spark

Spark with Python, including Spark Streaming, Machine Learning, Spark DataFrames and more.

Language: Jupyter Notebook - Size: 44.2 MB - Last synced: about 6 hours ago - Pushed: about 7 hours ago - Stars: 0 - Forks: 0

groda/big_data

Tutorials on Big Data essentials: Hadoop, MapReduce, Spark.

Language: Jupyter Notebook - Size: 38.1 MB - Last synced: about 7 hours ago - Pushed: about 7 hours ago - Stars: 61 - Forks: 23

ibis-project/ibis

the portable Python dataframe library

Language: Python - Size: 72.3 MB - Last synced: about 7 hours ago - Pushed: about 12 hours ago - Stars: 4,261 - Forks: 530

mikan-senpai/sales-analysis

Python, PySpark, Big Data

Language: Jupyter Notebook - Size: 4.23 MB - Last synced: about 12 hours ago - Pushed: about 12 hours ago - Stars: 0 - Forks: 0

nguyen-tho/DataAnalytics

Data analytics lecture using Python and PySpark

Language: Jupyter Notebook - Size: 48.8 MB - Last synced: about 11 hours ago - Pushed: about 12 hours ago - Stars: 1 - Forks: 0

microsoft/SynapseML

Simple and Distributed Machine Learning

Language: Scala - Size: 139 MB - Last synced: about 6 hours ago - Pushed: 6 days ago - Stars: 4,975 - Forks: 815

pdemeulenaer/my-ds-documentation

Personal repo containing tips and tricks I have picked up so far, mainly on Python, PySpark, scikit-learn, and a few other tools

Language: Makefile - Size: 41.1 MB - Last synced: about 17 hours ago - Pushed: 1 day ago - Stars: 2 - Forks: 1

javiizz/SparkProjects-Healthcare_Analysis

Language: Jupyter Notebook - Size: 12.8 MB - Last synced: about 4 hours ago - Pushed: about 22 hours ago - Stars: 1 - Forks: 0

javiizz/SparkProjects-EarthQuake_Analysis

Earthquake Analysis using PySpark

Language: Jupyter Notebook - Size: 6.63 MB - Last synced: about 4 hours ago - Pushed: about 23 hours ago - Stars: 1 - Forks: 0

rajatkrishna/nlp-benchspark

Benchmark inference of custom and pre-trained NLP models with Spark NLP.

Language: Python - Size: 16.6 KB - Last synced: about 23 hours ago - Pushed: about 23 hours ago - Stars: 1 - Forks: 0

Digital-Defiance/nlp-metaformer

An ablation study on the transformer network for Natural Language Processing

Language: Rust - Size: 25.3 MB - Last synced: 1 day ago - Pushed: 1 day ago - Stars: 3 - Forks: 0

uber/petastorm

The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and can be used from pure Python code.

Language: Python - Size: 2.69 MB - Last synced: about 16 hours ago - Pushed: 5 months ago - Stars: 1,754 - Forks: 281

josephmachado/efficient_data_processing_spark

Code for "Efficient Data Processing in Spark" Course

Language: Python - Size: 23.8 MB - Last synced: 1 day ago - Pushed: 1 day ago - Stars: 20 - Forks: 5

javiizz/PySparkDataFrame-QueryExplorations

Language: Jupyter Notebook - Size: 8.79 KB - Last synced: about 4 hours ago - Pushed: 23 days ago - Stars: 0 - Forks: 0

rickyschools/dltflow

A library for authoring DLT pipelines via meta-programming patterns and deploying to Databricks workspaces.

Language: Python - Size: 1.68 MB - Last synced: about 19 hours ago - Pushed: 1 day ago - Stars: 0 - Forks: 0

KevinShindel/MachineLearning

Pandas, scikit-learn, Spark ML

Language: Jupyter Notebook - Size: 42.4 MB - Last synced: 1 day ago - Pushed: 1 day ago - Stars: 0 - Forks: 0

databrickslabs/dbldatagen

Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) may be used to generate large simulated / synthetic data sets for test, POCs, and other uses in Databricks environments including in Delta Live Tables pipelines

Language: Python - Size: 10 MB - Last synced: about 1 hour ago - Pushed: about 2 hours ago - Stars: 267 - Forks: 51

GoogleCloudPlatform/dataproc-templates

Dataproc templates and pipelines for solving simple in-cloud data tasks

Language: Python - Size: 18.6 MB - Last synced: 1 day ago - Pushed: 3 days ago - Stars: 111 - Forks: 84

Tytrox/f1-driver-ranking

Ranks F1 drivers by comparing their historical performance in the same machinery

Language: Python - Size: 5.92 MB - Last synced: 1 day ago - Pushed: 1 day ago - Stars: 1 - Forks: 0

mitchelllisle/sparkdantic

✨ A Pydantic to PySpark schema library

Language: Python - Size: 2.05 MB - Last synced: 1 day ago - Pushed: 1 day ago - Stars: 29 - Forks: 6

apache/linkis

Apache Linkis builds a computation middleware layer to facilitate connection, governance and orchestration between the upper applications and the underlying data engines.

Language: Java - Size: 86 MB - Last synced: 1 day ago - Pushed: 12 days ago - Stars: 3,235 - Forks: 1,132

thejungwon/dat500-19-sample

UiS DAT500 sample code

Language: Jupyter Notebook - Size: 631 KB - Last synced: 2 days ago - Pushed: about 5 years ago - Stars: 0 - Forks: 0

kryvokhyzha/spark-code-examples

This repository contains examples of using Spark and PySpark

Language: Jupyter Notebook - Size: 38.2 MB - Last synced: 2 days ago - Pushed: over 2 years ago - Stars: 2 - Forks: 2

kryvokhyzha/databricks-data-engineer

This repository contains materials and resources for the Databricks Data Engineer course

Language: Jupyter Notebook - Size: 9.12 MB - Last synced: 2 days ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0

ev2900/Glue_Aggregate_Small_Files

PySpark script to aggregate small parquet files in a prefix into larger files. Designed to be run on AWS Glue

Language: Python - Size: 116 KB - Last synced: 2 days ago - Pushed: 2 days ago - Stars: 0 - Forks: 0

ev2900/Glue_Examples

PySpark code samples designed for AWS Glue

Language: Python - Size: 34.2 KB - Last synced: 2 days ago - Pushed: 2 days ago - Stars: 1 - Forks: 0

awesome-spark/awesome-spark

A curated list of awesome Apache Spark packages and resources.

Language: Shell - Size: 209 KB - Last synced: about 22 hours ago - Pushed: about 1 month ago - Stars: 1,620 - Forks: 323

astrolabsoftware/spark-kernel-nersc

Create custom kernels for using pyspark notebooks at NERSC

Language: Python - Size: 1.27 MB - Last synced: 2 days ago - Pushed: over 5 years ago - Stars: 2 - Forks: 2

mathewsrc/machine-learning-monitoring-with-evidently

ML Monitoring with EvidentlyAI

Language: Jupyter Notebook - Size: 23.1 MB - Last synced: 2 days ago - Pushed: 4 months ago - Stars: 0 - Forks: 0

commoncrawl/cc-pyspark

Process Common Crawl data with Python and Spark

Language: Python - Size: 127 KB - Last synced: 2 days ago - Pushed: about 1 month ago - Stars: 379 - Forks: 84

Spindle-Health/carduus

PySpark implementation of the Open Privacy Preserving Record Linkage (OPPRL) specification.

Language: Python - Size: 1.74 MB - Last synced: 2 days ago - Pushed: 2 days ago - Stars: 2 - Forks: 0

jademene/International-Students-in-the-US-Analysis

Data Analysis and Visualisation | Databricks

Language: Jupyter Notebook - Size: 1.24 MB - Last synced: 1 day ago - Pushed: 2 days ago - Stars: 0 - Forks: 0

YeonwooSung/DevOpsMisc

Miscellaneous code and writings for DevOps

Language: Jupyter Notebook - Size: 1.96 MB - Last synced: 2 days ago - Pushed: 2 days ago - Stars: 1 - Forks: 0

jademene/Titanic-Survival-Classification-Model Fork of JacopoBulgarelli/Cloud-Cognitive-Services

Machine Learning | Classification task

Language: Jupyter Notebook - Size: 1.48 MB - Last synced: 1 day ago - Pushed: 2 days ago - Stars: 0 - Forks: 0

MrPowers/quinn

pyspark methods to enhance developer productivity 📣 👯 🎉

Language: Python - Size: 1.94 MB - Last synced: 3 days ago - Pushed: 3 days ago - Stars: 580 - Forks: 91

JohnSnowLabs/spark-nlp

State of the Art Natural Language Processing

Language: Scala - Size: 1.47 GB - Last synced: 2 days ago - Pushed: 2 days ago - Stars: 3,701 - Forks: 701

jmcurbelo/pyspark-ingenieria-de-datos

This repository contains the material for the Udemy course "Big Data y Spark: ingeniería de datos con Python y pyspark". In this course, you will learn the tools and techniques needed to work with large datasets using the pyspark library.

Language: Python - Size: 27.3 KB - Last synced: 3 days ago - Pushed: 3 days ago - Stars: 17 - Forks: 33

capitalone/datacompy

Pandas and Spark DataFrame comparison for humans and more!

Language: Python - Size: 9.11 MB - Last synced: about 7 hours ago - Pushed: about 11 hours ago - Stars: 394 - Forks: 122

HariSekhon/DevOps-Python-tools

80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.

Language: Python - Size: 3 MB - Last synced: 2 days ago - Pushed: 2 days ago - Stars: 731 - Forks: 334

F-Mangini/Data-Driven-Marketing

The objective is to develop a data-driven marketing strategy for the upcoming year to maximize product sales. This involves analysing existing data to identify patterns and trends, thereby informing a plan that aligns with and anticipates market dynamics to enhance sales performance.

Language: Jupyter Notebook - Size: 6.34 MB - Last synced: 3 days ago - Pushed: 3 days ago - Stars: 0 - Forks: 0

ArwaEiad/TMDB-Project

This project analyzes movie data using PySpark, tailored for efficient data processing on the Hadoop Distributed File System (HDFS).

Language: Jupyter Notebook - Size: 9.77 KB - Last synced: 3 days ago - Pushed: 4 days ago - Stars: 0 - Forks: 0

hadarsharon/compars

DataFrame comparison done right, powered by Rust with polars (AKA the bear-agnostic 🐻 🐼 🐨 🐻‍❄️ DataFrame comparison library)

Language: Python - Size: 36.1 KB - Last synced: 4 days ago - Pushed: 4 days ago - Stars: 0 - Forks: 0

josephmachado/docker_for_data_engineers

Code for blog at: https://www.startdataengineering.com/post/docker-for-de/

Language: C - Size: 561 KB - Last synced: 2 days ago - Pushed: 10 days ago - Stars: 15 - Forks: 7

aksh-patel1/Big-Data-Processing_Parallelize-K-means

Implements a parallelized version of the k-means clustering algorithm in Spark and assesses its efficiency on a real-world dataset.

Language: Jupyter Notebook - Size: 4.37 MB - Last synced: 4 days ago - Pushed: 5 months ago - Stars: 1 - Forks: 0

alejandronotario/LDA-Topic-Modeling

Language: Java - Size: 9.57 MB - Last synced: 4 days ago - Pushed: over 5 years ago - Stars: 6 - Forks: 3

prakash-aryan/MicroDataWarehouse

MicroDataWarehouse is an ETL pipeline built with Python, PySpark, and SQLite to extract, transform, and load data, with Metabase for data exploration and visualization.

Language: Python - Size: 234 KB - Last synced: 4 days ago - Pushed: 4 days ago - Stars: 0 - Forks: 0

canimus/cuallee

Possibly the fastest DataFrame-agnostic quality check library in town.

Language: Python - Size: 1.71 MB - Last synced: 2 days ago - Pushed: 3 days ago - Stars: 109 - Forks: 12

sarathchandrikak/Data-Projects

Collection of data analysis and data engineering projects

Language: Jupyter Notebook - Size: 2.63 MB - Last synced: 5 days ago - Pushed: 5 days ago - Stars: 2 - Forks: 1

kevinndungu-source/Amazon_EMR_Demonstration_Resources

Stores the resources used in this project: an EMR on EC2 cluster.

Language: Jupyter Notebook - Size: 557 KB - Last synced: 5 days ago - Pushed: 5 days ago - Stars: 0 - Forks: 0

josephmachado/data_engineering_best_practices

Sample project to demonstrate data engineering best practices

Language: Python - Size: 644 KB - Last synced: 2 days ago - Pushed: 3 months ago - Stars: 132 - Forks: 17

noobpk/gemini-web-vulnerability-detection

Gemini-Web Vulnerability Detection (G-WVD) detecting web application vulnerabilities with deep learning

Language: Python - Size: 50.8 KB - Last synced: 6 days ago - Pushed: 6 days ago - Stars: 5 - Forks: 0

astrolabsoftware/fink-science

Define your science modules to add values to Fink alerts!

Language: Jupyter Notebook - Size: 660 MB - Last synced: 2 days ago - Pushed: 3 days ago - Stars: 10 - Forks: 14

astrolabsoftware/fink-filters

Define your filters to create your alert stream in Fink!

Language: Python - Size: 38 MB - Last synced: 2 days ago - Pushed: 6 days ago - Stars: 1 - Forks: 5

arunp77/Job-Market-project

InsightfulRecruit: Unveiling the Job Market Landscape through Data Engineering

Language: HTML - Size: 6.4 MB - Last synced: 6 days ago - Pushed: 7 days ago - Stars: 1 - Forks: 0

henriqueoelze/MapReduce-trab-puc

Assignment completed in a postgraduate program for the Distributed Processing course, PUC MG

Language: Python - Size: 5.86 KB - Last synced: 7 days ago - Pushed: almost 7 years ago - Stars: 0 - Forks: 0

guidok91/spark-movies-etl

Spark data pipeline that processes movie ratings data.

Language: Python - Size: 2.54 MB - Last synced: 6 days ago - Pushed: 7 days ago - Stars: 25 - Forks: 12

mohankrishna02/interview-scenerios-spark-sql

This repository focuses on providing interview scenario questions that I have encountered during interviews. The questions are designed to simulate real-world scenarios and test your problem-solving and technical skills. By exploring these scenarios, you can gain insights into common interview topics and prepare yourself for similar challenges.

Language: Scala - Size: 249 KB - Last synced: 6 days ago - Pushed: 7 days ago - Stars: 0 - Forks: 2

riolaf05/spark-elasticsearch-recommendation

Recommendation system using Alternating Least Squares (ALS) and cosine similarity on PySpark and Elasticsearch

Language: Jupyter Notebook - Size: 8.79 KB - Last synced: 7 days ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0

CodeWithKriz/data-engineering

Repository for maintaining big data tutorials

Language: Python - Size: 16.6 KB - Last synced: 7 days ago - Pushed: 7 days ago - Stars: 0 - Forks: 0

pregismond/data-analysis-using-spark

Final Project Submission: Data Analysis using Spark

Language: Jupyter Notebook - Size: 20.5 KB - Last synced: 7 days ago - Pushed: 7 days ago - Stars: 0 - Forks: 0

jamestiotio/dbsys

SUTD 2021 50.043 Database and Big Data Systems Code Dump

Language: Java - Size: 69.7 MB - Last synced: 8 days ago - Pushed: almost 2 years ago - Stars: 1 - Forks: 3

CrossNox/7506-OD2

Resources for 7506 (FIUBA)

Language: HTML - Size: 13.8 MB - Last synced: 8 days ago - Pushed: about 2 years ago - Stars: 7 - Forks: 8

FranzDiebold/advent-of-code-2021 📦

Solutions for Advent of Code 2021 in (Py)Spark

Language: Jupyter Notebook - Size: 22.5 KB - Last synced: 8 days ago - Pushed: over 2 years ago - Stars: 1 - Forks: 0

bhattbhavesh91/pyspark-basic-tutorial

A short walkthrough of how to use PySpark with Google Colab

Language: Jupyter Notebook - Size: 22.5 KB - Last synced: 8 days ago - Pushed: over 4 years ago - Stars: 8 - Forks: 10

Ashutosh27ind/pySparkAirlinesDataAnalysis

PySpark data analysis of an airlines dataset with files hosted on HDFS.

Language: Jupyter Notebook - Size: 2.82 MB - Last synced: 8 days ago - Pushed: almost 4 years ago - Stars: 0 - Forks: 0

Ashutosh27ind/pySparkNYCParkingTickets

Attempt to scientifically analyze the phenomenon of increased traffic violation tickets issued by the NYC Police Department.

Language: Jupyter Notebook - Size: 11.7 KB - Last synced: 8 days ago - Pushed: about 4 years ago - Stars: 0 - Forks: 0

Ashutosh27ind/pySparkMLAnalysis

PySpark ML Heart and Advertisement Data Analysis

Language: Jupyter Notebook - Size: 8.79 KB - Last synced: 8 days ago - Pushed: almost 4 years ago - Stars: 0 - Forks: 0

chezou/mecab-on-pyspark 📦

Example code for distributing Python packages on Spark cluster

Language: Python - Size: 6.84 KB - Last synced: 8 days ago - Pushed: almost 7 years ago - Stars: 3 - Forks: 2

chezou/cdsw-serve-docker 📦

REST API server example with Docker for Cloudera Data Science Workbench

Size: 2.93 KB - Last synced: 8 days ago - Pushed: over 6 years ago - Stars: 5 - Forks: 2

naiborhujosua/Machine-Learning-with-pyspark

This repository documents my journey learning PySpark and applying it to machine learning

Size: 10.5 MB - Last synced: 8 days ago - Pushed: over 3 years ago - Stars: 0 - Forks: 0

UMassCDS/IHOP-Reddit

The Center for Data Science repository for the International Hate Observatory Project and analyzing Reddit. This produces the models used in RedditMap.social.

Language: Jupyter Notebook - Size: 113 MB - Last synced: 8 days ago - Pushed: 8 days ago - Stars: 1 - Forks: 1

suryadev99/pyspark_bank_data_pipeline

End-to-end deep learning pipeline that runs on Spark

Language: Jupyter Notebook - Size: 195 KB - Last synced: 7 days ago - Pushed: 8 days ago - Stars: 0 - Forks: 0

raineydavid/big-data-processing

Big Data Processing Notes from Masters in Big Data Science

Size: 13.7 MB - Last synced: 8 days ago - Pushed: about 5 years ago - Stars: 0 - Forks: 0

archivesunleashed/aut

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

Language: Scala - Size: 39.5 MB - Last synced: 2 days ago - Pushed: 2 months ago - Stars: 133 - Forks: 33

rupeshtiwari/SparkStreamingInPython

Spark, Python

Language: Python - Size: 218 KB - Last synced: 8 days ago - Pushed: about 2 years ago - Stars: 1 - Forks: 0

awesome-spark/spark-gotchas 📦

Spark Gotchas. A subjective compilation of Apache Spark tips and tricks

Size: 188 KB - Last synced: 3 days ago - Pushed: almost 7 years ago - Stars: 355 - Forks: 82

saikumarsuvanam/BigData

Hadoop, machine learning algorithms, Spark, Pig, Hive

Language: Java - Size: 4.37 MB - Last synced: 9 days ago - Pushed: over 6 years ago - Stars: 0 - Forks: 1

NHSDigital/mps_diagnostics

Interpretable metadata for the results of NHS England record linkage

Language: Python - Size: 537 KB - Last synced: 8 days ago - Pushed: 9 days ago - Stars: 1 - Forks: 0

G-Research/spark-extension

A library that provides useful extensions to Apache Spark and PySpark.

Language: Scala - Size: 796 KB - Last synced: 8 days ago - Pushed: 13 days ago - Stars: 172 - Forks: 28

kevinschaich/pyspark-cheatsheet

🐍 Quick reference guide to common patterns & functions in PySpark.

Size: 49.8 KB - Last synced: 8 days ago - Pushed: about 1 year ago - Stars: 343 - Forks: 115

Shivabajelan/Home_Sales

This project analyses home sales data using PySpark SQL. It involves creating a temporary table, running queries, and performing caching and partitioning. The final step involves uncaching and verifying the temporary table.

Language: Jupyter Notebook - Size: 38.1 KB - Last synced: 9 days ago - Pushed: 10 days ago - Stars: 0 - Forks: 0

OmarNouih/Twitter-Streams

Real-Time Sentiment Analysis on Twitter Streams is a web application that categorizes tweets into sentiments such as Negative, Positive, Neutral, or Irrelevant. Built with Apache Kafka, Spark, and PySpark ML models, it offers real-time analysis capabilities.

Language: Jupyter Notebook - Size: 3.3 MB - Last synced: 9 days ago - Pushed: 10 days ago - Stars: 0 - Forks: 0

supersjgk/Marketing_Campaign_Analysis

A Data Science project for Marketing Campaign Analysis

Language: Jupyter Notebook - Size: 2.14 MB - Last synced: 10 days ago - Pushed: 10 days ago - Stars: 0 - Forks: 0

kernel-loophole/distributed-computing

Size: 3.91 KB - Last synced: 10 days ago - Pushed: about 2 years ago - Stars: 0 - Forks: 0

deepjyotiroy079/twitter-streaming

Counts hashtags in a live stream of tweets from Twitter.

Language: Python - Size: 2.93 KB - Last synced: 10 days ago - Pushed: almost 3 years ago - Stars: 0 - Forks: 0

deepjyotiroy079/bike-sharing-demand

Service that combines historical usage patterns with weather data to forecast the bicycle rental demand in real time.

Language: Jupyter Notebook - Size: 3.2 MB - Last synced: 10 days ago - Pushed: almost 3 years ago - Stars: 0 - Forks: 0

deepjyotiroy079/big-data-stack

Code written while learning the big data stack.

Language: Jupyter Notebook - Size: 949 KB - Last synced: 10 days ago - Pushed: almost 3 years ago - Stars: 0 - Forks: 0

Firefly55lm/superconductors_critical_temperature_analysis

Academic project for Big Data Laboratory

Language: Jupyter Notebook - Size: 16.7 MB - Last synced: 10 days ago - Pushed: 10 days ago - Stars: 1 - Forks: 0

Betico1928/Talleres-ProcesamientoDeDatosAGranEscala

An exploration of the principles of large-scale data processing through Databricks and Spark workshops, learning tools such as Pandas and PySpark for efficient analysis of large datasets. Taught by John Corredor at the Pontificia Universidad Javeriana.

Language: Jupyter Notebook - Size: 203 MB - Last synced: 10 days ago - Pushed: 10 days ago - Stars: 0 - Forks: 0

MrPowers/mack

Delta Lake helper methods in PySpark

Language: Python - Size: 2.8 MB - Last synced: 10 days ago - Pushed: 3 months ago - Stars: 271 - Forks: 39

Anannya-M/Learning-Data-Engineering

Language: Jupyter Notebook - Size: 4.06 MB - Last synced: 11 days ago - Pushed: 11 days ago - Stars: 0 - Forks: 0

nabojyoti/ELT-IPL

This is an end-to-end data engineering project that uses the IPL dataset.

Language: Jupyter Notebook - Size: 1.67 MB - Last synced: 11 days ago - Pushed: 11 days ago - Stars: 0 - Forks: 0

lamia-datalover/Big_Data

This repository contains big data mini-projects.

Language: Jupyter Notebook - Size: 2.23 MB - Last synced: 11 days ago - Pushed: 11 days ago - Stars: 0 - Forks: 0

Majdi-Akrmi/ELT-IPL

This is an end-to-end data engineering project that uses the IPL dataset.

Language: Jupyter Notebook - Size: 1.67 MB - Last synced: 10 days ago - Pushed: 11 days ago - Stars: 3 - Forks: 4

manoharpalanisamy/PySpark

Introduction to PySpark using the interactive shell and Jupyter Notebook

Language: Python - Size: 8.79 KB - Last synced: 12 days ago - Pushed: almost 6 years ago - Stars: 1 - Forks: 0

vladyslavyaloveha/etl_platform

🚖 ETL Platform: Analyzing NYC Yellow Taxi Trips with Airflow, FastAPI, and Cloud Integration

Language: Python - Size: 137 KB - Last synced: 2 days ago - Pushed: 12 days ago - Stars: 3 - Forks: 0

sudarshan-koirala/spark-practice

Learning spark the right way

Language: Jupyter Notebook - Size: 46.9 KB - Last synced: 12 days ago - Pushed: almost 2 years ago - Stars: 0 - Forks: 0

aakinlalu/Adhoc-Analysis-Queries

Combination of ad hoc queries with Python, PySpark, and SQL

Language: Jupyter Notebook - Size: 92.8 KB - Last synced: 12 days ago - Pushed: about 5 years ago - Stars: 0 - Forks: 0

easonlai/Samples_for_Azure_Databricks_Orientation

Samples for Azure Databricks Orientation

Language: HTML - Size: 6.78 MB - Last synced: 12 days ago - Pushed: over 3 years ago - Stars: 4 - Forks: 2