Topic: "dask"
dask/dask
Parallel computing with task scheduling
Language: Python - Size: 122 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 13,441 - Forks: 1,794

rapidsai/cudf
cuDF - GPU DataFrame Library
Language: C++ - Size: 163 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 9,157 - Forks: 967

stumpy-dev/stumpy
STUMPY is a powerful and scalable Python library for modern time series analysis
Language: Python - Size: 128 MB - Last synced at: 6 days ago - Pushed at: about 2 months ago - Stars: 3,982 - Forks: 336

pydata/xarray
N-D labeled arrays and datasets in Python
Language: Python - Size: 48.2 MB - Last synced at: 6 days ago - Pushed at: 7 days ago - Stars: 3,946 - Forks: 1,169

mars-project/mars
Mars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and Python functions.
Language: Python - Size: 37 MB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 2,731 - Forks: 327

jmcarpenter2/swifter
A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner
Language: Python - Size: 2.15 MB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 2,611 - Forks: 104

fugue-project/fugue
A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.
Language: Python - Size: 5.98 MB - Last synced at: 7 days ago - Pushed at: 5 months ago - Stars: 2,105 - Forks: 94

dask/distributed
A distributed task scheduler for Dask
Language: Python - Size: 355 MB - Last synced at: about 18 hours ago - Pushed at: about 20 hours ago - Stars: 1,648 - Forks: 736

hi-primus/optimus
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Language: Python - Size: 110 MB - Last synced at: 3 days ago - Pushed at: 9 months ago - Stars: 1,517 - Forks: 233

narwhals-dev/narwhals
Lightweight and extensible compatibility layer between dataframe libraries!
Language: Python - Size: 12.9 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1,278 - Forks: 161

itamarst/eliot
Eliot: the logging system that tells you *why* it happened
Language: Python - Size: 1.9 MB - Last synced at: 20 days ago - Pushed at: 6 months ago - Stars: 1,153 - Forks: 70

pytroll/satpy
Python package for earth-observing satellite data processing
Language: Python - Size: 15.5 MB - Last synced at: 6 days ago - Pushed at: 9 days ago - Stars: 1,131 - Forks: 313

Nixtla/mlforecast
Scalable machine 🤖 learning for time series forecasting.
Language: Python - Size: 29.7 MB - Last synced at: 10 days ago - Pushed at: 13 days ago - Stars: 1,057 - Forks: 102

capitalone/datacompy
Pandas, Polars, Spark, and Snowpark DataFrame comparison for humans and more!
Language: Python - Size: 12.2 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 596 - Forks: 146

ranaroussi/pystore
Fast data store for Pandas time-series data
Language: Python - Size: 1.02 MB - Last synced at: 7 days ago - Pushed at: about 2 months ago - Stars: 591 - Forks: 102

polyaxon/traceml
Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.
Language: Python - Size: 118 MB - Last synced at: 2 days ago - Pushed at: 3 months ago - Stars: 520 - Forks: 46

dask-contrib/dask-sql
Distributed SQL Engine in Python using Dask
Language: Python - Size: 3.35 MB - Last synced at: 20 days ago - Pushed at: about 1 year ago - Stars: 407 - Forks: 72

pytroll/pyresample
Geospatial image resampling in Python
Language: Python - Size: 16.8 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 370 - Forks: 95

Ouranosinc/xclim
Library of derived climate variables, i.e. climate indicators, based on xarray.
Language: Python - Size: 61.6 MB - Last synced at: about 17 hours ago - Pushed at: 5 days ago - Stars: 368 - Forks: 68

DataCanvasIO/HyperGBM
A full pipeline AutoML tool for tabular data
Language: Python - Size: 11 MB - Last synced at: 10 days ago - Pushed at: 5 months ago - Stars: 355 - Forks: 47

nebari-dev/nebari
🪴 Nebari - your open source data science platform
Language: Python - Size: 16.3 MB - Last synced at: 2 days ago - Pushed at: 6 days ago - Stars: 307 - Forks: 105

NVIDIA-Merlin/models
Merlin Models is a collection of deep learning recommender system model reference implementations
Language: Python - Size: 113 MB - Last synced at: 21 days ago - Pushed at: over 1 year ago - Stars: 286 - Forks: 53

JiaweiZhuang/xESMF
Universal Regridder for Geospatial Data
Language: Python - Size: 2.79 MB - Last synced at: 4 months ago - Pushed at: over 3 years ago - Stars: 278 - Forks: 48

aws-samples/amazon-sagemaker-local-mode
Amazon SageMaker Local Mode Examples
Language: Python - Size: 5.94 MB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 257 - Forks: 63

gjoseph92/stackstac
Turn a STAC catalog into a dask-based xarray
Language: Python - Size: 56.2 MB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 257 - Forks: 52

tkp-archive/paperboy
A web frontend for scheduling Jupyter notebook reports
Language: Python - Size: 12.5 MB - Last synced at: 2 months ago - Pushed at: 9 months ago - Stars: 253 - Forks: 25

LDO-CERT/orochi
The Volatility Collaborative GUI
Language: JavaScript - Size: 73.2 MB - Last synced at: 7 days ago - Pushed at: 9 days ago - Stars: 252 - Forks: 22

pangeo-data/climpred
🌎 Verification of weather and climate forecasts 🌍
Language: Python - Size: 58.2 MB - Last synced at: 7 days ago - Pushed at: 27 days ago - Stars: 249 - Forks: 48

dask/dask-jobqueue
Deploy Dask on job schedulers like PBS, SLURM, and SGE
Language: Python - Size: 755 KB - Last synced at: about 2 hours ago - Pushed at: 3 months ago - Stars: 249 - Forks: 146

AllenCellModeling/aicsimageio
Image Reading, Metadata Conversion, and Image Writing for Microscopy Images in Python
Language: Python - Size: 173 MB - Last synced at: 6 days ago - Pushed at: 8 months ago - Stars: 219 - Forks: 50

ESDS-Leipzig/cubo
On-Demand Earth System Data Cubes (ESDCs) in Python
Language: Python - Size: 1.63 MB - Last synced at: 2 days ago - Pushed at: 4 months ago - Stars: 194 - Forks: 14

jgrss/geowombat
GeoWombat: Utilities for geospatial data
Language: Jupyter Notebook - Size: 252 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 193 - Forks: 13

nci/scores
scores: Metrics for the verification, evaluation and optimisation of forecasts, predictions or models.
Language: Jupyter Notebook - Size: 17.1 MB - Last synced at: 6 days ago - Pushed at: 10 days ago - Stars: 183 - Forks: 35

msoechting/lexcube
Lexcube: 3D Data Cube Visualization in Jupyter Notebooks
Language: TypeScript - Size: 7.06 MB - Last synced at: 3 days ago - Pushed at: 3 months ago - Stars: 169 - Forks: 10

JDASoftwareGroup/kartothek
A consistent table management library in python
Language: Python - Size: 2.09 MB - Last synced at: 6 days ago - Pushed at: over 2 years ago - Stars: 160 - Forks: 53

jcmgray/autoray
Abstract your array operations.
Language: Python - Size: 1.86 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 155 - Forks: 11

ray-project/xgboost_ray
Distributed XGBoost on Ray
Language: Python - Size: 472 KB - Last synced at: 9 days ago - Pushed at: about 1 year ago - Stars: 149 - Forks: 36

google/xarray-beam
Distributed Xarray with Apache Beam
Language: Python - Size: 312 KB - Last synced at: about 7 hours ago - Pushed at: 2 days ago - Stars: 147 - Forks: 10

dask/dask-cloudprovider
Cloud provider cluster managers for Dask. Supports AWS, Google Cloud, Azure and more...
Language: Python - Size: 829 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 144 - Forks: 117

hi-primus/bumblebee
🚕 A spreadsheet-like data preparation web app that works over Optimus (Pandas, Dask, cuDF, Dask-cuDF, Spark and Vaex)
Language: Vue - Size: 23 MB - Last synced at: 4 months ago - Pushed at: about 2 years ago - Stars: 141 - Forks: 35

TimeEval/TimeEval
Evaluation Tool for Anomaly Detection Algorithms on Time Series
Language: Jupyter Notebook - Size: 24.9 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 136 - Forks: 18

xarray-contrib/flox
Fast & furious GroupBy operations for dask.array
Language: Python - Size: 1.8 MB - Last synced at: 7 days ago - Pushed at: 20 days ago - Stars: 133 - Forks: 21

drshahizan/Python-big-data
Python and Pandas are known to have issues around scalability and efficiency. You will learn how to use libraries such as Modin, Dask, Ray, and Vaex to overcome the problems faced by Pandas.
Language: Jupyter Notebook - Size: 107 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 126 - Forks: 67

xarray-contrib/xeofs
Comprehensive EOF analysis in Python with xarray: A versatile, multidimensional, and scalable tool for advanced climate data analysis
Language: Python - Size: 44.8 MB - Last synced at: 6 days ago - Pushed at: 7 months ago - Stars: 122 - Forks: 23

p2p-ld/numpydantic
Type annotations for specifying, validating, and serializing arrays with arbitrary backends in Pydantic (and beyond)
Language: Python - Size: 1.12 MB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 118 - Forks: 3

dask/dask-ec2 📦
Start a cluster in EC2 for dask.distributed
Language: Python - Size: 200 KB - Last synced at: about 2 months ago - Pushed at: almost 5 years ago - Stars: 106 - Forks: 37

geoxarray/geoxarray
Geolocation utilities for xarray
Language: Python - Size: 396 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 105 - Forks: 8

data-apis/array-api-compat
Compatibility layer for common array libraries to support the Array API
Language: Python - Size: 1.81 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 100 - Forks: 37

dymaxionlabs/dask-rasterio
Read and write rasters in parallel using Rasterio and Dask
Language: Python - Size: 813 KB - Last synced at: about 2 months ago - Pushed at: almost 5 years ago - Stars: 100 - Forks: 8

facultyai/lens
Summarise and explore Pandas DataFrames
Language: Python - Size: 229 KB - Last synced at: 13 days ago - Pushed at: about 5 years ago - Stars: 98 - Forks: 8

bioio-devs/bioio
Image reading, metadata management, and image writing for Microscopy images in Python
Language: Python - Size: 8.93 MB - Last synced at: 21 days ago - Pushed at: 25 days ago - Stars: 96 - Forks: 8

polyaxon/mloperator
Machine learning operator & controller for Kubernetes
Language: Go - Size: 2.11 MB - Last synced at: 19 days ago - Pushed at: 2 months ago - Stars: 92 - Forks: 8

miniufo/xgrads
Parse and read ctl and associated binary file commonly used by GrADS into xarray
Language: Jupyter Notebook - Size: 16.6 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 76 - Forks: 28

ds2-lab/Wukong
Wukong: A scalable and locality-enhanced serverless parallel framework (ACM SoCC'20)
Language: Python - Size: 14.6 MB - Last synced at: about 2 months ago - Pushed at: 10 months ago - Stars: 75 - Forks: 16

superlinear-ai/graphchain
⚡️ An efficient cache for the execution of dask graphs.
Language: Python - Size: 294 KB - Last synced at: 13 days ago - Pushed at: almost 2 years ago - Stars: 71 - Forks: 14

treebeardtech/kubeflow-bootstrap
🪐 1-click Kubeflow using ArgoCD
Language: Shell - Size: 2.67 MB - Last synced at: 4 months ago - Pushed at: about 1 year ago - Stars: 67 - Forks: 15

NCAR/ncar-python-tutorial 📦
Numerical & Scientific Computing with Python Tutorial
Language: Jupyter Notebook - Size: 49.4 MB - Last synced at: 7 months ago - Pushed at: over 5 years ago - Stars: 67 - Forks: 33

dask-contrib/dask-awkward
Native Dask collection for awkward arrays, and the library to use it.
Language: Python - Size: 1.56 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 66 - Forks: 20

chmp/framequery 📦
SQL on dataframes - pandas and dask
Language: Python - Size: 291 KB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 64 - Forks: 9

MITgcm/xmitgcm
Read MITgcm mds binary files into xarray
Language: Python - Size: 117 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 62 - Forks: 67

bytehub-ai/bytehub
ByteHub: making feature stores simple
Language: Python - Size: 363 KB - Last synced at: 10 days ago - Pushed at: over 4 years ago - Stars: 61 - Forks: 4

saturncloud/dask-pytorch-ddp
dask-pytorch-ddp is a Python package that makes it easy to train PyTorch models on dask clusters using distributed data parallel.
Language: Python - Size: 64.5 KB - Last synced at: about 9 hours ago - Pushed at: over 4 years ago - Stars: 59 - Forks: 9

dask-contrib/dask-deltatable
A Delta Lake reader for Dask
Language: Python - Size: 275 KB - Last synced at: 4 days ago - Pushed at: about 1 month ago - Stars: 53 - Forks: 17

pnavaro/big-data
Python tools for big data
Language: Jupyter Notebook - Size: 167 MB - Last synced at: 4 months ago - Pushed at: almost 2 years ago - Stars: 53 - Forks: 30

backtick-se/cowait
Containerized distributed programming framework for Python
Language: Python - Size: 5.69 MB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 53 - Forks: 4

baggiponte/awesome-pandas-alternatives
Awesome list of alternative dataframe libraries in Python.
Size: 21.5 KB - Last synced at: about 22 hours ago - Pushed at: over 2 years ago - Stars: 53 - Forks: 2

JSybrandt/agatha
AGATHA: Automatic Graph-mining And Transformer based Hypothesis generation Approach
Language: Python - Size: 6.67 MB - Last synced at: almost 2 years ago - Pushed at: about 5 years ago - Stars: 52 - Forks: 9

dask/knit 📦
Deprecated, please use https://github.com/jcrist/skein or https://github.com/dask/dask-yarn instead
Language: Python - Size: 335 KB - Last synced at: 19 days ago - Pushed at: about 7 years ago - Stars: 52 - Forks: 10

shauryashaurya/learn-data-munging
Notes on Data Engineering with Pandas, PySpark, Dask, Ray, Arrow DataFusion, Polars etc.
Language: Jupyter Notebook - Size: 631 MB - Last synced at: 6 days ago - Pushed at: about 1 month ago - Stars: 50 - Forks: 21

thewtex/ngff-zarr
A lean and kind Open Microscopy Environment (OME) Next Generation File Format (NGFF) Zarr implementation.
Language: Python - Size: 7.59 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 49 - Forks: 12

ml-tooling/lazycluster 📦
🎛 Distributed machine learning made simple.
Language: Python - Size: 809 KB - Last synced at: about 10 hours ago - Pushed at: over 2 years ago - Stars: 49 - Forks: 12

coiled/coiled-resources 📦
Notebooks that support blog posts and tech talks on Dask / Coiled.
Language: Jupyter Notebook - Size: 147 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 47 - Forks: 13

aertslab/arboreto
A scalable python-based framework for gene regulatory network inference using tree-based ensemble regressors.
Language: Jupyter Notebook - Size: 63.9 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 47 - Forks: 24

NCAR/cesm-lens-aws
Examples of analysis of CESM LENS data publicly available on Amazon S3 (us-west-2 region) using xarray and dask
Language: Jupyter Notebook - Size: 20.8 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 45 - Forks: 23

gjbex/Python-for-HPC
Repository for participants of the "Python for HPC" training
Language: Jupyter Notebook - Size: 7.87 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 43 - Forks: 18

iamtekson/geospatial-data-analysis-python
This repo contains the most common tools used in geospatial analysis using Python!
Language: Jupyter Notebook - Size: 46.9 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 41 - Forks: 31

dgerlanc/dask-scaling-dataframe
Python and Dask: Scaling the Dataframe
Language: Jupyter Notebook - Size: 15.8 MB - Last synced at: 22 days ago - Pushed at: about 4 years ago - Stars: 40 - Forks: 20

lesommer/oocgcm
oocgcm is a Python library for the analysis of large gridded geophysical datasets.
Language: Python - Size: 2.93 MB - Last synced at: 1 day ago - Pushed at: almost 8 years ago - Stars: 40 - Forks: 11

TGSAI/mdio-python
Cloud native, scalable storage engine for various types of energy data.
Language: Python - Size: 5.48 MB - Last synced at: 6 days ago - Pushed at: 12 days ago - Stars: 38 - Forks: 15

jrbourbeau/madpy-dask
MadPy Dask talk materials
Language: Jupyter Notebook - Size: 889 KB - Last synced at: 3 months ago - Pushed at: over 6 years ago - Stars: 33 - Forks: 4

MDAnalysis/pmda
Parallel algorithms for MDAnalysis
Language: Python - Size: 6.71 MB - Last synced at: 10 days ago - Pushed at: about 1 year ago - Stars: 31 - Forks: 23

mpes-kit/mpes
Distributed data processing routines for multidimensional photoemission spectroscopy (MPES)
Language: Python - Size: 27.5 MB - Last synced at: 1 day ago - Pushed at: over 2 years ago - Stars: 31 - Forks: 6

OpenDataAnalytics/gaia
Gaia is a geospatial analysis library jointly developed by Kitware and Epidemico.
Language: Python - Size: 9.29 MB - Last synced at: 10 months ago - Pushed at: over 6 years ago - Stars: 31 - Forks: 15

basnijholt/adaptive-scheduler
Run many functions (adaptively) on many cores (>10k-100k) using mpi4py.futures, ipyparallel, loky, or dask-mpi. 🎉
Language: Python - Size: 1.23 MB - Last synced at: about 19 hours ago - Pushed at: 3 days ago - Stars: 30 - Forks: 12

dask-contrib/dask-snowflake
Dask integration for Snowflake
Language: Python - Size: 73.2 KB - Last synced at: 7 days ago - Pushed at: about 1 month ago - Stars: 30 - Forks: 10

umr-lops/xsar
Synthetic Aperture Radar (SAR) Level-1 GRD python mapper for efficient xarray/dask based processing
Language: Python - Size: 20.2 MB - Last synced at: 4 months ago - Pushed at: 6 months ago - Stars: 29 - Forks: 10

daskos/daskos
Apache Mesos backend for Dask scheduling library
Language: Python - Size: 82 KB - Last synced at: 6 days ago - Pushed at: almost 8 years ago - Stars: 28 - Forks: 5

Vizzuality/cog_worker
Scalable arbitrary analysis on COGs
Language: Jupyter Notebook - Size: 33.3 MB - Last synced at: 7 days ago - Pushed at: about 1 year ago - Stars: 27 - Forks: 1

msalvaris/DaskMaskRCNN
Running Mask-RCNN on Dask with PyTorch
Language: Python - Size: 36.1 KB - Last synced at: over 2 years ago - Pushed at: over 6 years ago - Stars: 26 - Forks: 3

dask-contrib/dask-histogram
Histograms with task scheduling.
Language: Python - Size: 498 KB - Last synced at: 8 days ago - Pushed at: 13 days ago - Stars: 24 - Forks: 5

itamarst/dask-memusage
A low-impact profiler to figure out how much memory each task in Dask is using
Language: Python - Size: 24.4 KB - Last synced at: 15 days ago - Pushed at: over 2 years ago - Stars: 24 - Forks: 1

NCAR/esmlab 📦
Earth System Model Lab (esmlab). ⚠️⚠️ ESMLab functionality has been moved into <https://github.com/NCAR/geocat-comp>. ⚠️⚠️
Language: Python - Size: 2.9 MB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 24 - Forks: 8

sinhrks/daskperiment
Reproducibility for Humans: A lightweight tool to perform reproducible machine learning experiments.
Language: Python - Size: 2.22 MB - Last synced at: 4 months ago - Pushed at: over 6 years ago - Stars: 24 - Forks: 5

makepath/austin-ml-change-detection-demo
A change detection demo for the Austin area using a pre-trained PyTorch model scaled with Dask on Planet imagery.
Language: Jupyter Notebook - Size: 160 MB - Last synced at: 5 months ago - Pushed at: almost 3 years ago - Stars: 23 - Forks: 3

PeterFogh/dvc_dask_use_case
A use case of a reproducible machine learning pipeline using Dask, DVC, and MLflow.
Language: Python - Size: 62.5 KB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 23 - Forks: 2

ratt-ru/codex-africanus
Radio Astronomy Algorithms Library
Language: Python - Size: 1.52 MB - Last synced at: 2 days ago - Pushed at: 9 days ago - Stars: 22 - Forks: 10

BlazingDB/Welcome_to_BlazingSQL_Notebooks
RAPIDS data science. No setup required.
Language: Jupyter Notebook - Size: 189 MB - Last synced at: 5 months ago - Pushed at: over 4 years ago - Stars: 20 - Forks: 13

ratt-ru/dask-ms
Implementation of a dask/xarray dataset backed by a CASA MS
Language: Python - Size: 6.77 MB - Last synced at: 15 days ago - Pushed at: 20 days ago - Stars: 19 - Forks: 8

CoffeaTeam/coffea-casa
Repository with the configuration setup for a prototype analysis facility, "coffea-casa"
Language: Python - Size: 11.4 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 18 - Forks: 20

pnnl/mercat
MerCat: Python code for versatile k-mer counting and diversity estimation for database-independent property analysis of meta-ome data
Language: Python - Size: 2.7 MB - Last synced at: 5 months ago - Pushed at: almost 3 years ago - Stars: 18 - Forks: 13
