Topic: "dask"
dask/dask
Parallel computing with task scheduling
Language: Python - Size: 120 MB - Last synced at: about 6 hours ago - Pushed at: about 11 hours ago - Stars: 13,190 - Forks: 1,764

rapidsai/cudf
cuDF - GPU DataFrame Library
Language: C++ - Size: 155 MB - Last synced at: about 13 hours ago - Pushed at: 1 day ago - Stars: 8,907 - Forks: 945

TDAmeritrade/stumpy
STUMPY is a powerful and scalable Python library for modern time series analysis
Language: Python - Size: 127 MB - Last synced at: 5 days ago - Pushed at: about 1 month ago - Stars: 3,905 - Forks: 332

pydata/xarray
N-D labeled arrays and datasets in Python
Language: Python - Size: 47.2 MB - Last synced at: about 13 hours ago - Pushed at: 4 days ago - Stars: 3,789 - Forks: 1,137

mars-project/mars
Mars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and Python functions.
Language: Python - Size: 37 MB - Last synced at: 17 days ago - Pushed at: over 1 year ago - Stars: 2,722 - Forks: 327

jmcarpenter2/swifter
A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner
Language: Python - Size: 2.15 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 2,597 - Forks: 103

fugue-project/fugue
A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.
Language: Python - Size: 5.98 MB - Last synced at: about 7 hours ago - Pushed at: about 1 month ago - Stars: 2,079 - Forks: 94

dask/distributed
A distributed task scheduler for Dask
Language: Python - Size: 335 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1,627 - Forks: 731

hi-primus/optimus
:truck: Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Language: Python - Size: 110 MB - Last synced at: 2 days ago - Pushed at: 5 months ago - Stars: 1,508 - Forks: 232

itamarst/eliot
Eliot: the logging system that tells you *why* it happened
Language: Python - Size: 1.9 MB - Last synced at: 4 days ago - Pushed at: 2 months ago - Stars: 1,143 - Forks: 71

pytroll/satpy
Python package for earth-observing satellite data processing
Language: Python - Size: 23 MB - Last synced at: about 1 hour ago - Pushed at: 6 days ago - Stars: 1,105 - Forks: 308

Nixtla/mlforecast
Scalable machine 🤖 learning for time series forecasting.
Language: Python - Size: 29.8 MB - Last synced at: 5 days ago - Pushed at: about 1 month ago - Stars: 1,015 - Forks: 97

narwhals-dev/narwhals
Lightweight and extensible compatibility layer between dataframe libraries!
Language: Python - Size: 8.89 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 974 - Forks: 142

ranaroussi/pystore
Fast data store for Pandas time-series data
Language: Python - Size: 155 KB - Last synced at: 9 days ago - Pushed at: 10 months ago - Stars: 577 - Forks: 101

capitalone/datacompy
Pandas, Polars, Spark, and Snowpark DataFrame comparison for humans and more!
Language: Python - Size: 11.3 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 565 - Forks: 141

polyaxon/traceml
Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.
Language: Python - Size: 118 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 515 - Forks: 44

dask-contrib/dask-sql
Distributed SQL Engine in Python using Dask
Language: Python - Size: 3.35 MB - Last synced at: 1 day ago - Pushed at: 9 months ago - Stars: 404 - Forks: 72

pytroll/pyresample
Geospatial image resampling in Python
Language: Python - Size: 16.8 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 365 - Forks: 97

Ouranosinc/xclim
Library of derived climate variables, i.e. climate indicators, based on xarray.
Language: Python - Size: 60.7 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 354 - Forks: 65

DataCanvasIO/HyperGBM
A full pipeline AutoML tool for tabular data
Language: Python - Size: 11 MB - Last synced at: 3 days ago - Pushed at: 26 days ago - Stars: 347 - Forks: 47

nebari-dev/nebari
🪴 Nebari - your open source data science platform
Language: Python - Size: 16.2 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 291 - Forks: 100

JiaweiZhuang/xESMF
Universal Regridder for Geospatial Data
Language: Python - Size: 2.79 MB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 279 - Forks: 48

NVIDIA-Merlin/models
Merlin Models is a collection of deep learning recommender system model reference implementations
Language: Python - Size: 113 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 274 - Forks: 51

aws-samples/amazon-sagemaker-local-mode
Amazon SageMaker Local Mode Examples
Language: Python - Size: 5.94 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 255 - Forks: 61

gjoseph92/stackstac
Turn a STAC catalog into a dask-based xarray
Language: Python - Size: 56.2 MB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 254 - Forks: 52

tkp-archive/paperboy
A web frontend for scheduling Jupyter notebook reports
Language: Python - Size: 12.5 MB - Last synced at: 16 days ago - Pushed at: 5 months ago - Stars: 252 - Forks: 25

dask/dask-jobqueue
Deploy Dask on job schedulers like PBS, SLURM, and SGE
Language: Python - Size: 741 KB - Last synced at: about 13 hours ago - Pushed at: 25 days ago - Stars: 247 - Forks: 135

LDO-CERT/orochi
The Volatility Collaborative GUI
Language: JavaScript - Size: 73 MB - Last synced at: 7 days ago - Pushed at: 12 days ago - Stars: 242 - Forks: 21

pangeo-data/climpred
:earth_americas: Verification of weather and climate forecasts :earth_africa:
Language: Python - Size: 58.3 MB - Last synced at: 6 days ago - Pushed at: about 1 month ago - Stars: 242 - Forks: 48

AllenCellModeling/aicsimageio
Image Reading, Metadata Conversion, and Image Writing for Microscopy Images in Python
Language: Python - Size: 173 MB - Last synced at: 6 days ago - Pushed at: 4 months ago - Stars: 213 - Forks: 50

jgrss/geowombat
GeoWombat: Utilities for geospatial data
Language: Jupyter Notebook - Size: 252 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 193 - Forks: 13

ESDS-Leipzig/cubo
On-Demand Earth System Data Cubes (ESDCs) in Python
Language: Python - Size: 1.62 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 184 - Forks: 14

nci/scores
scores: Metrics for the verification, evaluation and optimisation of forecasts, predictions or models.
Language: Jupyter Notebook - Size: 18.5 MB - Last synced at: 6 days ago - Pushed at: 12 days ago - Stars: 164 - Forks: 32

JDASoftwareGroup/kartothek
A consistent table management library in Python
Language: Python - Size: 2.09 MB - Last synced at: 20 days ago - Pushed at: almost 2 years ago - Stars: 159 - Forks: 53

msoechting/lexcube
Lexcube: 3D Data Cube Visualization in Jupyter Notebooks
Language: TypeScript - Size: 6.72 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 148 - Forks: 8

jcmgray/autoray
Abstract your array operations.
Language: Python - Size: 1.74 MB - Last synced at: about 1 month ago - Pushed at: 2 months ago - Stars: 148 - Forks: 11

ray-project/xgboost_ray
Distributed XGBoost on Ray
Language: Python - Size: 472 KB - Last synced at: about 1 month ago - Pushed at: 11 months ago - Stars: 148 - Forks: 35

google/xarray-beam
Distributed Xarray with Apache Beam
Language: Python - Size: 289 KB - Last synced at: 4 days ago - Pushed at: 11 days ago - Stars: 147 - Forks: 8

dask/dask-cloudprovider
Cloud provider cluster managers for Dask. Supports AWS, Google Cloud, Azure and more...
Language: Python - Size: 803 KB - Last synced at: 4 days ago - Pushed at: 4 months ago - Stars: 141 - Forks: 111

hi-primus/bumblebee
🚕 A spreadsheet-like data preparation web app that works over Optimus (Pandas, Dask, cuDF, Dask-cuDF, Spark and Vaex)
Language: Vue - Size: 23 MB - Last synced at: 11 days ago - Pushed at: almost 2 years ago - Stars: 141 - Forks: 35

xarray-contrib/flox
Fast & furious GroupBy operations for dask.array
Language: Python - Size: 1.83 MB - Last synced at: 7 days ago - Pushed at: 11 days ago - Stars: 130 - Forks: 18

drshahizan/Python-big-data
Python and Pandas are known to have issues around scalability and efficiency. You will learn how to use libraries such as Modin, Dask, Ray, Vaex, etc. to overcome the problems faced by Pandas.
Language: Jupyter Notebook - Size: 107 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 126 - Forks: 67

TimeEval/TimeEval
Evaluation Tool for Anomaly Detection Algorithms on Time Series
Language: Jupyter Notebook - Size: 24.7 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 123 - Forks: 18

xarray-contrib/xeofs
Comprehensive EOF analysis in Python with xarray: A versatile, multidimensional, and scalable tool for advanced climate data analysis
Language: Python - Size: 44.8 MB - Last synced at: 7 days ago - Pushed at: 3 months ago - Stars: 116 - Forks: 23

dask/dask-ec2 📦
Start a cluster in EC2 for dask.distributed
Language: Python - Size: 200 KB - Last synced at: 22 days ago - Pushed at: over 4 years ago - Stars: 106 - Forks: 37

geoxarray/geoxarray
Geolocation utilities for xarray
Language: Python - Size: 381 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 102 - Forks: 8

p2p-ld/numpydantic
Type annotations for specifying, validating, and serializing arrays with arbitrary backends in Pydantic (and beyond)
Language: Python - Size: 791 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 100 - Forks: 1

dymaxionlabs/dask-rasterio
Read and write rasters in parallel using Rasterio and Dask
Language: Python - Size: 813 KB - Last synced at: 13 days ago - Pushed at: over 4 years ago - Stars: 99 - Forks: 8

facultyai/lens
Summarise and explore Pandas DataFrames
Language: Python - Size: 229 KB - Last synced at: 1 day ago - Pushed at: almost 5 years ago - Stars: 98 - Forks: 8

data-apis/array-api-compat
Compatibility layer for common array libraries to support the Array API
Language: Python - Size: 1.47 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 96 - Forks: 33

polyaxon/mloperator
Machine learning operator & controller for Kubernetes
Language: Go - Size: 2.1 MB - Last synced at: 7 days ago - Pushed at: 21 days ago - Stars: 92 - Forks: 8

bioio-devs/bioio
Image reading, metadata management, and image writing for Microscopy images in Python
Language: Python - Size: 8.53 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 86 - Forks: 7

miniufo/xgrads
Parse and read ctl and associated binary files commonly used by GrADS into xarray
Language: Jupyter Notebook - Size: 16.6 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 76 - Forks: 27

superlinear-ai/graphchain
⚡️ An efficient cache for the execution of dask graphs.
Language: Python - Size: 294 KB - Last synced at: 15 days ago - Pushed at: over 1 year ago - Stars: 71 - Forks: 14

treebeardtech/kubeflow-bootstrap
🪐 1-click Kubeflow using ArgoCD
Language: Shell - Size: 2.67 MB - Last synced at: 5 days ago - Pushed at: 9 months ago - Stars: 67 - Forks: 15

NCAR/ncar-python-tutorial 📦
Numerical & Scientific Computing with Python Tutorial
Language: Jupyter Notebook - Size: 49.4 MB - Last synced at: 4 months ago - Pushed at: about 5 years ago - Stars: 67 - Forks: 33

dask-contrib/dask-awkward
Native Dask collection for awkward arrays, and the library to use it.
Language: Python - Size: 1.63 MB - Last synced at: about 3 hours ago - Pushed at: about 5 hours ago - Stars: 64 - Forks: 19

chmp/framequery 📦
SQL on dataframes - pandas and dask
Language: Python - Size: 291 KB - Last synced at: over 1 year ago - Pushed at: about 7 years ago - Stars: 64 - Forks: 9

bytehub-ai/bytehub
ByteHub: making feature stores simple
Language: Python - Size: 363 KB - Last synced at: 1 day ago - Pushed at: almost 4 years ago - Stars: 60 - Forks: 4

saturncloud/dask-pytorch-ddp
dask-pytorch-ddp is a Python package that makes it easy to train PyTorch models on dask clusters using distributed data parallel.
Language: Python - Size: 64.5 KB - Last synced at: 10 days ago - Pushed at: about 4 years ago - Stars: 59 - Forks: 9

MITgcm/xmitgcm
Read MITgcm mds binary files into xarray
Language: Python - Size: 117 MB - Last synced at: 23 days ago - Pushed at: over 1 year ago - Stars: 58 - Forks: 67

pnavaro/big-data
Python tools for big data
Language: Jupyter Notebook - Size: 167 MB - Last synced at: 6 days ago - Pushed at: over 1 year ago - Stars: 53 - Forks: 30

backtick-se/cowait
Containerized distributed programming framework for Python
Language: Python - Size: 5.69 MB - Last synced at: 13 days ago - Pushed at: about 2 years ago - Stars: 53 - Forks: 4

JSybrandt/agatha
AGATHA: Automatic Graph-mining And Transformer based Hypothesis generation Approach
Language: Python - Size: 6.67 MB - Last synced at: over 1 year ago - Pushed at: almost 5 years ago - Stars: 52 - Forks: 9

dask/knit 📦
Deprecated, please use https://github.com/jcrist/skein or https://github.com/dask/dask-yarn instead
Language: Python - Size: 335 KB - Last synced at: 20 days ago - Pushed at: almost 7 years ago - Stars: 52 - Forks: 10

dask-contrib/dask-deltatable
A Delta Lake reader for Dask
Language: Python - Size: 260 KB - Last synced at: 6 days ago - Pushed at: 7 months ago - Stars: 49 - Forks: 15

ml-tooling/lazycluster 📦
🎛 Distributed machine learning made simple.
Language: Python - Size: 809 KB - Last synced at: 21 days ago - Pushed at: about 2 years ago - Stars: 49 - Forks: 12

shauryashaurya/learn-data-munging
Notes on Data Engineering with Pandas, PySpark, Dask, Ray, Arrow DataFusion, Polars etc.
Language: Jupyter Notebook - Size: 627 MB - Last synced at: 6 days ago - Pushed at: about 1 month ago - Stars: 48 - Forks: 21

baggiponte/awesome-pandas-alternatives
Awesome list of alternative dataframe libraries in Python.
Size: 21.5 KB - Last synced at: 5 days ago - Pushed at: over 2 years ago - Stars: 48 - Forks: 2

coiled/coiled-resources 📦
Notebooks that support blog posts and tech talks on Dask / Coiled.
Language: Jupyter Notebook - Size: 147 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 47 - Forks: 13

aertslab/arboreto
A scalable python-based framework for gene regulatory network inference using tree-based ensemble regressors.
Language: Jupyter Notebook - Size: 63.9 MB - Last synced at: 11 months ago - Pushed at: about 1 year ago - Stars: 47 - Forks: 24

NCAR/cesm-lens-aws
Examples of analysis of CESM LENS data publicly available on Amazon S3 (us-west-2 region) using xarray and dask
Language: Jupyter Notebook - Size: 20.8 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 45 - Forks: 23

gjbex/Python-for-HPC
Repository for participants of the "Python for HPC" training
Language: Jupyter Notebook - Size: 7.72 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 42 - Forks: 18

dgerlanc/dask-scaling-dataframe
Python and Dask: Scaling the Dataframe
Language: Jupyter Notebook - Size: 15.8 MB - Last synced at: about 1 month ago - Pushed at: almost 4 years ago - Stars: 40 - Forks: 20

lesommer/oocgcm
oocgcm is a Python library for the analysis of large gridded geophysical datasets.
Language: Python - Size: 2.93 MB - Last synced at: 30 days ago - Pushed at: over 7 years ago - Stars: 39 - Forks: 11

TGSAI/mdio-python
Cloud native, scalable storage engine for various types of energy data.
Language: Python - Size: 6.19 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 38 - Forks: 14

thewtex/ngff-zarr
A lean and kind Open Microscopy Environment (OME) Next Generation File Format (NGFF) Zarr implementation.
Language: Python - Size: 688 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 35 - Forks: 9

jrbourbeau/madpy-dask
MadPy Dask talk materials
Language: Jupyter Notebook - Size: 889 KB - Last synced at: 12 days ago - Pushed at: over 6 years ago - Stars: 33 - Forks: 4

iamtekson/geospatial-data-analysis-python
This repo contains the most common tools used in geospatial analysis using Python!
Language: Jupyter Notebook - Size: 46.2 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 32 - Forks: 23

MDAnalysis/pmda
Parallel algorithms for MDAnalysis
Language: Python - Size: 6.71 MB - Last synced at: 4 days ago - Pushed at: 9 months ago - Stars: 31 - Forks: 23

OpenDataAnalytics/gaia
Gaia is a geospatial analysis library jointly developed by Kitware and Epidemico.
Language: Python - Size: 9.29 MB - Last synced at: 6 months ago - Pushed at: about 6 years ago - Stars: 31 - Forks: 15

basnijholt/adaptive-scheduler
Run many functions (adaptively) on many cores (>10k-100k) using mpi4py.futures, ipyparallel, loky, or dask-mpi. :tada:
Language: Python - Size: 935 KB - Last synced at: about 20 hours ago - Pushed at: 7 days ago - Stars: 30 - Forks: 11

umr-lops/xsar
Synthetic Aperture Radar (SAR) Level-1 GRD python mapper for efficient xarray/dask based processing
Language: Python - Size: 20.2 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 30 - Forks: 9

coiled/dask-snowflake
Dask integration for Snowflake
Language: Python - Size: 64.5 KB - Last synced at: 26 days ago - Pushed at: 6 months ago - Stars: 30 - Forks: 9

mpes-kit/mpes
Distributed data processing routines for multidimensional photoemission spectroscopy (MPES)
Language: Python - Size: 27.5 MB - Last synced at: 4 days ago - Pushed at: over 2 years ago - Stars: 30 - Forks: 6

daskos/daskos
Apache Mesos backend for Dask scheduling library
Language: Python - Size: 82 KB - Last synced at: 20 days ago - Pushed at: over 7 years ago - Stars: 28 - Forks: 5

Vizzuality/cog_worker
Scalable arbitrary analysis on COGs
Language: Jupyter Notebook - Size: 33.3 MB - Last synced at: 24 days ago - Pushed at: 10 months ago - Stars: 27 - Forks: 1

msalvaris/DaskMaskRCNN
Running Mask-RCNN on Dask with PyTorch
Language: Python - Size: 36.1 KB - Last synced at: about 2 years ago - Pushed at: about 6 years ago - Stars: 26 - Forks: 3

dask-contrib/dask-histogram
Histograms with task scheduling.
Language: Python - Size: 526 KB - Last synced at: about 3 hours ago - Pushed at: about 5 hours ago - Stars: 24 - Forks: 6

itamarst/dask-memusage
A low-impact profiler to figure out how much memory each task in Dask is using
Language: Python - Size: 24.4 KB - Last synced at: 23 days ago - Pushed at: about 2 years ago - Stars: 24 - Forks: 1

NCAR/esmlab 📦
Earth System Model Lab (esmlab). ⚠️⚠️ ESMLab functionality has been moved into <https://github.com/NCAR/geocat-comp>. ⚠️⚠️
Language: Python - Size: 2.9 MB - Last synced at: about 2 months ago - Pushed at: about 4 years ago - Stars: 24 - Forks: 8

sinhrks/daskperiment
Reproducibility for Humans: A lightweight tool to perform reproducible machine learning experiments.
Language: Python - Size: 2.22 MB - Last synced at: 4 days ago - Pushed at: about 6 years ago - Stars: 24 - Forks: 5

makepath/austin-ml-change-detection-demo
A change detection demo for the Austin area using a pre-trained PyTorch model scaled with Dask on Planet imagery.
Language: Jupyter Notebook - Size: 160 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 23 - Forks: 3

PeterFogh/dvc_dask_use_case
A use case of a reproducible machine learning pipeline using Dask, DVC, and MLflow.
Language: Python - Size: 62.5 KB - Last synced at: over 1 year ago - Pushed at: about 6 years ago - Stars: 23 - Forks: 2

ratt-ru/codex-africanus
Radio Astronomy Algorithms Library
Language: Python - Size: 1.52 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 20 - Forks: 10

BlazingDB/Welcome_to_BlazingSQL_Notebooks
RAPIDS data science. No setup required.
Language: Jupyter Notebook - Size: 189 MB - Last synced at: about 1 month ago - Pushed at: about 4 years ago - Stars: 20 - Forks: 13

ratt-ru/dask-ms
Implementation of a dask/xarray dataset backed by a CASA MS
Language: Python - Size: 6.72 MB - Last synced at: 7 days ago - Pushed at: about 1 month ago - Stars: 19 - Forks: 7

pnnl/mercat
MerCat: Python code for versatile k-mer counting and diversity estimation for database-independent property analysis for meta-ome data
Language: Python - Size: 2.7 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 18 - Forks: 13

pangeo-data/pangeo-binder
Pangeo + Binder (dev repo for a binder/pangeo fusion concept)
Language: Python - Size: 770 KB - Last synced at: about 1 month ago - Pushed at: almost 4 years ago - Stars: 18 - Forks: 13

dionresearch/stemgraphic
stemgraphic python package for visualization of data and text
Language: Jupyter Notebook - Size: 23.8 MB - Last synced at: 30 days ago - Pushed at: about 4 years ago - Stars: 18 - Forks: 1
