An open API service providing repository metadata for many open source software ecosystems.

Topic: "dask"

dask/dask

Parallel computing with task scheduling

Language: Python - Size: 120 MB - Last synced at: about 6 hours ago - Pushed at: about 11 hours ago - Stars: 13,190 - Forks: 1,764

rapidsai/cudf

cuDF - GPU DataFrame Library

Language: C++ - Size: 155 MB - Last synced at: about 13 hours ago - Pushed at: 1 day ago - Stars: 8,907 - Forks: 945

TDAmeritrade/stumpy

STUMPY is a powerful and scalable Python library for modern time series analysis

Language: Python - Size: 127 MB - Last synced at: 5 days ago - Pushed at: about 1 month ago - Stars: 3,905 - Forks: 332

pydata/xarray

N-D labeled arrays and datasets in Python

Language: Python - Size: 47.2 MB - Last synced at: about 13 hours ago - Pushed at: 4 days ago - Stars: 3,789 - Forks: 1,137

mars-project/mars

Mars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and Python functions.

Language: Python - Size: 37 MB - Last synced at: 17 days ago - Pushed at: over 1 year ago - Stars: 2,722 - Forks: 327

jmcarpenter2/swifter

A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner

Language: Python - Size: 2.15 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 2,597 - Forks: 103

fugue-project/fugue

A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.

Language: Python - Size: 5.98 MB - Last synced at: about 7 hours ago - Pushed at: about 1 month ago - Stars: 2,079 - Forks: 94

dask/distributed

A distributed task scheduler for Dask

Language: Python - Size: 335 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1,627 - Forks: 731

hi-primus/optimus

🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark

Language: Python - Size: 110 MB - Last synced at: 2 days ago - Pushed at: 5 months ago - Stars: 1,508 - Forks: 232

itamarst/eliot

Eliot: the logging system that tells you *why* it happened

Language: Python - Size: 1.9 MB - Last synced at: 4 days ago - Pushed at: 2 months ago - Stars: 1,143 - Forks: 71

pytroll/satpy

Python package for earth-observing satellite data processing

Language: Python - Size: 23 MB - Last synced at: about 1 hour ago - Pushed at: 6 days ago - Stars: 1,105 - Forks: 308

Nixtla/mlforecast

Scalable machine 🤖 learning for time series forecasting.

Language: Python - Size: 29.8 MB - Last synced at: 5 days ago - Pushed at: about 1 month ago - Stars: 1,015 - Forks: 97

narwhals-dev/narwhals

Lightweight and extensible compatibility layer between dataframe libraries!

Language: Python - Size: 8.89 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 974 - Forks: 142

ranaroussi/pystore

Fast data store for Pandas time-series data

Language: Python - Size: 155 KB - Last synced at: 9 days ago - Pushed at: 10 months ago - Stars: 577 - Forks: 101

capitalone/datacompy

Pandas, Polars, Spark, and Snowpark DataFrame comparison for humans and more!

Language: Python - Size: 11.3 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 565 - Forks: 141

polyaxon/traceml

Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.

Language: Python - Size: 118 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 515 - Forks: 44

dask-contrib/dask-sql

Distributed SQL Engine in Python using Dask

Language: Python - Size: 3.35 MB - Last synced at: 1 day ago - Pushed at: 9 months ago - Stars: 404 - Forks: 72

pytroll/pyresample

Geospatial image resampling in Python

Language: Python - Size: 16.8 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 365 - Forks: 97

Ouranosinc/xclim

Library of derived climate variables, i.e., climate indicators, based on xarray.

Language: Python - Size: 60.7 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 354 - Forks: 65

DataCanvasIO/HyperGBM

A full pipeline AutoML tool for tabular data

Language: Python - Size: 11 MB - Last synced at: 3 days ago - Pushed at: 26 days ago - Stars: 347 - Forks: 47

nebari-dev/nebari

🪴 Nebari - your open source data science platform

Language: Python - Size: 16.2 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 291 - Forks: 100

JiaweiZhuang/xESMF

Universal Regridder for Geospatial Data

Language: Python - Size: 2.79 MB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 279 - Forks: 48

NVIDIA-Merlin/models

Merlin Models is a collection of deep learning recommender system model reference implementations

Language: Python - Size: 113 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 274 - Forks: 51

aws-samples/amazon-sagemaker-local-mode

Amazon SageMaker Local Mode Examples

Language: Python - Size: 5.94 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 255 - Forks: 61

gjoseph92/stackstac

Turn a STAC catalog into a dask-based xarray

Language: Python - Size: 56.2 MB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 254 - Forks: 52

tkp-archive/paperboy

A web frontend for scheduling Jupyter notebook reports

Language: Python - Size: 12.5 MB - Last synced at: 16 days ago - Pushed at: 5 months ago - Stars: 252 - Forks: 25

dask/dask-jobqueue

Deploy Dask on job schedulers like PBS, SLURM, and SGE

Language: Python - Size: 741 KB - Last synced at: about 13 hours ago - Pushed at: 25 days ago - Stars: 247 - Forks: 135

LDO-CERT/orochi

The Volatility Collaborative GUI

Language: JavaScript - Size: 73 MB - Last synced at: 7 days ago - Pushed at: 12 days ago - Stars: 242 - Forks: 21

pangeo-data/climpred

🌎 Verification of weather and climate forecasts 🌍

Language: Python - Size: 58.3 MB - Last synced at: 6 days ago - Pushed at: about 1 month ago - Stars: 242 - Forks: 48

AllenCellModeling/aicsimageio

Image Reading, Metadata Conversion, and Image Writing for Microscopy Images in Python

Language: Python - Size: 173 MB - Last synced at: 6 days ago - Pushed at: 4 months ago - Stars: 213 - Forks: 50

jgrss/geowombat

GeoWombat: Utilities for geospatial data

Language: Jupyter Notebook - Size: 252 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 193 - Forks: 13

ESDS-Leipzig/cubo

On-Demand Earth System Data Cubes (ESDCs) in Python

Language: Python - Size: 1.62 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 184 - Forks: 14

nci/scores

scores: Metrics for the verification, evaluation and optimisation of forecasts, predictions or models.

Language: Jupyter Notebook - Size: 18.5 MB - Last synced at: 6 days ago - Pushed at: 12 days ago - Stars: 164 - Forks: 32

JDASoftwareGroup/kartothek

A consistent table management library in Python

Language: Python - Size: 2.09 MB - Last synced at: 20 days ago - Pushed at: almost 2 years ago - Stars: 159 - Forks: 53

msoechting/lexcube

Lexcube: 3D Data Cube Visualization in Jupyter Notebooks

Language: TypeScript - Size: 6.72 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 148 - Forks: 8

jcmgray/autoray

Abstract your array operations.

Language: Python - Size: 1.74 MB - Last synced at: about 1 month ago - Pushed at: 2 months ago - Stars: 148 - Forks: 11

ray-project/xgboost_ray

Distributed XGBoost on Ray

Language: Python - Size: 472 KB - Last synced at: about 1 month ago - Pushed at: 11 months ago - Stars: 148 - Forks: 35

google/xarray-beam

Distributed Xarray with Apache Beam

Language: Python - Size: 289 KB - Last synced at: 4 days ago - Pushed at: 11 days ago - Stars: 147 - Forks: 8

dask/dask-cloudprovider

Cloud provider cluster managers for Dask. Supports AWS, Google Cloud, Azure, and more.

Language: Python - Size: 803 KB - Last synced at: 4 days ago - Pushed at: 4 months ago - Stars: 141 - Forks: 111

hi-primus/bumblebee

🚕 A spreadsheet-like data preparation web app that works over Optimus (Pandas, Dask, cuDF, Dask-cuDF, Spark and Vaex)

Language: Vue - Size: 23 MB - Last synced at: 11 days ago - Pushed at: almost 2 years ago - Stars: 141 - Forks: 35

xarray-contrib/flox

Fast & furious GroupBy operations for dask.array

Language: Python - Size: 1.83 MB - Last synced at: 7 days ago - Pushed at: 11 days ago - Stars: 130 - Forks: 18

drshahizan/Python-big-data

Python and Pandas are known to have issues around scalability and efficiency. You will learn how to use libraries such as Modin, Dask, Ray, and Vaex to overcome the problems faced by Pandas.

Language: Jupyter Notebook - Size: 107 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 126 - Forks: 67

TimeEval/TimeEval

Evaluation Tool for Anomaly Detection Algorithms on Time Series

Language: Jupyter Notebook - Size: 24.7 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 123 - Forks: 18

xarray-contrib/xeofs

Comprehensive EOF analysis in Python with xarray: A versatile, multidimensional, and scalable tool for advanced climate data analysis

Language: Python - Size: 44.8 MB - Last synced at: 7 days ago - Pushed at: 3 months ago - Stars: 116 - Forks: 23

dask/dask-ec2 📦

Start a cluster in EC2 for dask.distributed

Language: Python - Size: 200 KB - Last synced at: 22 days ago - Pushed at: over 4 years ago - Stars: 106 - Forks: 37

geoxarray/geoxarray

Geolocation utilities for xarray

Language: Python - Size: 381 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 102 - Forks: 8

p2p-ld/numpydantic

Type annotations for specifying, validating, and serializing arrays with arbitrary backends in Pydantic (and beyond)

Language: Python - Size: 791 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 100 - Forks: 1

dymaxionlabs/dask-rasterio

Read and write rasters in parallel using Rasterio and Dask

Language: Python - Size: 813 KB - Last synced at: 13 days ago - Pushed at: over 4 years ago - Stars: 99 - Forks: 8

facultyai/lens

Summarise and explore Pandas DataFrames

Language: Python - Size: 229 KB - Last synced at: 1 day ago - Pushed at: almost 5 years ago - Stars: 98 - Forks: 8

data-apis/array-api-compat

Compatibility layer for common array libraries to support the Array API

Language: Python - Size: 1.47 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 96 - Forks: 33

polyaxon/mloperator

Machine learning operator & controller for Kubernetes

Language: Go - Size: 2.1 MB - Last synced at: 7 days ago - Pushed at: 21 days ago - Stars: 92 - Forks: 8

bioio-devs/bioio

Image reading, metadata management, and image writing for Microscopy images in Python

Language: Python - Size: 8.53 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 86 - Forks: 7

miniufo/xgrads

Parse and read the .ctl and associated binary files commonly used by GrADS into xarray

Language: Jupyter Notebook - Size: 16.6 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 76 - Forks: 27

superlinear-ai/graphchain

⚡️ An efficient cache for the execution of dask graphs.

Language: Python - Size: 294 KB - Last synced at: 15 days ago - Pushed at: over 1 year ago - Stars: 71 - Forks: 14

treebeardtech/kubeflow-bootstrap

🪐 1-click Kubeflow using ArgoCD

Language: Shell - Size: 2.67 MB - Last synced at: 5 days ago - Pushed at: 9 months ago - Stars: 67 - Forks: 15

NCAR/ncar-python-tutorial 📦

Numerical & Scientific Computing with Python Tutorial

Language: Jupyter Notebook - Size: 49.4 MB - Last synced at: 4 months ago - Pushed at: about 5 years ago - Stars: 67 - Forks: 33

dask-contrib/dask-awkward

Native Dask collection for awkward arrays, and the library to use it.

Language: Python - Size: 1.63 MB - Last synced at: about 3 hours ago - Pushed at: about 5 hours ago - Stars: 64 - Forks: 19

chmp/framequery 📦

SQL on dataframes - pandas and dask

Language: Python - Size: 291 KB - Last synced at: over 1 year ago - Pushed at: about 7 years ago - Stars: 64 - Forks: 9

bytehub-ai/bytehub

ByteHub: making feature stores simple

Language: Python - Size: 363 KB - Last synced at: 1 day ago - Pushed at: almost 4 years ago - Stars: 60 - Forks: 4

saturncloud/dask-pytorch-ddp

dask-pytorch-ddp is a Python package that makes it easy to train PyTorch models on dask clusters using distributed data parallel.

Language: Python - Size: 64.5 KB - Last synced at: 10 days ago - Pushed at: about 4 years ago - Stars: 59 - Forks: 9

MITgcm/xmitgcm

Read MITgcm mds binary files into xarray

Language: Python - Size: 117 MB - Last synced at: 23 days ago - Pushed at: over 1 year ago - Stars: 58 - Forks: 67

pnavaro/big-data

Python tools for big data

Language: Jupyter Notebook - Size: 167 MB - Last synced at: 6 days ago - Pushed at: over 1 year ago - Stars: 53 - Forks: 30

backtick-se/cowait

Containerized distributed programming framework for Python

Language: Python - Size: 5.69 MB - Last synced at: 13 days ago - Pushed at: about 2 years ago - Stars: 53 - Forks: 4

JSybrandt/agatha

AGATHA: Automatic Graph-mining And Transformer based Hypothesis generation Approach

Language: Python - Size: 6.67 MB - Last synced at: over 1 year ago - Pushed at: almost 5 years ago - Stars: 52 - Forks: 9

dask/knit 📦

Deprecated; please use https://github.com/jcrist/skein or https://github.com/dask/dask-yarn instead.

Language: Python - Size: 335 KB - Last synced at: 20 days ago - Pushed at: almost 7 years ago - Stars: 52 - Forks: 10

dask-contrib/dask-deltatable

A Delta Lake reader for Dask

Language: Python - Size: 260 KB - Last synced at: 6 days ago - Pushed at: 7 months ago - Stars: 49 - Forks: 15

ml-tooling/lazycluster 📦

🎛 Distributed machine learning made simple.

Language: Python - Size: 809 KB - Last synced at: 21 days ago - Pushed at: about 2 years ago - Stars: 49 - Forks: 12

shauryashaurya/learn-data-munging

Notes on Data Engineering with Pandas, PySpark, Dask, Ray, Arrow DataFusion, Polars etc.

Language: Jupyter Notebook - Size: 627 MB - Last synced at: 6 days ago - Pushed at: about 1 month ago - Stars: 48 - Forks: 21

baggiponte/awesome-pandas-alternatives

Awesome list of alternative dataframe libraries in Python.

Size: 21.5 KB - Last synced at: 5 days ago - Pushed at: over 2 years ago - Stars: 48 - Forks: 2

coiled/coiled-resources 📦

Notebooks that support blog posts and tech talks on Dask / Coiled.

Language: Jupyter Notebook - Size: 147 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 47 - Forks: 13

aertslab/arboreto

A scalable Python-based framework for gene regulatory network inference using tree-based ensemble regressors.

Language: Jupyter Notebook - Size: 63.9 MB - Last synced at: 11 months ago - Pushed at: about 1 year ago - Stars: 47 - Forks: 24

NCAR/cesm-lens-aws

Examples of analysis of CESM LENS data publicly available on Amazon S3 (us-west-2 region) using xarray and dask

Language: Jupyter Notebook - Size: 20.8 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 45 - Forks: 23

gjbex/Python-for-HPC

Repository for participants of the "Python for HPC" training

Language: Jupyter Notebook - Size: 7.72 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 42 - Forks: 18

dgerlanc/dask-scaling-dataframe

Python and Dask: Scaling the Dataframe

Language: Jupyter Notebook - Size: 15.8 MB - Last synced at: about 1 month ago - Pushed at: almost 4 years ago - Stars: 40 - Forks: 20

lesommer/oocgcm

oocgcm is a Python library for the analysis of large gridded geophysical datasets.

Language: Python - Size: 2.93 MB - Last synced at: 30 days ago - Pushed at: over 7 years ago - Stars: 39 - Forks: 11

TGSAI/mdio-python

Cloud native, scalable storage engine for various types of energy data.

Language: Python - Size: 6.19 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 38 - Forks: 14

thewtex/ngff-zarr

A lean and kind Open Microscopy Environment (OME) Next Generation File Format (NGFF) Zarr implementation.

Language: Python - Size: 688 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 35 - Forks: 9

jrbourbeau/madpy-dask

MadPy Dask talk materials

Language: Jupyter Notebook - Size: 889 KB - Last synced at: 12 days ago - Pushed at: over 6 years ago - Stars: 33 - Forks: 4

iamtekson/geospatial-data-analysis-python

This repo contains the most common tools used in geospatial analysis with Python!

Language: Jupyter Notebook - Size: 46.2 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 32 - Forks: 23

MDAnalysis/pmda

Parallel algorithms for MDAnalysis

Language: Python - Size: 6.71 MB - Last synced at: 4 days ago - Pushed at: 9 months ago - Stars: 31 - Forks: 23

OpenDataAnalytics/gaia

Gaia is a geospatial analysis library jointly developed by Kitware and Epidemico.

Language: Python - Size: 9.29 MB - Last synced at: 6 months ago - Pushed at: about 6 years ago - Stars: 31 - Forks: 15

basnijholt/adaptive-scheduler

Run many functions (adaptively) on many cores (>10k-100k) using mpi4py.futures, ipyparallel, loky, or dask-mpi. 🎉

Language: Python - Size: 935 KB - Last synced at: about 20 hours ago - Pushed at: 7 days ago - Stars: 30 - Forks: 11

umr-lops/xsar

Synthetic Aperture Radar (SAR) Level-1 GRD python mapper for efficient xarray/dask based processing

Language: Python - Size: 20.2 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 30 - Forks: 9

coiled/dask-snowflake

Dask integration for Snowflake

Language: Python - Size: 64.5 KB - Last synced at: 26 days ago - Pushed at: 6 months ago - Stars: 30 - Forks: 9

mpes-kit/mpes

Distributed data processing routines for multidimensional photoemission spectroscopy (MPES)

Language: Python - Size: 27.5 MB - Last synced at: 4 days ago - Pushed at: over 2 years ago - Stars: 30 - Forks: 6

daskos/daskos

Apache Mesos backend for Dask scheduling library

Language: Python - Size: 82 KB - Last synced at: 20 days ago - Pushed at: over 7 years ago - Stars: 28 - Forks: 5

Vizzuality/cog_worker

Scalable arbitrary analysis on COGs

Language: Jupyter Notebook - Size: 33.3 MB - Last synced at: 24 days ago - Pushed at: 10 months ago - Stars: 27 - Forks: 1

msalvaris/DaskMaskRCNN

Running Mask-RCNN on Dask with PyTorch

Language: Python - Size: 36.1 KB - Last synced at: about 2 years ago - Pushed at: about 6 years ago - Stars: 26 - Forks: 3

dask-contrib/dask-histogram

Histograms with task scheduling.

Language: Python - Size: 526 KB - Last synced at: about 3 hours ago - Pushed at: about 5 hours ago - Stars: 24 - Forks: 6

itamarst/dask-memusage

A low-impact profiler to figure out how much memory each task in Dask is using

Language: Python - Size: 24.4 KB - Last synced at: 23 days ago - Pushed at: about 2 years ago - Stars: 24 - Forks: 1

NCAR/esmlab 📦

Earth System Model Lab (esmlab). ⚠️⚠️ ESMLab functionality has been moved into <https://github.com/NCAR/geocat-comp>. ⚠️⚠️

Language: Python - Size: 2.9 MB - Last synced at: about 2 months ago - Pushed at: about 4 years ago - Stars: 24 - Forks: 8

sinhrks/daskperiment

Reproducibility for Humans: A lightweight tool to perform reproducible machine learning experiment.

Language: Python - Size: 2.22 MB - Last synced at: 4 days ago - Pushed at: about 6 years ago - Stars: 24 - Forks: 5

makepath/austin-ml-change-detection-demo

A change detection demo for the Austin area using a pre-trained PyTorch model scaled with Dask on Planet imagery.

Language: Jupyter Notebook - Size: 160 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 23 - Forks: 3

PeterFogh/dvc_dask_use_case

A use case of a reproducible machine learning pipeline using Dask, DVC, and MLflow.

Language: Python - Size: 62.5 KB - Last synced at: over 1 year ago - Pushed at: about 6 years ago - Stars: 23 - Forks: 2

ratt-ru/codex-africanus

Radio Astronomy Algorithms Library

Language: Python - Size: 1.52 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 20 - Forks: 10

BlazingDB/Welcome_to_BlazingSQL_Notebooks

RAPIDS data science. No setup required.

Language: Jupyter Notebook - Size: 189 MB - Last synced at: about 1 month ago - Pushed at: about 4 years ago - Stars: 20 - Forks: 13

ratt-ru/dask-ms

Implementation of a dask/xarray dataset backed by a CASA MS

Language: Python - Size: 6.72 MB - Last synced at: 7 days ago - Pushed at: about 1 month ago - Stars: 19 - Forks: 7

pnnl/mercat

MerCat: Python code for versatile k-mer counting and diversity estimation for database-independent property analysis of meta-ome data

Language: Python - Size: 2.7 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 18 - Forks: 13

pangeo-data/pangeo-binder

Pangeo + Binder (dev repo for a binder/pangeo fusion concept)

Language: Python - Size: 770 KB - Last synced at: about 1 month ago - Pushed at: almost 4 years ago - Stars: 18 - Forks: 13

dionresearch/stemgraphic

stemgraphic: Python package for visualization of data and text

Language: Jupyter Notebook - Size: 23.8 MB - Last synced at: 30 days ago - Pushed at: about 4 years ago - Stars: 18 - Forks: 1