Ecosyste.ms: Repos

An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: dask

dask/dask

Parallel computing with task scheduling

Language: Python - Size: 66.7 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 11,971 - Forks: 1,665

rapidsai/cudf

cuDF - GPU DataFrame Library

Language: C++ - Size: 134 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 7,236 - Forks: 829

ibis-project/ibis

the portable Python dataframe library

Language: Python - Size: 78.5 MB - Last synced: about 7 hours ago - Pushed: about 7 hours ago - Stars: 4,295 - Forks: 537

pydata/xarray

N-D labeled arrays and datasets in Python

Language: Python - Size: 41.3 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 3,396 - Forks: 1,016

TDAmeritrade/stumpy

STUMPY is a powerful and scalable Python library for modern time series analysis

Language: Python - Size: 129 MB - Last synced: about 8 hours ago - Pushed: 1 day ago - Stars: 3,026 - Forks: 283

mars-project/mars

Mars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and Python functions.

Language: Python - Size: 37 MB - Last synced: 4 days ago - Pushed: 5 months ago - Stars: 2,676 - Forks: 322

jmcarpenter2/swifter

A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner

Language: Python - Size: 2.15 MB - Last synced: 2 days ago - Pushed: about 2 months ago - Stars: 2,473 - Forks: 101

fugue-project/fugue

A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.

Language: Python - Size: 6.3 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 1,866 - Forks: 92

dask/distributed

A distributed task scheduler for Dask

Language: Python - Size: 191 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 1,539 - Forks: 703

hi-primus/optimus

:truck: Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark

Language: Python - Size: 110 MB - Last synced: about 1 month ago - Pushed: about 2 months ago - Stars: 1,441 - Forks: 233

itamarst/eliot

Eliot: the logging system that tells you *why* it happened

Language: Python - Size: 1.91 MB - Last synced: about 2 hours ago - Pushed: 3 months ago - Stars: 1,087 - Forks: 65

pytroll/satpy

Python package for earth-observing satellite data processing

Language: Python - Size: 20.8 MB - Last synced: about 20 hours ago - Pushed: 2 days ago - Stars: 1,018 - Forks: 283

Nixtla/mlforecast

Scalable machine 🤖 learning for time series forecasting.

Language: Python - Size: 27.1 MB - Last synced: 7 days ago - Pushed: 8 days ago - Stars: 729 - Forks: 68

ranaroussi/pystore

Fast data store for Pandas time-series data

Language: Python - Size: 138 KB - Last synced: 25 days ago - Pushed: about 2 months ago - Stars: 539 - Forks: 97

polyaxon/traceml

Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.

Language: Python - Size: 118 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 492 - Forks: 43

capitalone/datacompy

Pandas and Spark DataFrame comparison for humans and more!

Language: Python - Size: 9.11 MB - Last synced: 8 days ago - Pushed: 8 days ago - Stars: 394 - Forks: 122

dask-contrib/dask-sql

Distributed SQL Engine in Python using Dask

Language: Python - Size: 3.34 MB - Last synced: 1 day ago - Pushed: 2 days ago - Stars: 367 - Forks: 70

pytroll/pyresample

Geospatial image resampling in Python

Language: Python - Size: 16.4 MB - Last synced: about 2 months ago - Pushed: 3 months ago - Stars: 324 - Forks: 94

DataCanvasIO/HyperGBM

A full pipeline AutoML tool for tabular data

Language: Python - Size: 11 MB - Last synced: 17 days ago - Pushed: 3 months ago - Stars: 323 - Forks: 45

Ouranosinc/xclim

Library of derived climate variables, i.e. climate indicators, based on xarray.

Language: Python - Size: 57.1 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 296 - Forks: 49

JiaweiZhuang/xESMF

Universal Regridder for Geospatial Data

Language: Python - Size: 2.79 MB - Last synced: 3 months ago - Pushed: over 2 years ago - Stars: 264 - Forks: 49

nebari-dev/nebari

🪴 Nebari - your open source data science platform

Language: Python - Size: 15.2 MB - Last synced: about 8 hours ago - Pushed: about 13 hours ago - Stars: 262 - Forks: 87

timkpaine/paperboy

A web frontend for scheduling Jupyter notebook reports

Language: Python - Size: 12.5 MB - Last synced: 6 days ago - Pushed: over 2 years ago - Stars: 248 - Forks: 26

NVIDIA-Merlin/models

Merlin Models is a collection of deep learning recommender system model reference implementations

Language: Python - Size: 113 MB - Last synced: 19 days ago - Pushed: 19 days ago - Stars: 241 - Forks: 48

dask/dask-jobqueue

Deploy Dask on job schedulers like PBS, SLURM, and SGE

Language: Python - Size: 667 KB - Last synced: 3 days ago - Pushed: about 2 months ago - Stars: 230 - Forks: 137

aws-samples/amazon-sagemaker-local-mode

Amazon SageMaker Local Mode Examples

Language: Python - Size: 5.98 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 228 - Forks: 55

gjoseph92/stackstac

Turn a STAC catalog into a dask-based xarray

Language: Python - Size: 56.1 MB - Last synced: about 10 hours ago - Pushed: 5 months ago - Stars: 225 - Forks: 46

pangeo-data/climpred

:earth_americas: Verification of weather and climate forecasts :earth_africa:

Language: Python - Size: 58.1 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 217 - Forks: 48

LDO-CERT/orochi

The Volatility Collaborative GUI

Language: JavaScript - Size: 35.7 MB - Last synced: about 19 hours ago - Pushed: about 21 hours ago - Stars: 202 - Forks: 19

AllenCellModeling/aicsimageio

Image Reading, Metadata Conversion, and Image Writing for Microscopy Images in Python

Language: Python - Size: 173 MB - Last synced: 9 days ago - Pushed: 2 months ago - Stars: 192 - Forks: 50

jgrss/geowombat

GeoWombat: Utilities for geospatial data

Language: Jupyter Notebook - Size: 240 MB - Last synced: 2 days ago - Pushed: 11 days ago - Stars: 176 - Forks: 10

JDASoftwareGroup/kartothek

A consistent table management library in python

Language: Python - Size: 2.09 MB - Last synced: 26 days ago - Pushed: about 1 year ago - Stars: 161 - Forks: 53

ESDS-Leipzig/cubo

On-Demand Earth System Data Cubes (ESDCs) in Python

Language: Python - Size: 1.65 MB - Last synced: about 10 hours ago - Pushed: 1 day ago - Stars: 152 - Forks: 9

ray-project/xgboost_ray

Distributed XGBoost on Ray

Language: Python - Size: 472 KB - Last synced: 14 days ago - Pushed: 3 months ago - Stars: 133 - Forks: 33

hi-primus/bumblebee

🚕 A spreadsheet-like data preparation web app that works over Optimus (Pandas, Dask, cuDF, Dask-cuDF, Spark and Vaex)

Language: Vue - Size: 23 MB - Last synced: 6 months ago - Pushed: 10 months ago - Stars: 130 - Forks: 34

drshahizan/Python-big-data

Python and Pandas are known to have issues around scalability and efficiency. You will learn how to use libraries such as Modin, Dask, Ray, Vaex etc to overcome the problems faced by Pandas.

Language: Jupyter Notebook - Size: 107 MB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 126 - Forks: 67

google/xarray-beam

Distributed Xarray with Apache Beam

Language: Python - Size: 271 KB - Last synced: 8 days ago - Pushed: about 2 months ago - Stars: 118 - Forks: 10

jcmgray/autoray

Abstract your array operations.

Language: Python - Size: 1.78 MB - Last synced: about 18 hours ago - Pushed: 1 day ago - Stars: 118 - Forks: 10

xarray-contrib/flox

Fast & furious GroupBy operations for dask.array

Language: Python - Size: 1.54 MB - Last synced: 14 days ago - Pushed: 15 days ago - Stars: 117 - Forks: 15

dask/dask-ec2 📦

Start a cluster in EC2 for dask.distributed

Language: Python - Size: 200 KB - Last synced: 4 days ago - Pushed: over 3 years ago - Stars: 106 - Forks: 39

facultyai/lens

Summarise and explore Pandas DataFrames

Language: Python - Size: 229 KB - Last synced: 15 days ago - Pushed: almost 4 years ago - Stars: 102 - Forks: 9

geoxarray/geoxarray

Geolocation utilities for xarray

Language: Python - Size: 363 KB - Last synced: 15 days ago - Pushed: 17 days ago - Stars: 95 - Forks: 7

dymaxionlabs/dask-rasterio

Read and write rasters in parallel using Rasterio and Dask

Language: Python - Size: 813 KB - Last synced: about 12 hours ago - Pushed: over 3 years ago - Stars: 94 - Forks: 8

polyaxon/mloperator

Machine Learning Operator & Controller for Kubernetes

Language: Go - Size: 1.56 MB - Last synced: about 1 month ago - Pushed: 3 months ago - Stars: 89 - Forks: 7

msoechting/lexcube

Lexcube: 3D Data Cube Visualization in Jupyter Notebooks

Language: TypeScript - Size: 4.6 MB - Last synced: about 1 month ago - Pushed: about 2 months ago - Stars: 89 - Forks: 3

xarray-contrib/xeofs

Comprehensive EOF analysis in Python with xarray: A versatile, multidimensional, and scalable tool for advanced climate data analysis

Language: Python - Size: 33.3 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 82 - Forks: 16

TimeEval/TimeEval

Evaluation Tool for Anomaly Detection Algorithms on Time Series

Language: Jupyter Notebook - Size: 24.8 MB - Last synced: 18 days ago - Pushed: 18 days ago - Stars: 73 - Forks: 13

radix-ai/graphchain

⚡️ An efficient cache for the execution of dask graphs.

Language: Python - Size: 294 KB - Last synced: 9 days ago - Pushed: 7 months ago - Stars: 70 - Forks: 13

miniufo/xgrads

Parse and read ctl and associated binary file commonly used by GrADS into xarray

Language: Jupyter Notebook - Size: 16.5 MB - Last synced: 16 days ago - Pushed: 8 months ago - Stars: 69 - Forks: 25

chmp/framequery 📦

SQL on dataframes - pandas and dask

Language: Python - Size: 291 KB - Last synced: 9 months ago - Pushed: about 6 years ago - Stars: 64 - Forks: 9

NCAR/ncar-python-tutorial 📦

Numerical & Scientific Computing with Python Tutorial

Language: Jupyter Notebook - Size: 49.4 MB - Last synced: about 1 month ago - Pushed: about 4 years ago - Stars: 63 - Forks: 32

dask-contrib/dask-awkward

Native Dask collection for awkward arrays, and the library to use it.

Language: Python - Size: 1.3 MB - Last synced: 1 day ago - Pushed: 2 days ago - Stars: 57 - Forks: 15

bytehub-ai/bytehub

ByteHub: making feature stores simple

Language: Python - Size: 363 KB - Last synced: 16 days ago - Pushed: about 3 years ago - Stars: 57 - Forks: 3

saturncloud/dask-pytorch-ddp

dask-pytorch-ddp is a Python package that makes it easy to train PyTorch models on dask clusters using distributed data parallel.

Language: Python - Size: 64.5 KB - Last synced: 7 days ago - Pushed: about 3 years ago - Stars: 56 - Forks: 8

MITgcm/xmitgcm

Read MITgcm mds binary files into xarray

Language: Python - Size: 117 MB - Last synced: 17 days ago - Pushed: 4 months ago - Stars: 54 - Forks: 64

backtick-se/cowait

Containerized distributed programming framework for Python

Language: Python - Size: 5.69 MB - Last synced: 24 days ago - Pushed: about 1 year ago - Stars: 53 - Forks: 5

dask/knit 📦

Deprecated, please use https://github.com/jcrist/skein or https://github.com/dask/dask-yarn instead

Language: Python - Size: 335 KB - Last synced: about 17 hours ago - Pushed: almost 6 years ago - Stars: 53 - Forks: 10

JSybrandt/agatha

AGATHA: Automatic Graph-mining And Transformer based Hypothesis generation Approach

Language: Python - Size: 6.67 MB - Last synced: 5 months ago - Pushed: almost 4 years ago - Stars: 52 - Forks: 9

ml-tooling/lazycluster 📦

🎛 Distributed machine learning made simple.

Language: Python - Size: 809 KB - Last synced: about 2 months ago - Pushed: about 1 year ago - Stars: 50 - Forks: 12

aertslab/arboreto

A scalable python-based framework for gene regulatory network inference using tree-based ensemble regressors.

Language: Jupyter Notebook - Size: 63.9 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 45 - Forks: 24

dask-contrib/dask-deltatable

A Delta Lake reader for Dask

Language: Python - Size: 249 KB - Last synced: 23 days ago - Pushed: about 1 month ago - Stars: 42 - Forks: 13

shauryashaurya/learn-data-munging

Notes on Data Engineering with Pandas, PySpark, Dask, Ray, Arrow DataFusion, Polars etc.

Language: Jupyter Notebook - Size: 582 MB - Last synced: 20 days ago - Pushed: 21 days ago - Stars: 41 - Forks: 21

dgerlanc/dask-scaling-dataframe

Python and Dask: Scaling the Dataframe

Language: Jupyter Notebook - Size: 15.8 MB - Last synced: about 1 year ago - Pushed: almost 3 years ago - Stars: 40 - Forks: 22

coiled/coiled-resources

Notebooks that support blog posts and tech talks on Dask / Coiled.

Language: Jupyter Notebook - Size: 147 MB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 39 - Forks: 13

lesommer/oocgcm

oocgcm is a python library for the analysis of large gridded geophysical dataset.

Language: Python - Size: 2.93 MB - Last synced: 4 months ago - Pushed: over 6 years ago - Stars: 38 - Forks: 11

NCAR/cesm-lens-aws

Examples of analysis of CESM LENS data publicly available on Amazon S3 (us-west-2 region) using xarray and dask

Language: Jupyter Notebook - Size: 20.8 MB - Last synced: about 1 month ago - Pushed: 3 months ago - Stars: 38 - Forks: 23

pnavaro/big-data

Python tools for big data

Language: Jupyter Notebook - Size: 167 MB - Last synced: 7 months ago - Pushed: 7 months ago - Stars: 37 - Forks: 30

jrbourbeau/madpy-dask

MadPy Dask talk materials

Language: Jupyter Notebook - Size: 889 KB - Last synced: about 1 year ago - Pushed: over 5 years ago - Stars: 33 - Forks: 5

OpenDataAnalytics/gaia

Gaia is a geospatial analysis library jointly developed by Kitware and Epidemico.

Language: Python - Size: 9.29 MB - Last synced: 22 days ago - Pushed: about 5 years ago - Stars: 31 - Forks: 15

MDAnalysis/pmda

Parallel algorithms for MDAnalysis

Language: Python - Size: 6.71 MB - Last synced: about 1 month ago - Pushed: over 1 year ago - Stars: 30 - Forks: 21

gjbex/Python-for-HPC

Repository for participants of the "Python for HPC" training

Language: Jupyter Notebook - Size: 6.63 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 30 - Forks: 18

baggiponte/awesome-pandas-alternatives

Awesome list of alternative dataframe libraries in Python.

Size: 21.5 KB - Last synced: 2 days ago - Pushed: over 1 year ago - Stars: 29 - Forks: 3

coiled/dask-snowflake

Dask integration for Snowflake

Language: Python - Size: 57.6 KB - Last synced: 27 days ago - Pushed: about 2 months ago - Stars: 28 - Forks: 7

daskos/daskos

Apache Mesos backend for Dask scheduling library

Language: Python - Size: 82 KB - Last synced: 4 days ago - Pushed: over 6 years ago - Stars: 28 - Forks: 5

TGSAI/mdio-python

Cloud native, scalable storage engine for various types of energy data.

Language: Python - Size: 3.48 MB - Last synced: 15 days ago - Pushed: 16 days ago - Stars: 28 - Forks: 10

mpes-kit/mpes

Distributed data processing routines for multidimensional photoemission spectroscopy (MPES)

Language: Python - Size: 27.5 MB - Last synced: 22 days ago - Pushed: over 1 year ago - Stars: 27 - Forks: 6

basnijholt/adaptive-scheduler

Run many functions (adaptively) on many cores (>10k-100k) using mpi4py.futures, ipyparallel, loky, or dask-mpi. :tada:

Language: Python - Size: 932 KB - Last synced: 3 days ago - Pushed: 4 days ago - Stars: 26 - Forks: 9

msalvaris/DaskMaskRCNN

Running Mask-RCNN on Dask with PyTorch

Language: Python - Size: 36.1 KB - Last synced: about 1 year ago - Pushed: about 5 years ago - Stars: 26 - Forks: 3

Vizzuality/cog_worker

Scalable arbitrary analysis on COGs

Language: Jupyter Notebook - Size: 33.1 MB - Last synced: 3 days ago - Pushed: 3 days ago - Stars: 26 - Forks: 1

iamtekson/geospatial-data-analysis-python

This repo contains the most common tools used in geospatial analysis using Python!

Language: Jupyter Notebook - Size: 46.2 MB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 24 - Forks: 18

NCAR/esmlab 📦

Earth System Model Lab (esmlab). ⚠️ ESMLab functionality has been moved into <https://github.com/NCAR/geocat-comp>. ⚠️

Language: Python - Size: 2.9 MB - Last synced: 20 days ago - Pushed: about 3 years ago - Stars: 24 - Forks: 8

umr-lops/xsar

Synthetic Aperture Radar (SAR) Level-1 GRD python mapper for efficient xarray/dask based processing

Language: Python - Size: 20.1 MB - Last synced: about 9 hours ago - Pushed: about 23 hours ago - Stars: 24 - Forks: 8

sinhrks/daskperiment

Reproducibility for Humans: A lightweight tool to perform reproducible machine learning experiments.

Language: Python - Size: 2.22 MB - Last synced: 2 months ago - Pushed: about 5 years ago - Stars: 24 - Forks: 5

itamarst/dask-memusage

A low-impact profiler to figure out how much memory each task in Dask is using

Language: Python - Size: 24.4 KB - Last synced: 11 days ago - Pushed: about 1 year ago - Stars: 24 - Forks: 1

PeterFogh/dvc_dask_use_case

A use case of a reproducible machine learning pipeline using Dask, DVC, and MLflow.

Language: Python - Size: 62.5 KB - Last synced: 3 months ago - Pushed: about 5 years ago - Stars: 23 - Forks: 2

makepath/austin-ml-change-detection-demo

A change detection demo for the Austin area using a pre-trained PyTorch model scaled with Dask on Planet imagery.

Language: Jupyter Notebook - Size: 160 MB - Last synced: 4 months ago - Pushed: over 1 year ago - Stars: 22 - Forks: 2

dask-contrib/dask-histogram

Histograms with task scheduling.

Language: Python - Size: 389 KB - Last synced: 3 days ago - Pushed: 3 days ago - Stars: 22 - Forks: 3

pnnl/mercat

MerCat: Python code for versatile k-mer counting and diversity estimation for database-independent property analysis for meta-ome data

Language: Python - Size: 2.7 MB - Last synced: 7 months ago - Pushed: over 1 year ago - Stars: 18 - Forks: 11

thewtex/ngff-zarr

A lean and kind Open Microscopy Environment (OME) Next Generation File Format (NGFF) Zarr implementation.

Language: Python - Size: 223 KB - Last synced: 27 days ago - Pushed: 28 days ago - Stars: 18 - Forks: 3

ratt-ru/dask-ms

Implementation of a dask/xarray dataset backed by a CASA MS

Language: Python - Size: 6.68 MB - Last synced: 3 days ago - Pushed: 4 days ago - Stars: 18 - Forks: 6

treebeardtech/kubeflow-bootstrap

🪐 1-click Kubeflow using ArgoCD

Language: Shell - Size: 2.67 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 18 - Forks: 5

dionresearch/stemgraphic

stemgraphic python package for visualization of data and text

Language: Jupyter Notebook - Size: 23.8 MB - Last synced: 20 days ago - Pushed: about 3 years ago - Stars: 18 - Forks: 1

pangeo-data/pangeo-binder

Pangeo + Binder (dev repo for a binder/pangeo fusion concept)

Language: Python - Size: 770 KB - Last synced: 4 months ago - Pushed: almost 3 years ago - Stars: 18 - Forks: 13

BlazingDB/Welcome_to_BlazingSQL_Notebooks

RAPIDS data science. No setup required.

Language: Jupyter Notebook - Size: 189 MB - Last synced: about 1 year ago - Pushed: about 3 years ago - Stars: 17 - Forks: 13

ratt-ru/codex-africanus

Radio Astronomy Algorithms Library

Language: Python - Size: 1.42 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 16 - Forks: 10

saturncloud/prefect-saturn

Python client for using Prefect Cloud with Saturn Cloud

Language: Python - Size: 155 KB - Last synced: 14 days ago - Pushed: about 2 years ago - Stars: 16 - Forks: 4

bioio-devs/bioio

Image reading, metadata management, and image writing for Microscopy images in Python

Language: Python - Size: 5.26 MB - Last synced: 15 days ago - Pushed: 17 days ago - Stars: 16 - Forks: 1

CoffeaTeam/coffea-casa

Repository with configuration setup of a prototype of analysis facility - "coffea-casa"

Language: Python - Size: 11.3 MB - Last synced: 28 days ago - Pushed: 28 days ago - Stars: 16 - Forks: 17

anovv/svoe

A scalable, declarative, low-code framework for real-time and batch feature calculation/management (quant finance, anomaly/fraud detection, etc.), predictive ML training/inference and simulation. Built on top of Ray

Language: Python - Size: 78.6 MB - Last synced: 29 days ago - Pushed: 4 months ago - Stars: 15 - Forks: 10

splunk/deep-learning-toolkit

Deep Learning Toolkit for Splunk

Language: Python - Size: 15.4 MB - Last synced: about 1 month ago - Pushed: about 2 months ago - Stars: 15 - Forks: 5