Ecosyste.ms: Repos
An open API service providing repository metadata for many open source software ecosystems.
GitHub topics: dask
dask/dask
Parallel computing with task scheduling
Language: Python - Size: 66.7 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 11,971 - Forks: 1,665
rapidsai/cudf
cuDF - GPU DataFrame Library
Language: C++ - Size: 134 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 7,236 - Forks: 829
ibis-project/ibis
the portable Python dataframe library
Language: Python - Size: 78.5 MB - Last synced: about 7 hours ago - Pushed: about 7 hours ago - Stars: 4,295 - Forks: 537
pydata/xarray
N-D labeled arrays and datasets in Python
Language: Python - Size: 41.3 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 3,396 - Forks: 1,016
TDAmeritrade/stumpy
STUMPY is a powerful and scalable Python library for modern time series analysis
Language: Python - Size: 129 MB - Last synced: about 8 hours ago - Pushed: 1 day ago - Stars: 3,026 - Forks: 283
mars-project/mars
Mars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and Python functions.
Language: Python - Size: 37 MB - Last synced: 4 days ago - Pushed: 5 months ago - Stars: 2,676 - Forks: 322
jmcarpenter2/swifter
A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner
Language: Python - Size: 2.15 MB - Last synced: 2 days ago - Pushed: about 2 months ago - Stars: 2,473 - Forks: 101
fugue-project/fugue
A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.
Language: Python - Size: 6.3 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 1,866 - Forks: 92
dask/distributed
A distributed task scheduler for Dask
Language: Python - Size: 191 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 1,539 - Forks: 703
hi-primus/optimus
:truck: Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Language: Python - Size: 110 MB - Last synced: about 1 month ago - Pushed: about 2 months ago - Stars: 1,441 - Forks: 233
itamarst/eliot
Eliot: the logging system that tells you *why* it happened
Language: Python - Size: 1.91 MB - Last synced: about 2 hours ago - Pushed: 3 months ago - Stars: 1,087 - Forks: 65
pytroll/satpy
Python package for earth-observing satellite data processing
Language: Python - Size: 20.8 MB - Last synced: about 20 hours ago - Pushed: 2 days ago - Stars: 1,018 - Forks: 283
Nixtla/mlforecast
Scalable machine learning for time series forecasting.
Language: Python - Size: 27.1 MB - Last synced: 7 days ago - Pushed: 8 days ago - Stars: 729 - Forks: 68
ranaroussi/pystore
Fast data store for Pandas time-series data
Language: Python - Size: 138 KB - Last synced: 25 days ago - Pushed: about 2 months ago - Stars: 539 - Forks: 97
polyaxon/traceml
Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.
Language: Python - Size: 118 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 492 - Forks: 43
capitalone/datacompy
Pandas and Spark DataFrame comparison for humans and more!
Language: Python - Size: 9.11 MB - Last synced: 8 days ago - Pushed: 8 days ago - Stars: 394 - Forks: 122
dask-contrib/dask-sql
Distributed SQL Engine in Python using Dask
Language: Python - Size: 3.34 MB - Last synced: 1 day ago - Pushed: 2 days ago - Stars: 367 - Forks: 70
pytroll/pyresample
Geospatial image resampling in Python
Language: Python - Size: 16.4 MB - Last synced: about 2 months ago - Pushed: 3 months ago - Stars: 324 - Forks: 94
DataCanvasIO/HyperGBM
A full pipeline AutoML tool for tabular data
Language: Python - Size: 11 MB - Last synced: 17 days ago - Pushed: 3 months ago - Stars: 323 - Forks: 45
Ouranosinc/xclim
Library of derived climate variables, i.e. climate indicators, based on xarray.
Language: Python - Size: 57.1 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 296 - Forks: 49
JiaweiZhuang/xESMF
Universal Regridder for Geospatial Data
Language: Python - Size: 2.79 MB - Last synced: 3 months ago - Pushed: over 2 years ago - Stars: 264 - Forks: 49
nebari-dev/nebari
🪴 Nebari - your open source data science platform
Language: Python - Size: 15.2 MB - Last synced: about 8 hours ago - Pushed: about 13 hours ago - Stars: 262 - Forks: 87
timkpaine/paperboy
A web frontend for scheduling Jupyter notebook reports
Language: Python - Size: 12.5 MB - Last synced: 6 days ago - Pushed: over 2 years ago - Stars: 248 - Forks: 26
NVIDIA-Merlin/models
Merlin Models is a collection of deep learning recommender system model reference implementations
Language: Python - Size: 113 MB - Last synced: 19 days ago - Pushed: 19 days ago - Stars: 241 - Forks: 48
dask/dask-jobqueue
Deploy Dask on job schedulers like PBS, SLURM, and SGE
Language: Python - Size: 667 KB - Last synced: 3 days ago - Pushed: about 2 months ago - Stars: 230 - Forks: 137
aws-samples/amazon-sagemaker-local-mode
Amazon SageMaker Local Mode Examples
Language: Python - Size: 5.98 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 228 - Forks: 55
gjoseph92/stackstac
Turn a STAC catalog into a dask-based xarray
Language: Python - Size: 56.1 MB - Last synced: about 10 hours ago - Pushed: 5 months ago - Stars: 225 - Forks: 46
pangeo-data/climpred
:earth_americas: Verification of weather and climate forecasts :earth_africa:
Language: Python - Size: 58.1 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 217 - Forks: 48
LDO-CERT/orochi
The Volatility Collaborative GUI
Language: JavaScript - Size: 35.7 MB - Last synced: about 19 hours ago - Pushed: about 21 hours ago - Stars: 202 - Forks: 19
AllenCellModeling/aicsimageio
Image Reading, Metadata Conversion, and Image Writing for Microscopy Images in Python
Language: Python - Size: 173 MB - Last synced: 9 days ago - Pushed: 2 months ago - Stars: 192 - Forks: 50
jgrss/geowombat
GeoWombat: Utilities for geospatial data
Language: Jupyter Notebook - Size: 240 MB - Last synced: 2 days ago - Pushed: 11 days ago - Stars: 176 - Forks: 10
JDASoftwareGroup/kartothek
A consistent table management library in python
Language: Python - Size: 2.09 MB - Last synced: 26 days ago - Pushed: about 1 year ago - Stars: 161 - Forks: 53
ESDS-Leipzig/cubo
On-Demand Earth System Data Cubes (ESDCs) in Python
Language: Python - Size: 1.65 MB - Last synced: about 10 hours ago - Pushed: 1 day ago - Stars: 152 - Forks: 9
ray-project/xgboost_ray
Distributed XGBoost on Ray
Language: Python - Size: 472 KB - Last synced: 14 days ago - Pushed: 3 months ago - Stars: 133 - Forks: 33
hi-primus/bumblebee
A spreadsheet-like data preparation web app that works over Optimus (Pandas, Dask, cuDF, Dask-cuDF, Spark and Vaex)
Language: Vue - Size: 23 MB - Last synced: 6 months ago - Pushed: 10 months ago - Stars: 130 - Forks: 34
drshahizan/Python-big-data
Python and Pandas are known to have issues around scalability and efficiency. You will learn how to use libraries such as Modin, Dask, Ray, Vaex etc to overcome the problems faced by Pandas.
Language: Jupyter Notebook - Size: 107 MB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 126 - Forks: 67
google/xarray-beam
Distributed Xarray with Apache Beam
Language: Python - Size: 271 KB - Last synced: 8 days ago - Pushed: about 2 months ago - Stars: 118 - Forks: 10
jcmgray/autoray
Abstract your array operations.
Language: Python - Size: 1.78 MB - Last synced: about 18 hours ago - Pushed: 1 day ago - Stars: 118 - Forks: 10
xarray-contrib/flox
Fast & furious GroupBy operations for dask.array
Language: Python - Size: 1.54 MB - Last synced: 14 days ago - Pushed: 15 days ago - Stars: 117 - Forks: 15
dask/dask-ec2 📦
Start a cluster in EC2 for dask.distributed
Language: Python - Size: 200 KB - Last synced: 4 days ago - Pushed: over 3 years ago - Stars: 106 - Forks: 39
facultyai/lens
Summarise and explore Pandas DataFrames
Language: Python - Size: 229 KB - Last synced: 15 days ago - Pushed: almost 4 years ago - Stars: 102 - Forks: 9
geoxarray/geoxarray
Geolocation utilities for xarray
Language: Python - Size: 363 KB - Last synced: 15 days ago - Pushed: 17 days ago - Stars: 95 - Forks: 7
dymaxionlabs/dask-rasterio
Read and write rasters in parallel using Rasterio and Dask
Language: Python - Size: 813 KB - Last synced: about 12 hours ago - Pushed: over 3 years ago - Stars: 94 - Forks: 8
polyaxon/mloperator
Machine Learning Operator & Controller for Kubernetes
Language: Go - Size: 1.56 MB - Last synced: about 1 month ago - Pushed: 3 months ago - Stars: 89 - Forks: 7
msoechting/lexcube
Lexcube: 3D Data Cube Visualization in Jupyter Notebooks
Language: TypeScript - Size: 4.6 MB - Last synced: about 1 month ago - Pushed: about 2 months ago - Stars: 89 - Forks: 3
xarray-contrib/xeofs
Comprehensive EOF analysis in Python with xarray: A versatile, multidimensional, and scalable tool for advanced climate data analysis
Language: Python - Size: 33.3 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 82 - Forks: 16
TimeEval/TimeEval
Evaluation Tool for Anomaly Detection Algorithms on Time Series
Language: Jupyter Notebook - Size: 24.8 MB - Last synced: 18 days ago - Pushed: 18 days ago - Stars: 73 - Forks: 13
radix-ai/graphchain
⚡️ An efficient cache for the execution of dask graphs.
Language: Python - Size: 294 KB - Last synced: 9 days ago - Pushed: 7 months ago - Stars: 70 - Forks: 13
miniufo/xgrads
Parse and read ctl and associated binary file commonly used by GrADS into xarray
Language: Jupyter Notebook - Size: 16.5 MB - Last synced: 16 days ago - Pushed: 8 months ago - Stars: 69 - Forks: 25
chmp/framequery 📦
SQL on dataframes - pandas and dask
Language: Python - Size: 291 KB - Last synced: 9 months ago - Pushed: about 6 years ago - Stars: 64 - Forks: 9
NCAR/ncar-python-tutorial 📦
Numerical & Scientific Computing with Python Tutorial
Language: Jupyter Notebook - Size: 49.4 MB - Last synced: about 1 month ago - Pushed: about 4 years ago - Stars: 63 - Forks: 32
dask-contrib/dask-awkward
Native Dask collection for awkward arrays, and the library to use it.
Language: Python - Size: 1.3 MB - Last synced: 1 day ago - Pushed: 2 days ago - Stars: 57 - Forks: 15
bytehub-ai/bytehub
ByteHub: making feature stores simple
Language: Python - Size: 363 KB - Last synced: 16 days ago - Pushed: about 3 years ago - Stars: 57 - Forks: 3
saturncloud/dask-pytorch-ddp
dask-pytorch-ddp is a Python package that makes it easy to train PyTorch models on dask clusters using distributed data parallel.
Language: Python - Size: 64.5 KB - Last synced: 7 days ago - Pushed: about 3 years ago - Stars: 56 - Forks: 8
MITgcm/xmitgcm
Read MITgcm mds binary files into xarray
Language: Python - Size: 117 MB - Last synced: 17 days ago - Pushed: 4 months ago - Stars: 54 - Forks: 64
backtick-se/cowait
Containerized distributed programming framework for Python
Language: Python - Size: 5.69 MB - Last synced: 24 days ago - Pushed: about 1 year ago - Stars: 53 - Forks: 5
dask/knit 📦
Deprecated, please use https://github.com/jcrist/skein or https://github.com/dask/dask-yarn instead
Language: Python - Size: 335 KB - Last synced: about 17 hours ago - Pushed: almost 6 years ago - Stars: 53 - Forks: 10
JSybrandt/agatha
AGATHA: Automatic Graph-mining And Transformer based Hypothesis generation Approach
Language: Python - Size: 6.67 MB - Last synced: 5 months ago - Pushed: almost 4 years ago - Stars: 52 - Forks: 9
ml-tooling/lazycluster 📦
Distributed machine learning made simple.
Language: Python - Size: 809 KB - Last synced: about 2 months ago - Pushed: about 1 year ago - Stars: 50 - Forks: 12
aertslab/arboreto
A scalable python-based framework for gene regulatory network inference using tree-based ensemble regressors.
Language: Jupyter Notebook - Size: 63.9 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 45 - Forks: 24
dask-contrib/dask-deltatable
A Delta Lake reader for Dask
Language: Python - Size: 249 KB - Last synced: 23 days ago - Pushed: about 1 month ago - Stars: 42 - Forks: 13
shauryashaurya/learn-data-munging
Notes on Data Engineering with Pandas, PySpark, Dask, Ray, Arrow DataFusion, Polars etc.
Language: Jupyter Notebook - Size: 582 MB - Last synced: 20 days ago - Pushed: 21 days ago - Stars: 41 - Forks: 21
dgerlanc/dask-scaling-dataframe
Python and Dask: Scaling the Dataframe
Language: Jupyter Notebook - Size: 15.8 MB - Last synced: about 1 year ago - Pushed: almost 3 years ago - Stars: 40 - Forks: 22
coiled/coiled-resources
Notebooks that support blog posts and tech talks on Dask / Coiled.
Language: Jupyter Notebook - Size: 147 MB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 39 - Forks: 13
lesommer/oocgcm
oocgcm is a Python library for the analysis of large gridded geophysical datasets.
Language: Python - Size: 2.93 MB - Last synced: 4 months ago - Pushed: over 6 years ago - Stars: 38 - Forks: 11
NCAR/cesm-lens-aws
Examples of analysis of CESM LENS data publicly available on Amazon S3 (us-west-2 region) using xarray and dask
Language: Jupyter Notebook - Size: 20.8 MB - Last synced: about 1 month ago - Pushed: 3 months ago - Stars: 38 - Forks: 23
pnavaro/big-data
Python tools for big data
Language: Jupyter Notebook - Size: 167 MB - Last synced: 7 months ago - Pushed: 7 months ago - Stars: 37 - Forks: 30
jrbourbeau/madpy-dask
MadPy Dask talk materials
Language: Jupyter Notebook - Size: 889 KB - Last synced: about 1 year ago - Pushed: over 5 years ago - Stars: 33 - Forks: 5
OpenDataAnalytics/gaia
Gaia is a geospatial analysis library jointly developed by Kitware and Epidemico.
Language: Python - Size: 9.29 MB - Last synced: 22 days ago - Pushed: about 5 years ago - Stars: 31 - Forks: 15
MDAnalysis/pmda
Parallel algorithms for MDAnalysis
Language: Python - Size: 6.71 MB - Last synced: about 1 month ago - Pushed: over 1 year ago - Stars: 30 - Forks: 21
gjbex/Python-for-HPC
Repository for participants of the "Python for HPC" training
Language: Jupyter Notebook - Size: 6.63 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 30 - Forks: 18
baggiponte/awesome-pandas-alternatives
Awesome list of alternative dataframe libraries in Python.
Size: 21.5 KB - Last synced: 2 days ago - Pushed: over 1 year ago - Stars: 29 - Forks: 3
coiled/dask-snowflake
Dask integration for Snowflake
Language: Python - Size: 57.6 KB - Last synced: 27 days ago - Pushed: about 2 months ago - Stars: 28 - Forks: 7
daskos/daskos
Apache Mesos backend for Dask scheduling library
Language: Python - Size: 82 KB - Last synced: 4 days ago - Pushed: over 6 years ago - Stars: 28 - Forks: 5
TGSAI/mdio-python
Cloud native, scalable storage engine for various types of energy data.
Language: Python - Size: 3.48 MB - Last synced: 15 days ago - Pushed: 16 days ago - Stars: 28 - Forks: 10
mpes-kit/mpes
Distributed data processing routines for multidimensional photoemission spectroscopy (MPES)
Language: Python - Size: 27.5 MB - Last synced: 22 days ago - Pushed: over 1 year ago - Stars: 27 - Forks: 6
basnijholt/adaptive-scheduler
Run many functions (adaptively) on many cores (>10k-100k) using mpi4py.futures, ipyparallel, loky, or dask-mpi. :tada:
Language: Python - Size: 932 KB - Last synced: 3 days ago - Pushed: 4 days ago - Stars: 26 - Forks: 9
msalvaris/DaskMaskRCNN
Running Mask-RCNN on Dask with PyTorch
Language: Python - Size: 36.1 KB - Last synced: about 1 year ago - Pushed: about 5 years ago - Stars: 26 - Forks: 3
Vizzuality/cog_worker
Scalable arbitrary analysis on COGs
Language: Jupyter Notebook - Size: 33.1 MB - Last synced: 3 days ago - Pushed: 3 days ago - Stars: 26 - Forks: 1
iamtekson/geospatial-data-analysis-python
This repo contains the most common tools used in geospatial analysis using Python!
Language: Jupyter Notebook - Size: 46.2 MB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 24 - Forks: 18
NCAR/esmlab 📦
Earth System Model Lab (esmlab). ⚠️⚠️ ESMLab functionality has been moved into <https://github.com/NCAR/geocat-comp>. ⚠️⚠️
Language: Python - Size: 2.9 MB - Last synced: 20 days ago - Pushed: about 3 years ago - Stars: 24 - Forks: 8
umr-lops/xsar
Synthetic Aperture Radar (SAR) Level-1 GRD python mapper for efficient xarray/dask based processing
Language: Python - Size: 20.1 MB - Last synced: about 9 hours ago - Pushed: about 23 hours ago - Stars: 24 - Forks: 8
sinhrks/daskperiment
Reproducibility for Humans: A lightweight tool to perform reproducible machine learning experiments.
Language: Python - Size: 2.22 MB - Last synced: 2 months ago - Pushed: about 5 years ago - Stars: 24 - Forks: 5
itamarst/dask-memusage
A low-impact profiler to figure out how much memory each task in Dask is using
Language: Python - Size: 24.4 KB - Last synced: 11 days ago - Pushed: about 1 year ago - Stars: 24 - Forks: 1
PeterFogh/dvc_dask_use_case
A use case of a reproducible machine learning pipeline using Dask, DVC, and MLflow.
Language: Python - Size: 62.5 KB - Last synced: 3 months ago - Pushed: about 5 years ago - Stars: 23 - Forks: 2
makepath/austin-ml-change-detection-demo
A change detection demo for the Austin area using a pre-trained PyTorch model scaled with Dask on Planet imagery.
Language: Jupyter Notebook - Size: 160 MB - Last synced: 4 months ago - Pushed: over 1 year ago - Stars: 22 - Forks: 2
dask-contrib/dask-histogram
Histograms with task scheduling.
Language: Python - Size: 389 KB - Last synced: 3 days ago - Pushed: 3 days ago - Stars: 22 - Forks: 3
pnnl/mercat
MerCat: python code for versatile k-mer counting and diversity estimation for database independent property analysis for meta -ome data
Language: Python - Size: 2.7 MB - Last synced: 7 months ago - Pushed: over 1 year ago - Stars: 18 - Forks: 11
thewtex/ngff-zarr
A lean and kind Open Microscopy Environment (OME) Next Generation File Format (NGFF) Zarr implementation.
Language: Python - Size: 223 KB - Last synced: 27 days ago - Pushed: 28 days ago - Stars: 18 - Forks: 3
ratt-ru/dask-ms
Implementation of a dask/xarray dataset backed by a CASA MS
Language: Python - Size: 6.68 MB - Last synced: 3 days ago - Pushed: 4 days ago - Stars: 18 - Forks: 6
treebeardtech/kubeflow-bootstrap
1-click Kubeflow using ArgoCD
Language: Shell - Size: 2.67 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 18 - Forks: 5
dionresearch/stemgraphic
stemgraphic python package for visualization of data and text
Language: Jupyter Notebook - Size: 23.8 MB - Last synced: 20 days ago - Pushed: about 3 years ago - Stars: 18 - Forks: 1
pangeo-data/pangeo-binder
Pangeo + Binder (dev repo for a binder/pangeo fusion concept)
Language: Python - Size: 770 KB - Last synced: 4 months ago - Pushed: almost 3 years ago - Stars: 18 - Forks: 13
BlazingDB/Welcome_to_BlazingSQL_Notebooks
RAPIDS data science. No setup required.
Language: Jupyter Notebook - Size: 189 MB - Last synced: about 1 year ago - Pushed: about 3 years ago - Stars: 17 - Forks: 13
ratt-ru/codex-africanus
Radio Astronomy Algorithms Library
Language: Python - Size: 1.42 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 16 - Forks: 10
saturncloud/prefect-saturn
Python client for using Prefect Cloud with Saturn Cloud
Language: Python - Size: 155 KB - Last synced: 14 days ago - Pushed: about 2 years ago - Stars: 16 - Forks: 4
bioio-devs/bioio
Image reading, metadata management, and image writing for Microscopy images in Python
Language: Python - Size: 5.26 MB - Last synced: 15 days ago - Pushed: 17 days ago - Stars: 16 - Forks: 1
CoffeaTeam/coffea-casa
Repository with configuration setup of a prototype of analysis facility - "coffea-casa"
Language: Python - Size: 11.3 MB - Last synced: 28 days ago - Pushed: 28 days ago - Stars: 16 - Forks: 17
anovv/svoe
A scalable, declarative, low-code framework for real-time and batch feature calculation/management (quant finance, anomaly/fraud detection, etc.), predictive ML training/inference and simulation. Built on top of Ray
Language: Python - Size: 78.6 MB - Last synced: 29 days ago - Pushed: 4 months ago - Stars: 15 - Forks: 10
splunk/deep-learning-toolkit
Deep Learning Toolkit for Splunk
Language: Python - Size: 15.4 MB - Last synced: about 1 month ago - Pushed: about 2 months ago - Stars: 15 - Forks: 5