An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: checkpointing

argonne-lcf/dlio_benchmark

An I/O benchmark for deep learning applications

Language: Python - Size: 2.53 MB - Last synced at: 13 days ago - Pushed at: 15 days ago - Stars: 87 - Forks: 39

jorgensd/adios4dolfinx

Extending DOLFINx with checkpointing functionality

Language: Python - Size: 465 KB - Last synced at: 5 days ago - Pushed at: 2 months ago - Stars: 25 - Forks: 7

kakaobrain/torchgpipe

A GPipe implementation in PyTorch

Language: Python - Size: 449 KB - Last synced at: 15 days ago - Pushed at: 11 months ago - Stars: 841 - Forks: 99

Christopher-K-Long/thread-chunks

A Python package for performing memory-intensive computations in parallel using chunks and checkpointing.

Language: Python - Size: 51.8 KB - Last synced at: 18 days ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

Christopher-K-Long/saveable-objects

A Python package for checkpointing, saving, and loading objects.

Language: Python - Size: 80.1 KB - Last synced at: 15 days ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

cedana/cedana-cli

Cedana: Access and run on compute anywhere in the world, on any provider. Migrate seamlessly between providers, arbitraging price/performance in realtime to maximize pure runtime.

Language: Go - Size: 31.3 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 58 - Forks: 1

dorukkarinca/keras-buoy

Keras wrapper that autosaves what ModelCheckpoint cannot.

Language: Python - Size: 42 KB - Last synced at: 15 days ago - Pushed at: almost 3 years ago - Stars: 24 - Forks: 9

ECP-VeloC/VELOC

Very-Low Overhead Checkpointing System

Language: C++ - Size: 925 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 55 - Forks: 23

rubrikinc/sysfail

A shared library to help test your code with failure injection

Language: C++ - Size: 156 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 2

jrwellshpc/dmtcp_scripts

DMTCP scripts to get Python scripts working with SLURM.

Language: Shell - Size: 50.8 KB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

f-dangel/wandb_preempt

Code and tutorial on integrating wandb sweeps with Slurm pre-emption

Language: Python - Size: 1.62 MB - Last synced at: 20 days ago - Pushed at: 8 months ago - Stars: 2 - Forks: 0

kamangir/blue-objects-2024-09-05-a

🌀 data objects for Bash (attempt one).

Size: 17.6 KB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

AD1024/torch-checkpointing

Compile a torch model to a checkpointed model

Language: Python - Size: 76.2 KB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 4

grebtsew/AlbumOrganizer

A face recognition manager for digital albums that isolates images of a specified person.

Language: Python - Size: 683 KB - Last synced at: 2 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

alex-w-99/Checkpointing-Program

A lightweight checkpointing program written in C.

Language: C - Size: 727 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

gulabpatel/Model_Checkpoingting

Language: Jupyter Notebook - Size: 8.79 KB - Last synced at: about 12 hours ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 0