GitHub topics: checkpointing
argonne-lcf/dlio_benchmark
An I/O benchmark for deep learning applications
Language: Python - Size: 2.53 MB - Last synced at: 13 days ago - Pushed at: 15 days ago - Stars: 87 - Forks: 39

jorgensd/adios4dolfinx
Extending DOLFINx with checkpointing functionality
Language: Python - Size: 465 KB - Last synced at: 5 days ago - Pushed at: 2 months ago - Stars: 25 - Forks: 7

kakaobrain/torchgpipe
A GPipe implementation in PyTorch
Language: Python - Size: 449 KB - Last synced at: 15 days ago - Pushed at: 11 months ago - Stars: 841 - Forks: 99

Christopher-K-Long/thread-chunks
A Python package for performing memory-intensive computations in parallel using chunks and checkpointing.
Language: Python - Size: 51.8 KB - Last synced at: 18 days ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

Christopher-K-Long/saveable-objects
A Python package for checkpointing, saving, and loading objects.
Language: Python - Size: 80.1 KB - Last synced at: 15 days ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

cedana/cedana-cli
Cedana: Access and run on compute anywhere in the world, on any provider. Migrate seamlessly between providers, arbitraging price/performance in real time to maximize pure runtime.
Language: Go - Size: 31.3 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 58 - Forks: 1

dorukkarinca/keras-buoy
Keras wrapper that autosaves what ModelCheckpoint cannot.
Language: Python - Size: 42 KB - Last synced at: 15 days ago - Pushed at: almost 3 years ago - Stars: 24 - Forks: 9

ECP-VeloC/VELOC
Very-Low Overhead Checkpointing System
Language: C++ - Size: 925 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 55 - Forks: 23

rubrikinc/sysfail
A shared library to help test your code with failure injection
Language: C++ - Size: 156 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 2

jrwellshpc/dmtcp_scripts
DMTCP scripts to get Python scripts working with SLURM.
Language: Shell - Size: 50.8 KB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

f-dangel/wandb_preempt
Code and tutorial on integrating wandb sweeps with Slurm pre-emption
Language: Python - Size: 1.62 MB - Last synced at: 20 days ago - Pushed at: 8 months ago - Stars: 2 - Forks: 0

kamangir/blue-objects-2024-09-05-a
🌀 data objects for Bash (attempt one).
Size: 17.6 KB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

AD1024/torch-checkpointing
Compile a torch model to a checkpointed model
Language: Python - Size: 76.2 KB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 4

grebtsew/AlbumOrganizer
A face recognition manager for digital albums that isolates images of a specified person.
Language: Python - Size: 683 KB - Last synced at: 2 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

alex-w-99/Checkpointing-Program
A lightweight checkpointing program written in C.
Language: C - Size: 727 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

gulabpatel/Model_Checkpoingting
Language: Jupyter Notebook - Size: 8.79 KB - Last synced at: about 12 hours ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 0
