Topic: "checkpointing"
kakaobrain/torchgpipe
A GPipe implementation in PyTorch
Language: Python - Size: 449 KB - Last synced at: 14 days ago - Pushed at: 9 months ago - Stars: 836 - Forks: 99

argonne-lcf/dlio_benchmark
An I/O benchmark for deep learning applications
Language: Python - Size: 2.57 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 87 - Forks: 37

cedana/cedana-cli
Cedana: Access and run on compute anywhere in the world, on any provider. Migrate seamlessly between providers, arbitraging price/performance in real time to maximize pure runtime.
Language: Go - Size: 31.4 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 58 - Forks: 1

ECP-VeloC/VELOC
Very-Low Overhead Checkpointing System
Language: C++ - Size: 925 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 55 - Forks: 23

jorgensd/adios4dolfinx
Extending DOLFINx with checkpointing functionality
Language: Python - Size: 465 KB - Last synced at: about 17 hours ago - Pushed at: about 1 month ago - Stars: 24 - Forks: 7

dorukkarinca/keras-buoy
Keras wrapper that autosaves what ModelCheckpoint cannot.
Language: Python - Size: 42 KB - Last synced at: 12 days ago - Pushed at: almost 3 years ago - Stars: 24 - Forks: 9

f-dangel/wandb_preempt
Code and tutorial on integrating wandb sweeps with Slurm pre-emption
Language: Python - Size: 1.62 MB - Last synced at: 21 days ago - Pushed at: 7 months ago - Stars: 2 - Forks: 0

Christopher-K-Long/thread-chunks
A Python package for performing memory-intensive computations in parallel using chunks and checkpointing.
Language: Python - Size: 51.8 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 1 - Forks: 0

rubrikinc/sysfail
A shared library to help test your code with failure injection
Language: C++ - Size: 156 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 2

jrwellshpc/dmtcp_scripts
DMTCP scripts to get Python scripts working with SLURM.
Language: Shell - Size: 50.8 KB - Last synced at: 9 days ago - Pushed at: 12 months ago - Stars: 1 - Forks: 0

gulabpatel/Model_Checkpoingting
Language: Jupyter Notebook - Size: 8.79 KB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 0

Christopher-K-Long/saveable-objects
A Python package for checkpointing, saving, and loading objects.
Language: Python - Size: 80.1 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

kamangir/blue-objects-2024-09-05-a
🌀 data objects for Bash (attempt one).
Size: 17.6 KB - Last synced at: 8 days ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

grebtsew/AlbumOrganizer
A face recognition manager for digital albums that isolates images of a specified person.
Language: Python - Size: 683 KB - Last synced at: 27 days ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

alex-w-99/Checkpointing-Program
A lightweight checkpointing program written in C.
Language: C - Size: 727 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

AD1024/torch-checkpointing
Compile a torch model to a checkpointed model
Language: Python - Size: 76.2 KB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 4
