Topic: "pytorchjob"
polyaxon/polyaxon-examples
Code for tutorials and examples
Language: Python - Size: 919 KB - Last synced at: 7 months ago - Pushed at: about 2 years ago - Stars: 68 - Forks: 25

BaizeAI/kcover
🧯 Kubernetes coverage for fault awareness and recovery, works for any LLMOps, MLOps, AI workloads.
Language: Go - Size: 53.7 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 25 - Forks: 1

0M1J/kubeflow-GNN
GNN training in Kubeflow.
Language: Python - Size: 702 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 4

adindayup/kcover
🧯 Kubernetes coverage for fault awareness and recovery, works for any LLMOps, MLOps, AI workloads.
Topics: kubeflow, kubernetes, kubernetes-controller, llm, llmops, mlops, nvidia-gpu, pytorchjob, tfjob, xid-error
Language: Go - Size: 27.3 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0
