Topic: "3d-parallelism"
xrsrke/pipegoose
Large-scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)*
Language: Python - Size: 1.26 MB - Last synced at: 1 day ago - Pushed at: over 1 year ago - Stars: 82 - Forks: 18
