Topic: "model-parallelism"
hpcaitech/ColossalAI
Making large AI models cheaper, faster and more accessible
Language: Python - Size: 62.8 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 40,831 - Forks: 4,499

deepspeedai/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Language: Python - Size: 217 MB - Last synced at: about 3 hours ago - Pushed at: about 4 hours ago - Stars: 38,178 - Forks: 4,349

kakaobrain/torchgpipe
A GPipe implementation in PyTorch
Language: Python - Size: 449 KB - Last synced at: 12 days ago - Pushed at: 9 months ago - Stars: 836 - Forks: 99

PaddlePaddle/PaddleFleetX
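PaddlePaddle large-model development suite, providing an end-to-end development toolchain for large language models, cross-modal large models, biocomputing large models, and other domains.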
Language: Python - Size: 637 MB - Last synced at: 7 days ago - Pushed at: 11 months ago - Stars: 465 - Forks: 164

Oneflow-Inc/libai
LiBai(李白): A Toolbox for Large-Scale Distributed Parallel Training
Language: Python - Size: 34.6 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 402 - Forks: 56

kaiyuyue/torchshard
Slicing a PyTorch Tensor Into Parallel Shards
Language: Python - Size: 4.8 MB - Last synced at: about 7 hours ago - Pushed at: almost 4 years ago - Stars: 298 - Forks: 15

alibaba/EasyParallelLibrary
Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training.
Language: Python - Size: 771 KB - Last synced at: 27 days ago - Pushed at: about 2 years ago - Stars: 267 - Forks: 49

Shenggan/awesome-distributed-ml
A curated list of awesome projects and papers for distributed training or inference
Size: 44.9 KB - Last synced at: 9 days ago - Pushed at: 7 months ago - Stars: 231 - Forks: 27

xrsrke/pipegoose
Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)*
Language: Python - Size: 1.26 MB - Last synced at: 5 days ago - Pushed at: over 1 year ago - Stars: 82 - Forks: 18

tanyuqian/redco
NAACL '24 (Best Demo Paper RunnerUp) / MlSys @ NeurIPS '23 - RedCoast: A Lightweight Tool to Automate Distributed Training and Inference
Language: Python - Size: 11.5 MB - Last synced at: 1 day ago - Pushed at: 5 months ago - Stars: 65 - Forks: 7

hkproj/pytorch-transformer-distributed
Distributed training (multi-node) of a Transformer model
Language: Python - Size: 4.03 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 63 - Forks: 26

NERSC/sc23-dl-tutorial
SC23 Deep Learning at Scale Tutorial Material
Language: Python - Size: 15.7 MB - Last synced at: about 2 months ago - Pushed at: 8 months ago - Stars: 41 - Forks: 9

ryantd/veloce
WIP. Veloce is a low-code Ray-based parallelization library that makes machine learning computation novel, efficient, and heterogeneous.
Language: Python - Size: 9.13 MB - Last synced at: 21 days ago - Pushed at: over 2 years ago - Stars: 18 - Forks: 0

AlibabaPAI/FlashModels
Fast and easy distributed model training examples.
Language: Python - Size: 42.9 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 9 - Forks: 4

Shenggan/atp
Adaptive Tensor Parallelism for Foundation Models
Language: Python - Size: 3.22 MB - Last synced at: 25 days ago - Pushed at: over 2 years ago - Stars: 9 - Forks: 0

atakehiro/3D-U-Net-pytorch-model-parallel
PyTorch implementation of 3D U-Net with model parallelism across 2 GPUs for large models
Language: Python - Size: 85 KB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 9 - Forks: 1

dlzou/computron
Serving distributed deep learning models with model parallel swapping.
Language: Jupyter Notebook - Size: 2.1 MB - Last synced at: 18 days ago - Pushed at: almost 2 years ago - Stars: 5 - Forks: 1

fanpu/DynPartition
Official implementation of DynPartition: Automatic Optimal Pipeline Parallelism of Dynamic Neural Networks over Heterogeneous GPU Systems for Inference Tasks
Language: Python - Size: 135 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 5 - Forks: 0

sjlee25/legion-readme
Description of Framework for Efficient Fused-layer Cost Estimation, Legion (2021)
Size: 1.58 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 0

LER0ever/HPGO
Development of Project HPGO | Hybrid Parallelism Global Orchestration
Size: 5.29 MB - Last synced at: 3 days ago - Pushed at: about 4 years ago - Stars: 3 - Forks: 0

dscpesu/NetTorrent
A decentralized and distributed framework for training DNNs
Language: Python - Size: 9.92 MB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 3 - Forks: 0

NERSC/dl-at-scale-training
Deep Learning at Scale Training Event at NERSC
Language: Python - Size: 17.3 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 2 - Forks: 2

zjc664656505/LinguaLinked
Distributed-Parallelism over Heterogeneous Devices
Language: Python - Size: 302 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

AnveshaM/Enhancing-performance-of-big-data-machine-learning-models-on-Google-Cloud-Platform
The project is focused on parallelising pre-processing, measuring and machine learning in the cloud, as well as the evaluation and analysis of the cloud performance.
Language: Jupyter Notebook - Size: 8.88 MB - Last synced at: about 2 months ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 1

garg-aayush/model-parallelism
Model parallelism for NN architectures with skip connections (e.g. ResNets, UNets)
Language: Python - Size: 6.85 MB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 0

EunjuYang/distributed-tf
distributed tensorflow (model parallelism) example repository
Language: Python - Size: 8.79 KB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 2 - Forks: 0

d4l3k/axe
A simple graph partitioning algorithm written in Go. Designed for partitioning neural networks across multiple devices, where crossing device boundaries incurs an added cost.
Language: Go - Size: 9.77 KB - Last synced at: 29 days ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 0

zhuangsc/altsplit
An MPI-based distributed model parallelism technique for MLP
Language: C - Size: 43.9 KB - Last synced at: over 1 year ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 2

mkrdip/alcf
Contains materials of internship at ALCF during summer of 2019
Language: Python - Size: 24.4 KB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 0

joelrorseth/HyperTune
A fully distributed hyperparameter optimization tool for PyTorch DNNs
Language: Python - Size: 6.48 MB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

ankahira/chainermnx
Extended ChainerMN
Language: Python - Size: 338 KB - Last synced at: 12 months ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

olk/mnist-performance
Performance test of MNIST handwriting using MXNet + TF
Language: Python - Size: 22.5 KB - Last synced at: about 1 month ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 0

mzj14/mesh (fork of tensorflow/mesh)
Mesh TensorFlow: Model Parallelism Made Easier
Language: Python - Size: 1.01 MB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0
