An open API service providing repository metadata for many open source software ecosystems.

Topic: "model-parallelism"

hpcaitech/ColossalAI

Making large AI models cheaper, faster and more accessible

Language: Python - Size: 62.8 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 40,831 - Forks: 4,499

deepspeedai/DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Language: Python - Size: 217 MB - Last synced at: about 3 hours ago - Pushed at: about 4 hours ago - Stars: 38,178 - Forks: 4,349

kakaobrain/torchgpipe

A GPipe implementation in PyTorch

Language: Python - Size: 449 KB - Last synced at: 12 days ago - Pushed at: 9 months ago - Stars: 836 - Forks: 99

PaddlePaddle/PaddleFleetX

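PaddlePaddle large-model development kit, providing end-to-end development toolchains for large language models, cross-modal large models, biocomputing large models, and other domains.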

Language: Python - Size: 637 MB - Last synced at: 7 days ago - Pushed at: 11 months ago - Stars: 465 - Forks: 164

Oneflow-Inc/libai

LiBai(李白): A Toolbox for Large-Scale Distributed Parallel Training

Language: Python - Size: 34.6 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 402 - Forks: 56

kaiyuyue/torchshard

Slicing a PyTorch Tensor Into Parallel Shards

Language: Python - Size: 4.8 MB - Last synced at: about 7 hours ago - Pushed at: almost 4 years ago - Stars: 298 - Forks: 15

alibaba/EasyParallelLibrary

Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training.

Language: Python - Size: 771 KB - Last synced at: 27 days ago - Pushed at: about 2 years ago - Stars: 267 - Forks: 49

Shenggan/awesome-distributed-ml

A curated list of awesome projects and papers for distributed training or inference

Size: 44.9 KB - Last synced at: 9 days ago - Pushed at: 7 months ago - Stars: 231 - Forks: 27

xrsrke/pipegoose

Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)*

Language: Python - Size: 1.26 MB - Last synced at: 5 days ago - Pushed at: over 1 year ago - Stars: 82 - Forks: 18

tanyuqian/redco

NAACL '24 (Best Demo Paper Runner-Up) / MLSys @ NeurIPS '23 - RedCoast: A Lightweight Tool to Automate Distributed Training and Inference

Language: Python - Size: 11.5 MB - Last synced at: 1 day ago - Pushed at: 5 months ago - Stars: 65 - Forks: 7

hkproj/pytorch-transformer-distributed

Distributed training (multi-node) of a Transformer model

Language: Python - Size: 4.03 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 63 - Forks: 26

NERSC/sc23-dl-tutorial

SC23 Deep Learning at Scale Tutorial Material

Language: Python - Size: 15.7 MB - Last synced at: about 2 months ago - Pushed at: 8 months ago - Stars: 41 - Forks: 9

ryantd/veloce

WIP. Veloce is a low-code Ray-based parallelization library that makes machine learning computation novel, efficient, and heterogeneous.

Language: Python - Size: 9.13 MB - Last synced at: 21 days ago - Pushed at: over 2 years ago - Stars: 18 - Forks: 0

AlibabaPAI/FlashModels

Fast and easy distributed model training examples.

Language: Python - Size: 42.9 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 9 - Forks: 4

Shenggan/atp

Adaptive Tensor Parallelism for Foundation Models

Language: Python - Size: 3.22 MB - Last synced at: 25 days ago - Pushed at: over 2 years ago - Stars: 9 - Forks: 0

atakehiro/3D-U-Net-pytorch-model-parallel

PyTorch implementation of 3D U-Net with model parallelism across 2 GPUs for large models

Language: Python - Size: 85 KB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 9 - Forks: 1

dlzou/computron

Serving distributed deep learning models with model parallel swapping.

Language: Jupyter Notebook - Size: 2.1 MB - Last synced at: 18 days ago - Pushed at: almost 2 years ago - Stars: 5 - Forks: 1

fanpu/DynPartition

Official implementation of DynPartition: Automatic Optimal Pipeline Parallelism of Dynamic Neural Networks over Heterogeneous GPU Systems for Inference Tasks

Language: Python - Size: 135 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 5 - Forks: 0

sjlee25/legion-readme

Description of Framework for Efficient Fused-layer Cost Estimation, Legion (2021)

Size: 1.58 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 0

LER0ever/HPGO

Development of Project HPGO | Hybrid Parallelism Global Orchestration

Size: 5.29 MB - Last synced at: 3 days ago - Pushed at: about 4 years ago - Stars: 3 - Forks: 0

dscpesu/NetTorrent

A decentralized and distributed framework for training DNNs

Language: Python - Size: 9.92 MB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 3 - Forks: 0

NERSC/dl-at-scale-training

Deep Learning at Scale Training Event at NERSC

Language: Python - Size: 17.3 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 2 - Forks: 2

zjc664656505/LinguaLinked

Distributed-Parallelism over Heterogeneous Devices

Language: Python - Size: 302 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

AnveshaM/Enhancing-performance-of-big-data-machine-learning-models-on-Google-Cloud-Platform

The project is focused on parallelising pre-processing, measuring and machine learning in the cloud, as well as the evaluation and analysis of the cloud performance.

Language: Jupyter Notebook - Size: 8.88 MB - Last synced at: about 2 months ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 1

garg-aayush/model-parallelism

Model parallelism for NN architectures with skip connections (e.g., ResNets, UNets)

Language: Python - Size: 6.85 MB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 0

EunjuYang/distributed-tf

Distributed TensorFlow (model parallelism) example repository

Language: Python - Size: 8.79 KB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 2 - Forks: 0

d4l3k/axe

A simple graph partitioning algorithm written in Go. Designed for partitioning neural networks across multiple devices, where crossing a device boundary incurs an added cost.

Language: Go - Size: 9.77 KB - Last synced at: 29 days ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 0

zhuangsc/altsplit

An MPI-based distributed model parallelism technique for MLP

Language: C - Size: 43.9 KB - Last synced at: over 1 year ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 2

mkrdip/alcf

Contains materials from an internship at ALCF during the summer of 2019

Language: Python - Size: 24.4 KB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 0

joelrorseth/HyperTune

A fully distributed hyperparameter optimization tool for PyTorch DNNs

Language: Python - Size: 6.48 MB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

ankahira/chainermnx

Extended ChainerMN

Language: Python - Size: 338 KB - Last synced at: 12 months ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

olk/mnist-performance

Performance test of MNIST handwritten digits using MXNet + TF

Language: Python - Size: 22.5 KB - Last synced at: about 1 month ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 0

mzj14/mesh (fork of tensorflow/mesh)

Mesh TensorFlow: Model Parallelism Made Easier

Language: Python - Size: 1.01 MB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

Related Topics
data-parallelism (16), deep-learning (13), pytorch (13), pipeline-parallelism (11), distributed-training (7), machine-learning (6), gpipe (4), large-scale (3), tensor-parallelism (3), tensorflow (3), distributed-computing (3), mixture-of-experts (2), inference (2), gpu (2), self-supervised-learning (2), horovod (2), edge-computing (2), hpc (2), zero (2), reinforcement-learning (2), python (2), sequence-parallelism (2), transformer (2), distributed-systems (2), tutorial (1), gemma (1), cache (1), dataproc-clusters (1), google-cloud-ai-platform (1), google-cloud-platform (1), google-colaboratory (1), heterogeneous-training (1), image-captioning (1), jax (1), large-language-models (1), llama (1), maml (1), meta-learning (1), mixed-precision (1), mlsys (1), ppo (1), seq2seq (1), stable-diffusion (1), ai-for-science (1), vision-transformers (1), collective-communication (1), distributed-data-parallel (1), gradient-accumulation (1), foundation-models (1), big-model (1), ai (1), vision-transformer (1), oneflow (1), nlp (1), unsupervised-learning (1), pretraining (1), paddlepaddle (1), paddlecloud (1), lightning (1), fleet-api (1), elastic (1), distributed-algorithm (1), cloud (1), benchmark (1), zero-1 (1), transformers (1), moe (1), megatron-lm (1), megatron (1), large-scale-language-modeling (1), huggingface-transformers (1), distributed-optimizers (1), 3d-parallelism (1), rdd (1), pyspark (1), ml (1), keras-tensorflow (1), ray (1), parameter-server (1), heterogeneity (1), distributed (1), graph-partitioning (1), treelstm (1), scheduling (1), neural-networks (1), dynpartition (1), dynamic-neural-network (1), rust (1), pipedream (1), linux (1), android (1), neural-network (1), mpi-applications (1), 3d-unet (1), inference-server (1), ray-tune (1), tvm (1), dl-optimization (1), chainer (1), rnn-tensorflow (1)