Topic: "data-parallelism"
hpcaitech/ColossalAI
Making large AI models cheaper, faster and more accessible
Language: Python - Size: 63.1 MB - Last synced at: 6 days ago - Pushed at: 7 days ago - Stars: 40,931 - Forks: 4,522

deepspeedai/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Language: Python - Size: 217 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 38,691 - Forks: 4,405

cerndb/dist-keras 📦
Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
Language: Python - Size: 54.6 MB - Last synced at: 22 days ago - Pushed at: almost 7 years ago - Stars: 623 - Forks: 167

mratsim/weave
A state-of-the-art multithreading runtime: message-passing based, fast, scalable, ultra-low overhead
Language: Nim - Size: 8.58 MB - Last synced at: 3 days ago - Pushed at: 12 months ago - Stars: 558 - Forks: 22

PaddlePaddle/PaddleFleetX
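PaddlePaddle large-model development suite, providing end-to-end development toolchains for large language models, cross-modal large models, biocomputing large models, and other domains.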
Language: Python - Size: 637 MB - Last synced at: 10 days ago - Pushed at: about 1 year ago - Stars: 467 - Forks: 165

Oneflow-Inc/libai
LiBai(李白): A Toolbox for Large-Scale Distributed Parallel Training
Language: Python - Size: 34.7 MB - Last synced at: 18 days ago - Pushed at: 25 days ago - Stars: 403 - Forks: 56

alibaba/EasyParallelLibrary
Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training.
Language: Python - Size: 771 KB - Last synced at: 2 months ago - Pushed at: about 2 years ago - Stars: 267 - Forks: 49

dkeras-project/dkeras
Distributed Keras Engine, Make Keras faster with only one line of code.
Language: Python - Size: 6.48 MB - Last synced at: about 1 month ago - Pushed at: over 5 years ago - Stars: 188 - Forks: 12

wenwei202/terngrad
Ternary Gradients to Reduce Communication in Distributed Deep Learning (TensorFlow)
Language: Python - Size: 5.59 MB - Last synced at: about 1 year ago - Pushed at: over 6 years ago - Stars: 181 - Forks: 48

vertexclique/orkhon
Orkhon: ML Inference Framework and Server Runtime
Language: Rust - Size: 26.2 MB - Last synced at: 21 days ago - Pushed at: over 4 years ago - Stars: 149 - Forks: 4

xrsrke/pipegoose
Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)*
Language: Python - Size: 1.26 MB - Last synced at: 9 days ago - Pushed at: over 1 year ago - Stars: 82 - Forks: 18

hkproj/pytorch-transformer-distributed
Distributed training (multi-node) of a Transformer model
Language: Python - Size: 4.03 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 66 - Forks: 29

NERSC/sc23-dl-tutorial
SC23 Deep Learning at Scale Tutorial Material
Language: Python - Size: 15.7 MB - Last synced at: 28 days ago - Pushed at: 9 months ago - Stars: 44 - Forks: 9

kuixu/keras_multi_gpu
Multi-GPU training for Keras
Language: Python - Size: 286 KB - Last synced at: 2 months ago - Pushed at: almost 8 years ago - Stars: 44 - Forks: 22

NERSC/dl-at-scale-training
Deep Learning at Scale Training Event at NERSC
Language: Python - Size: 17.3 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 19 - Forks: 12

ryantd/veloce
WIP. Veloce is a low-code Ray-based parallelization library that makes machine learning computation novel, efficient, and heterogeneous.
Language: Python - Size: 9.13 MB - Last synced at: 26 days ago - Pushed at: almost 3 years ago - Stars: 18 - Forks: 0

tcoppex/cpu-gbfilter
♨️ Optimized Gaussian blur filter on CPU.
Language: C++ - Size: 21.5 KB - Last synced at: about 2 months ago - Pushed at: over 7 years ago - Stars: 17 - Forks: 1

daekeun-ml/sm-distributed-training-step-by-step
This repository provides hands-on labs on PyTorch-based Distributed Training and SageMaker Distributed Training. It is written to make it easy for beginners to get started, and guides you through step-by-step modifications to the code based on the most basic BERT use cases.
Language: Jupyter Notebook - Size: 1.3 MB - Last synced at: 2 months ago - Pushed at: almost 2 years ago - Stars: 13 - Forks: 2

AlibabaPAI/FlashModels
Fast and easy distributed model training examples.
Language: Python - Size: 42.9 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 9 - Forks: 4

plerros/helsing
A mostly POSIX-compliant utility that scans a given interval for vampire numbers.
Language: C - Size: 358 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 4 - Forks: 1

itzmeanjan/merklize-blake3 📦
OpenCL powered Merklization using BLAKE3
Language: C - Size: 76.2 KB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 4 - Forks: 0

Oblomov/cldpp
OpenCL Data Parallel Primitives
Language: C - Size: 62.5 KB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 4 - Forks: 0

zbjob/DiscoPoP
Dependence-Based Code Transformation for Coarse-Grained Parallelism
Language: C++ - Size: 18.6 KB - Last synced at: about 1 year ago - Pushed at: over 6 years ago - Stars: 4 - Forks: 1

sjlee25/batch-partitioning
Batch Partitioning for Multi-PE Inference with TVM (2020)
Language: Python - Size: 3.79 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 0

LER0ever/HPGO
Development of Project HPGO | Hybrid Parallelism Global Orchestration
Size: 5.29 MB - Last synced at: 10 days ago - Pushed at: about 4 years ago - Stars: 3 - Forks: 0

dscpesu/NetTorrent
A decentralized and distributed framework for training DNNs
Language: Python - Size: 9.92 MB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 3 - Forks: 0

MurrellGroup/Conflux.jl 📦
Single-node data parallelism in Julia with CUDA
Language: Julia - Size: 31.3 KB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 2 - Forks: 0

zjc664656505/LinguaLinked
Distributed-Parallelism over Heterogeneous Devices
Language: Python - Size: 302 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

axr6077/Ray-Trace-Parallelization
Complex ray tracing algorithm optimized through parallelization over different partitioning schemes, exploring the performance gains over the sequential algorithm as grain size and number of processing units vary, to render a high-resolution image.
Language: C++ - Size: 4.76 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

AnveshaM/Enhancing-performance-of-big-data-machine-learning-models-on-Google-Cloud-Platform
The project is focused on parallelising pre-processing, measuring and machine learning in the cloud, as well as the evaluation and analysis of the cloud performance.
Language: Jupyter Notebook - Size: 8.88 MB - Last synced at: 3 months ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 1

HiEST/DistMIS (fork of oriolaranda/DistMIS)
Distributing Deep Learning Hyperparameter Tuning for 3D Medical Image Segmentation
Language: Python - Size: 8.16 MB - Last synced at: 21 days ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 1

thomas-bouvier/distributed-continual-learning
Towards Rehearsal-based Continual Learning at Scale: distributed CL using Horovod + PyTorch on up to 128 GPUs
Language: Python - Size: 930 KB - Last synced at: 2 days ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

diptorupd/numba-dpex (fork of IntelPython/numba-dpex)
A SYCL-like kernel compiler for Python
Language: Python - Size: 10.7 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 0

ashayp22/monte-carlo-options-simd
SIMD multithreaded Monte Carlo options pricer in Rust 🦀
Language: Rust - Size: 77.1 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

oekosheri/pytorch_unet_scaling
Scaling U-Net in PyTorch
Language: Jupyter Notebook - Size: 1.63 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

Sujith013/Binary-Classification-using-Machine-Learning-and-Data-parallelism
Binary data classification using TensorFlow and Keras in Python, achieving data parallelism using MPI
Language: Jupyter Notebook - Size: 11.7 KB - Last synced at: 9 months ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

axr6077/Hogdkin-Huxley-Neuron-Model
Sequential and Parallel Implementation of the Hodgkin-Huxley Neuron model.
Language: C - Size: 3.91 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

oriolaranda/DistMIS
Official Repository for the paper: Distributing Deep Learning Hyperparameter Tuning for 3D Medical Image Segmentation
Language: Python - Size: 8.14 MB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 3

soham-b-github/K8sCIFAR
Distributed Data parallelism on CIFAR-10 using Kubernetes
Language: Python - Size: 2.44 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

infinitygabri/beginner-code-lab
Beginner Code Lab is a multi-language coding playground for those starting their coding journey. 🐙 Dive into web development, backend programming, or mobile app creation and enjoy hands-on practice in a supportive environment. 🌱
Language: TypeScript - Size: 290 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

1set-t/ai-model
Industrial-grade weather visualization system that transforms AI model predictions into professional meteorological plots, emphasizing operational forecasting capabilities.
Size: 1.95 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

nikhilr612/safire
A small library for simulated annealing using arrayfire.
Language: Rust - Size: 36.1 KB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

TeamBipartite/csc485b-202409-a4
High-throughput data-parallel GEMM implementations in CUDA using CUDA cores and Tensor cores
Language: C++ - Size: 834 KB - Last synced at: 8 days ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

UVA-MLSys/AI-for-Astronomy
A novel Cloud-based Astronomy framework for data parallel AI model inference on AWS
Language: Jupyter Notebook - Size: 200 MB - Last synced at: 3 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 4

oekosheri/tensorflow_unet_scaling
Scaling U-Net in TensorFlow
Language: Jupyter Notebook - Size: 124 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

ncl-teu/ncl_mapreducesim
MapReduceSimulator for Scheduling and Provisioning Algorithms
Language: Java - Size: 18.4 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

sre990/ske-pi
Data-parallel and stream-parallel skeletons implemented in Erlang
Language: Erlang - Size: 274 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

t0re199/GPGPU_PROJECT
CUDA C parallel implementation of the Merge operation.
Language: Cuda - Size: 90.8 KB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

t0re199/GPGPU
CUDA C parallel implementations of some well-known algorithms.
Language: C - Size: 106 KB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

joelrorseth/HyperTune
A fully distributed hyperparameter optimization tool for PyTorch DNNs
Language: Python - Size: 6.48 MB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

EunjuYang/DistributedPyTorch
Example of distributed PyTorch
Language: Python - Size: 6.84 KB - Last synced at: over 2 years ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 0
