An open API service providing repository metadata for many open source software ecosystems.

Topic: "data-parallelism"

hpcaitech/ColossalAI

Making large AI models cheaper, faster and more accessible

Language: Python - Size: 63.1 MB - Last synced at: 6 days ago - Pushed at: 7 days ago - Stars: 40,931 - Forks: 4,522

deepspeedai/DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Language: Python - Size: 217 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 38,691 - Forks: 4,405

cerndb/dist-keras 📦

Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.

Language: Python - Size: 54.6 MB - Last synced at: 22 days ago - Pushed at: almost 7 years ago - Stars: 623 - Forks: 167

mratsim/weave

A state-of-the-art multithreading runtime: message-passing based, fast, scalable, ultra-low overhead

Language: Nim - Size: 8.58 MB - Last synced at: 3 days ago - Pushed at: 12 months ago - Stars: 558 - Forks: 22

PaddlePaddle/PaddleFleetX

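PaddlePaddle large-model development kit, providing an end-to-end development toolchain for large language models, cross-modal large models, biocomputing large models, and other domains.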

Language: Python - Size: 637 MB - Last synced at: 10 days ago - Pushed at: about 1 year ago - Stars: 467 - Forks: 165

Oneflow-Inc/libai

LiBai (李白): A Toolbox for Large-Scale Distributed Parallel Training

Language: Python - Size: 34.7 MB - Last synced at: 18 days ago - Pushed at: 25 days ago - Stars: 403 - Forks: 56

alibaba/EasyParallelLibrary

Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training.

Language: Python - Size: 771 KB - Last synced at: 2 months ago - Pushed at: about 2 years ago - Stars: 267 - Forks: 49

dkeras-project/dkeras

Distributed Keras Engine: make Keras faster with only one line of code.

Language: Python - Size: 6.48 MB - Last synced at: about 1 month ago - Pushed at: over 5 years ago - Stars: 188 - Forks: 12

wenwei202/terngrad

Ternary Gradients to Reduce Communication in Distributed Deep Learning (TensorFlow)

Language: Python - Size: 5.59 MB - Last synced at: about 1 year ago - Pushed at: over 6 years ago - Stars: 181 - Forks: 48

vertexclique/orkhon

Orkhon: ML Inference Framework and Server Runtime

Language: Rust - Size: 26.2 MB - Last synced at: 21 days ago - Pushed at: over 4 years ago - Stars: 149 - Forks: 4

xrsrke/pipegoose

Large-scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts (still a work in progress)

Language: Python - Size: 1.26 MB - Last synced at: 9 days ago - Pushed at: over 1 year ago - Stars: 82 - Forks: 18

hkproj/pytorch-transformer-distributed

Distributed training (multi-node) of a Transformer model

Language: Python - Size: 4.03 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 66 - Forks: 29

NERSC/sc23-dl-tutorial

SC23 Deep Learning at Scale Tutorial Material

Language: Python - Size: 15.7 MB - Last synced at: 28 days ago - Pushed at: 9 months ago - Stars: 44 - Forks: 9

kuixu/keras_multi_gpu

Multi-GPU training for Keras

Language: Python - Size: 286 KB - Last synced at: 2 months ago - Pushed at: almost 8 years ago - Stars: 44 - Forks: 22

NERSC/dl-at-scale-training

Deep Learning at Scale Training Event at NERSC

Language: Python - Size: 17.3 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 19 - Forks: 12

ryantd/veloce

WIP. Veloce is a low-code Ray-based parallelization library that makes machine learning computation novel, efficient, and heterogeneous.

Language: Python - Size: 9.13 MB - Last synced at: 26 days ago - Pushed at: almost 3 years ago - Stars: 18 - Forks: 0

tcoppex/cpu-gbfilter

Optimized Gaussian blur filter on CPU.

Language: C++ - Size: 21.5 KB - Last synced at: about 2 months ago - Pushed at: over 7 years ago - Stars: 17 - Forks: 1

daekeun-ml/sm-distributed-training-step-by-step

This repository provides hands-on labs on PyTorch-based Distributed Training and SageMaker Distributed Training. It is written to make it easy for beginners to get started, and guides you through step-by-step modifications to the code based on the most basic BERT use cases.

Language: Jupyter Notebook - Size: 1.3 MB - Last synced at: 2 months ago - Pushed at: almost 2 years ago - Stars: 13 - Forks: 2

AlibabaPAI/FlashModels

Fast and easy distributed model training examples.

Language: Python - Size: 42.9 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 9 - Forks: 4

plerros/helsing

A mostly POSIX-compliant utility that scans a given interval for vampire numbers.

Language: C - Size: 358 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 4 - Forks: 1

itzmeanjan/merklize-blake3 📦

OpenCL powered Merklization using BLAKE3

Language: C - Size: 76.2 KB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 4 - Forks: 0

Oblomov/cldpp

OpenCL Data Parallel Primitives

Language: C - Size: 62.5 KB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 4 - Forks: 0

zbjob/DiscoPoP

Dependence-Based Code Transformation for Coarse-Grained Parallelism

Language: C++ - Size: 18.6 KB - Last synced at: about 1 year ago - Pushed at: over 6 years ago - Stars: 4 - Forks: 1

sjlee25/batch-partitioning

Batch Partitioning for Multi-PE Inference with TVM (2020)

Language: Python - Size: 3.79 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 0

LER0ever/HPGO

Development of Project HPGO | Hybrid Parallelism Global Orchestration

Size: 5.29 MB - Last synced at: 10 days ago - Pushed at: about 4 years ago - Stars: 3 - Forks: 0

dscpesu/NetTorrent

A decentralized and distributed framework for training DNNs

Language: Python - Size: 9.92 MB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 3 - Forks: 0

MurrellGroup/Conflux.jl 📦

Single-node data parallelism in Julia with CUDA

Language: Julia - Size: 31.3 KB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 2 - Forks: 0

zjc664656505/LinguaLinked

Distributed-Parallelism over Heterogeneous Devices

Language: Python - Size: 302 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

axr6077/Ray-Trace-Parallelization

A complex ray-tracing algorithm parallelized with different partitioning schemes, exploring the performance gains over the sequential algorithm as grain size and the number of processing units (the parameters) vary, to render a high-resolution image.

Language: C++ - Size: 4.76 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

AnveshaM/Enhancing-performance-of-big-data-machine-learning-models-on-Google-Cloud-Platform

The project focuses on parallelising pre-processing, measuring, and machine learning in the cloud, as well as evaluating and analysing cloud performance.

Language: Jupyter Notebook - Size: 8.88 MB - Last synced at: 3 months ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 1

HiEST/DistMIS (fork of oriolaranda/DistMIS)

Distributing Deep Learning Hyperparameter Tuning for 3D Medical Image Segmentation

Language: Python - Size: 8.16 MB - Last synced at: 21 days ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 1

thomas-bouvier/distributed-continual-learning

Towards Rehearsal-based Continual Learning at Scale: distributed CL using Horovod + PyTorch on up to 128 GPUs

Language: Python - Size: 930 KB - Last synced at: 2 days ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

diptorupd/numba-dpex (fork of IntelPython/numba-dpex)

A SYCL-like kernel compiler for Python

Language: Python - Size: 10.7 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 0

ashayp22/monte-carlo-options-simd

SIMD multithreaded Monte Carlo options pricer in Rust 🦀

Language: Rust - Size: 77.1 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

oekosheri/pytorch_unet_scaling

Scaling U-Net in PyTorch

Language: Jupyter Notebook - Size: 1.63 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

Sujith013/Binary-Classification-using-Machine-Learning-and-Data-parallelism

Binary data classification using TensorFlow and Keras in Python, with data parallelism achieved using MPI

Language: Jupyter Notebook - Size: 11.7 KB - Last synced at: 9 months ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

axr6077/Hogdkin-Huxley-Neuron-Model

Sequential and Parallel Implementation of the Hodgkin-Huxley Neuron model.

Language: C - Size: 3.91 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

oriolaranda/DistMIS

Official Repository for the paper: Distributing Deep Learning Hyperparameter Tuning for 3D Medical Image Segmentation

Language: Python - Size: 8.14 MB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 3

soham-b-github/K8sCIFAR

Distributed data parallelism on CIFAR-10 using Kubernetes

Language: Python - Size: 2.44 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

infinitygabri/beginner-code-lab

Beginner Code Lab is a multi-language coding playground for those starting their coding journey. Dive into web development, backend programming, or mobile app creation and enjoy hands-on practice in a supportive environment.

Language: TypeScript - Size: 290 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

1set-t/ai-model

Industrial-grade weather visualization system that transforms AI model predictions into professional meteorological plots, emphasizing operational forecasting capabilities.

Size: 1.95 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

nikhilr612/safire

A small library for simulated annealing using arrayfire.

Language: Rust - Size: 36.1 KB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

TeamBipartite/csc485b-202409-a4

High-throughput data-parallel GEMM implementations in CUDA using CUDA cores and Tensor Cores

Language: C++ - Size: 834 KB - Last synced at: 8 days ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

UVA-MLSys/AI-for-Astronomy

A novel cloud-based astronomy framework for data-parallel AI model inference on AWS

Language: Jupyter Notebook - Size: 200 MB - Last synced at: 3 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 4

oekosheri/tensorflow_unet_scaling

Scaling U-Net in TensorFlow

Language: Jupyter Notebook - Size: 124 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

ncl-teu/ncl_mapreducesim

MapReduceSimulator for Scheduling and Provisioning Algorithms

Language: Java - Size: 18.4 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

sre990/ske-pi

Data-parallel and stream-parallel skeletons implemented in Erlang

Language: Erlang - Size: 274 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

t0re199/GPGPU_PROJECT

CUDA C parallel implementation of the Merge operation.

Language: Cuda - Size: 90.8 KB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

t0re199/GPGPU

CUDA C parallel implementations of some well-known algorithms.

Language: C - Size: 106 KB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

joelrorseth/HyperTune

A fully distributed hyperparameter optimization tool for PyTorch DNNs

Language: Python - Size: 6.48 MB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

EunjuYang/DistributedPyTorch

Example of distributed PyTorch

Language: Python - Size: 6.84 KB - Last synced at: over 2 years ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 0

Related Topics
deep-learning (18), model-parallelism (16), pytorch (9), pipeline-parallelism (8), tensorflow (8), distributed-training (8), machine-learning (7), parallel-computing (6), distributed-computing (4), deep-neural-networks (4), python (4), distributed (4), keras (4), cuda (4), large-scale (4), rust (3), c (3), hpc (3), ray-tune (3), horovod (3), simd (2), opencl (2), cloud (2), keras-tensorflow (2), ray (2), mpi (2), mixture-of-experts (2), linux (2), distributed-systems (2), inference (2), gpu (2), zero (2), tensor-parallelism (2), gpu-parallelization (2), unet-image-segmentation (2), sequence-parallelism (2), mapreduce (2), parallel (2), parallelism (2), optimization-algorithms (2), self-supervised-learning (2), cuda-kernels (2), cuda-programming (2), data-parallel-computing (2), distributed-optimizers (2), data-science (2), gpipe (2), raylib (2), openmp (2), 3d-unet (2), distributed-hyperparameter-tuning (2), experiment-parallelism (2), hyperparameter-tuning (2), medical-image-segmentation (2), vampire (1), zero-1 (1), transformers (1), pyspark (1), rdd (1), 3d-parallelism (1), huggingface-transformers (1), moe (1), large-scale-language-modeling (1), megatron (1), megatron-lm (1), nccl (1), julia (1), flux (1), vampire-number (1), blur (1), bmp-image (1), cache-efficiency (1), gaussian-blur (1), image-processing (1), multithreaded (1), distributed-deep-learning (1), distributed-keras-engine (1), keras-classification-models (1), keras-models (1), keras-neural-networks (1), neural-network (1), plaidml (1), tensorflow-models (1), ai-for-science (1), vision-transformers (1), collective-communication (1), distributed-data-parallel (1), gradient-accumulation (1), tutorial (1), cache (1), dataproc-clusters (1), google-cloud-ai-platform (1), google-cloud-platform (1), google-colaboratory (1), ml (1), nlp (1), rehearsal (1), billion-parameters (1), compression (1), trillion-parameters (1)