data-parallelism | Topic | Ecosyste.ms: Repos

Topic: "data-parallelism"

hpcaitech/ColossalAI

Making large AI models cheaper, faster and more accessible

Language: Python - Size: 63.1 MB - Last synced at: 6 days ago - Pushed at: 7 days ago - Stars: 40,931 - Forks: 4,522

deepspeedai/DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Language: Python - Size: 217 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 38,691 - Forks: 4,405

cerndb/dist-keras 📦

Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.

Language: Python - Size: 54.6 MB - Last synced at: 22 days ago - Pushed at: almost 7 years ago - Stars: 623 - Forks: 167

mratsim/weave

A state-of-the-art multithreading runtime: message-passing based, fast, scalable, ultra-low overhead

Language: Nim - Size: 8.58 MB - Last synced at: 3 days ago - Pushed at: 12 months ago - Stars: 558 - Forks: 22

PaddlePaddle/PaddleFleetX

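PaddlePaddle large-model development suite, providing a full-pipeline development toolchain for large language models, cross-modal large models, biocomputing large models, and other domains.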

Language: Python - Size: 637 MB - Last synced at: 10 days ago - Pushed at: about 1 year ago - Stars: 467 - Forks: 165

Oneflow-Inc/libai

LiBai(李白): A Toolbox for Large-Scale Distributed Parallel Training

Language: Python - Size: 34.7 MB - Last synced at: 18 days ago - Pushed at: 25 days ago - Stars: 403 - Forks: 56

alibaba/EasyParallelLibrary

Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training.

Language: Python - Size: 771 KB - Last synced at: 2 months ago - Pushed at: about 2 years ago - Stars: 267 - Forks: 49

dkeras-project/dkeras

Distributed Keras Engine: make Keras faster with only one line of code.

Language: Python - Size: 6.48 MB - Last synced at: about 1 month ago - Pushed at: over 5 years ago - Stars: 188 - Forks: 12

wenwei202/terngrad

Ternary Gradients to Reduce Communication in Distributed Deep Learning (TensorFlow)

Language: Python - Size: 5.59 MB - Last synced at: about 1 year ago - Pushed at: over 6 years ago - Stars: 181 - Forks: 48

vertexclique/orkhon

Orkhon: ML Inference Framework and Server Runtime

Language: Rust - Size: 26.2 MB - Last synced at: 21 days ago - Pushed at: over 4 years ago - Stars: 149 - Forks: 4

xrsrke/pipegoose

Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)*

Language: Python - Size: 1.26 MB - Last synced at: 9 days ago - Pushed at: over 1 year ago - Stars: 82 - Forks: 18

hkproj/pytorch-transformer-distributed

Distributed training (multi-node) of a Transformer model

Language: Python - Size: 4.03 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 66 - Forks: 29

NERSC/sc23-dl-tutorial

SC23 Deep Learning at Scale Tutorial Material

Language: Python - Size: 15.7 MB - Last synced at: 28 days ago - Pushed at: 9 months ago - Stars: 44 - Forks: 9

kuixu/keras_multi_gpu

Multi-GPU training for Keras

Language: Python - Size: 286 KB - Last synced at: 2 months ago - Pushed at: almost 8 years ago - Stars: 44 - Forks: 22

NERSC/dl-at-scale-training

Deep Learning at Scale Training Event at NERSC

Language: Python - Size: 17.3 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 19 - Forks: 12

ryantd/veloce

WIP. Veloce is a low-code Ray-based parallelization library that makes machine learning computation novel, efficient, and heterogeneous.

Language: Python - Size: 9.13 MB - Last synced at: 26 days ago - Pushed at: almost 3 years ago - Stars: 18 - Forks: 0

tcoppex/cpu-gbfilter

♨️ Optimized Gaussian blur filter on CPU.

Language: C++ - Size: 21.5 KB - Last synced at: about 2 months ago - Pushed at: over 7 years ago - Stars: 17 - Forks: 1

daekeun-ml/sm-distributed-training-step-by-step

This repository provides hands-on labs on PyTorch-based Distributed Training and SageMaker Distributed Training. It is written to make it easy for beginners to get started, and guides you through step-by-step modifications to the code based on the most basic BERT use cases.

Language: Jupyter Notebook - Size: 1.3 MB - Last synced at: 2 months ago - Pushed at: almost 2 years ago - Stars: 13 - Forks: 2

AlibabaPAI/FlashModels

Fast and easy distributed model training examples.

Language: Python - Size: 42.9 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 9 - Forks: 4

plerros/helsing

A mostly POSIX-compliant utility that scans a given interval for vampire numbers.

Language: C - Size: 358 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 4 - Forks: 1

itzmeanjan/merklize-blake3 📦

OpenCL powered Merklization using BLAKE3

Language: C - Size: 76.2 KB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 4 - Forks: 0

Oblomov/cldpp

OpenCL Data Parallel Primitives

Language: C - Size: 62.5 KB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 4 - Forks: 0

zbjob/DiscoPoP

Dependence-Based Code Transformation for Coarse-Grained Parallelism

Language: C++ - Size: 18.6 KB - Last synced at: about 1 year ago - Pushed at: over 6 years ago - Stars: 4 - Forks: 1

sjlee25/batch-partitioning

Batch Partitioning for Multi-PE Inference with TVM (2020)

Language: Python - Size: 3.79 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 0

LER0ever/HPGO

Development of Project HPGO | Hybrid Parallelism Global Orchestration

Size: 5.29 MB - Last synced at: 10 days ago - Pushed at: about 4 years ago - Stars: 3 - Forks: 0

dscpesu/NetTorrent

A decentralized and distributed framework for training DNNs

Language: Python - Size: 9.92 MB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 3 - Forks: 0

MurrellGroup/Conflux.jl 📦

Single-node data parallelism in Julia with CUDA

Language: Julia - Size: 31.3 KB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 2 - Forks: 0

zjc664656505/LinguaLinked

Distributed-Parallelism over Heterogeneous Devices

Language: Python - Size: 302 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

axr6077/Ray-Trace-Parallelization

A complex ray tracing algorithm parallelized over different partitioning schemes, exploring the performance gains over the sequential algorithm as a function of grain size and number of processing units when rendering a high-resolution image.

Language: C++ - Size: 4.76 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

AnveshaM/Enhancing-performance-of-big-data-machine-learning-models-on-Google-Cloud-Platform

The project is focused on parallelising pre-processing, measuring and machine learning in the cloud, as well as the evaluation and analysis of the cloud performance.

Language: Jupyter Notebook - Size: 8.88 MB - Last synced at: 3 months ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 1

HiEST/DistMIS (fork of oriolaranda/DistMIS)

Distributing Deep Learning Hyperparameter Tuning for 3D Medical Image Segmentation

Language: Python - Size: 8.16 MB - Last synced at: 21 days ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 1

thomas-bouvier/distributed-continual-learning

Towards Rehearsal-based Continual Learning at Scale: distributed CL using Horovod + PyTorch on up to 128 GPUs

Language: Python - Size: 930 KB - Last synced at: 2 days ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

diptorupd/numba-dpex (fork of IntelPython/numba-dpex)

A SYCL-like kernel compiler for Python

Language: Python - Size: 10.7 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 0

ashayp22/monte-carlo-options-simd

SIMD multithreaded Monte Carlo options pricer in Rust 🦀

Language: Rust - Size: 77.1 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

oekosheri/pytorch_unet_scaling

Scaling U-Net in PyTorch

Language: Jupyter Notebook - Size: 1.63 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

Sujith013/Binary-Classification-using-Machine-Learning-and-Data-parallelism

Binary data classification using TensorFlow and Keras in Python, with data parallelism achieved using MPI

Language: Jupyter Notebook - Size: 11.7 KB - Last synced at: 9 months ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

axr6077/Hogdkin-Huxley-Neuron-Model

Sequential and Parallel Implementation of the Hodgkin-Huxley Neuron model.

Language: C - Size: 3.91 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

oriolaranda/DistMIS

Official Repository for the paper: Distributing Deep Learning Hyperparameter Tuning for 3D Medical Image Segmentation

Language: Python - Size: 8.14 MB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 3

soham-b-github/K8sCIFAR

Distributed data parallelism on CIFAR-10 using Kubernetes

Language: Python - Size: 2.44 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

infinitygabri/beginner-code-lab

Beginner Code Lab is a multi-language coding playground for those starting their coding journey. 🐙 Dive into web development, backend programming, or mobile app creation and enjoy hands-on practice in a supportive environment. 🌱

Language: TypeScript - Size: 290 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0