distributed-training | Topic | Ecosyste.ms: Repos

Topic: "distributed-training"

Hunterdii/TensorFlow-Advanced-Techniques-Solution

TensorFlow Advanced Techniques Specialization - Solution

Language: Jupyter Notebook - Size: 39 MB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 8 - Forks: 4

aws-samples/end-2-end-3d-ml

This repository features Amazon SageMaker Ground Truth and explains how to ingest raw 3D point cloud data, label it, train a 3D object detection model using Amazon SageMaker, and deploy the model to an Amazon SageMaker Endpoint.

Language: Jupyter Notebook - Size: 20.9 MB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 8 - Forks: 3

PinJhih/ddp-trainer

A simple package for distributed model training using Distributed Data Parallel (DDP) in PyTorch.

Language: Python - Size: 14.6 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 7 - Forks: 0

prabhatkc/ct-recon

Python Implementation of Forward & Inverse models for biomedical imaging

Language: Python - Size: 374 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 7 - Forks: 0

rosinality/meshfn

Framework for Human Alignment Learning

Language: Python - Size: 67.4 KB - Last synced at: 11 days ago - Pushed at: almost 2 years ago - Stars: 7 - Forks: 1

Shank2358/DCNv2

DCNv2_torch1.11

Language: Python - Size: 34.2 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 7 - Forks: 0

Shenggan/DeepCell-Keras 📦

Reimplementation of Deep Cell with Keras and Horovod.

Language: Python - Size: 2.38 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 7 - Forks: 6

bryanlimy/tf2-cyclegan

TensorFlow 2 implementation of CycleGAN with multi-GPU training.

Language: Python - Size: 16.1 MB - Last synced at: 2 days ago - Pushed at: almost 4 years ago - Stars: 7 - Forks: 4

cake-lab/Sync-Switch

The official repo for Sync-Switch (ICDCS'21)

Language: Python - Size: 4.82 MB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 7 - Forks: 2

aws-samples/sagemaker-distributed-training-digital-pathology-images

Distributed training of digital pathology tissue slide images using SageMaker and Horovod.

Language: Jupyter Notebook - Size: 279 KB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 7 - Forks: 4

saforem2/mmm

Multi-Modal Modeling

Language: Python - Size: 271 KB - Last synced at: 6 days ago - Pushed at: about 2 months ago - Stars: 6 - Forks: 0

jiankaiwang/distributed_training

This repository is a tutorial on training deep neural network models more efficiently. It focuses on two main frameworks: Keras and TensorFlow.

Language: Jupyter Notebook - Size: 58.6 KB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 6 - Forks: 4

ReyRen/k8sMLer-client-go

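Built on the kubernetes/client-go API: controls the lifecycle of GPU resources for distributed training and supports continuous real-time redirection of multi-user, multi-task training logs over WebSocket.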

Language: Go - Size: 729 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 5 - Forks: 0

asprenger/distributed-training-patterns

Experiments with low level communication patterns that are useful for distributed training.

Language: Python - Size: 8.79 KB - Last synced at: 6 months ago - Pushed at: over 6 years ago - Stars: 5 - Forks: 0

kirschte/sbdt

S-BDT: Distributed Differentially Private Boosted Decision Trees

Language: C++ - Size: 23.6 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 4 - Forks: 2

alex-snd/TRecover

📜 A python library for distributed training of a Transformer neural network across the Internet to solve the Running Key Cipher, widely known in the field of Cryptography.

Language: Python - Size: 41.8 MB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 4 - Forks: 0

AierLab/pytorch-rpc-tutorial 📦

A hands-on tutorial to dive deep into PyTorch's RPC (Remote Procedure Call) framework. This repository offers a comprehensive guide developed with the assistance of OpenAI's ChatGPT. Whether you're a beginner or an advanced user, this tutorial will provide insights and practices to effectively use PyTorch RPC in your projects.

Language: Python - Size: 39.1 KB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 4 - Forks: 3

SunilGolden/RecEngineMF

Recommendation Engine powered by Matrix Factorization.

Language: Python - Size: 37.1 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 4 - Forks: 0

amoudgl/distributed-dtp

Distributed implementation of our proposed DTP algorithm parallelizing feedback weight training across GPUs (ICML 2022)

Language: Python - Size: 408 KB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 4 - Forks: 1

aws-samples/deepfm-tensorflow-distributed-training-on-amazon-sagemaker

This demo shows two samples of DeepFM distributed training on Amazon SageMaker: one based on TensorFlow Parameter Server on CPU, the other based on Horovod on GPU.

Language: Python - Size: 7.04 MB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 4 - Forks: 0

sdamadi/image-classification

Comprehensive image classification for training multilayer perceptron (MLP), LeNet, LeNet5, conv2, conv4, conv6, VGG11, VGG13, VGG16, VGG19 with batch normalization, ResNet18, ResNet34, ResNet50, MobileNetV2 on MNIST, CIFAR10, CIFAR100, and ImageNet1K.

Language: Python - Size: 46.1 MB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 4 - Forks: 0

cake-lab/CM-DARE

CM-DARE is a measurement infrastructure for monitoring distributed training in Google Cloud (ICDCS'20)

Language: Jupyter Notebook - Size: 5.44 MB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 4 - Forks: 1

harinik05/LettucifyAI

MLOps Pipeline & fine-tuned deep learning model to classify between various food items 🍎🚀

Language: Jupyter Notebook - Size: 26.2 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 1

TARTRL/TARTRL

A distributed reinforcement learning framework based on PyTorch.

Language: Python - Size: 14.6 KB - Last synced at: 20 days ago - Pushed at: about 2 years ago - Stars: 3 - Forks: 0

LER0ever/HPGO

Development of Project HPGO | Hybrid Parallelism Global Orchestration

Size: 5.29 MB - Last synced at: 11 days ago - Pushed at: about 4 years ago - Stars: 3 - Forks: 0

erfannoury/cifar-tf

A simple model for image classification on the CIFAR datasets, demonstrating TF's new APIs in TF 1.4

Language: Python - Size: 11 MB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 3 - Forks: 0

simerusm/arceus

Train neural networks with Macbook clusters and get paid for it

Language: TypeScript - Size: 8.18 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

denpalrius/bft-federated-learning

Federated Learning with Byzantine Fault Tolerance

Language: Jupyter Notebook - Size: 11.1 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 2 - Forks: 0

AmanPriyanshu/FL-Interactive-Game

FL-Interactive-Game: Interactive web game that teaches basic components of Federated Learning

Language: HTML - Size: 18 MB - Last synced at: 6 days ago - Pushed at: 6 months ago - Stars: 2 - Forks: 0

yoniLc/AdaCons

Sample Implementation of the paper "Adaptive Consensus Gradients Aggregation for Scaled Distributed Training".

Language: Python - Size: 19.5 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 2 - Forks: 0

lancelee82/necklace

Distributed deep learning framework based on pytorch/numba/nccl and zeromq.

Language: Python - Size: 235 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 1

nwangfw/nerf_ddp

Language: Jupyter Notebook - Size: 682 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

tlatkowski/u-net-tpu

TensorFlow implementation of the U-Net model with TPU Estimator support.

Language: Python - Size: 159 KB - Last synced at: about 1 month ago - Pushed at: over 6 years ago - Stars: 2 - Forks: 0

shikhar-srivastava/Meta-Iterative-MapReduce

Meta-Iterative Map-Reduce for massively parallel regression on a cluster, using MPI and CUDA to support both GPU and CPU nodes.

Language: Cuda - Size: 22.5 KB - Last synced at: over 1 year ago - Pushed at: almost 7 years ago - Stars: 2 - Forks: 0

BjornMelin/deep-learning-evolution

🧠 Deep-Learning Evolution: Unified collection of TensorFlow & PyTorch projects, featuring custom CUDA kernels, distributed training, memory‑efficient methods, and production‑ready pipelines. Showcases advanced GPU optimizations, from foundational models to cutting‑edge architectures. 🚀

Size: 7.81 KB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

tolgatasci/ai-farm

AI-Farm is a distributed deep learning training framework that enables efficient model training across multiple machines. It provides a scalable infrastructure with real-time monitoring through a web admin panel, adaptive task distribution, and support for both CPU and GPU training.

Language: Python - Size: 95.7 KB - Last synced at: about 2 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

sujaltv/ddpw

A lightweight wrapper that bootstraps PyTorch's Distributed (Data) Parallel.

Language: Python - Size: 10.4 MB - Last synced at: 20 days ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

walln/loadax

Dataloading for JAX

Language: Python - Size: 1.17 MB - Last synced at: 2 days ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

Sonny-Inkai/MACHINE-TRANSLATION-EN-HU

Distributed training of a machine translation model with Python and JAX.

Language: Jupyter Notebook - Size: 431 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 0

ChaosAdmStudent/DNN-Training-Acceleration

In this project, I implement and compare different distributed training techniques, covering data parallelism and model parallelism, from scratch using PyTorch.

Language: Python - Size: 47.9 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

BinFuPKU/pytorch-practice

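Personal implementations of advanced PyTorch programming, covering fundamentals, convolutional neural networks, recurrent neural networks, generative adversarial networks, model deployment, and distributed training (2022).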

Language: Jupyter Notebook - Size: 279 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

Subrahmanyajoshi/Distributed-Training-with-TensorFlow

This repository shows how to distribute training of large machine learning models to make it faster.

Language: Jupyter Notebook - Size: 55.7 KB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 0

valayDave/metaflow-kube-demo

Metaflow On Kubernetes

Language: Jupyter Notebook - Size: 465 KB - Last synced at: about 2 months ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 1

medtune/k8s-tf

Messing with Distributed TensorFlow and Kubernetes

Language: Python - Size: 42 KB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 0

SauravMaheshkar/distgym

simulated distributed training

Language: Python - Size: 0 Bytes - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

okita871/determined

Build customized JSON and HCL Unmarshaler with Determined hcl, json

Language: Go - Size: 119 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

HPDC25-SAFusion/SAFusion

SAFusion: Efficient Tensor Fusion with Sparsification Ahead for High-Performance Distributed DNN Training

Language: Python - Size: 1.19 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

RohanMenon/LipShiFT

This repo contains code to reproduce results for LipShiFT.

Language: Python - Size: 43 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

michaelyliu6/transformers

Educational and production-ready implementations of GPT-2

Language: Jupyter Notebook - Size: 17.8 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

seunboy1/Income-predictor

Quick intro into the world of distributed machine learning

Language: Jupyter Notebook - Size: 5.03 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

Muhammad-Shah/tensorflow-advance-techniques-and-projects-specializtions

Access programming assignments and labs from the TensorFlow Advanced Techniques and TensorFlow Developer Specializations by deeplearning.ai on Coursera. 🚀🧠

Language: Jupyter Notebook - Size: 37.9 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

kirschte/dphelmet

Distributed DP-Helmet: Scalable Differentially Private Non-interactive Averaging of Single Layers

Language: Python - Size: 32.2 KB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

Hz188/experiments

Everything is born from a simple experiment.

Language: Python - Size: 18.2 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

CactusQ/horovod_distributed_training

Distributed training of a CNN on the MNIST dataset using TensorFlow and Horovod

Language: Python - Size: 645 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

transiteration/scaling-ml

A GitHub repository showcasing the implementation of AI scaling techniques and integration with MLflow for streamlined experiment tracking and management in machine learning workflows.

Language: Python - Size: 41 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

meongju0o0/DistMHAug

Official DGL Implementation of "Distributed Graph Data Augmentation Technique for Graph Neural Network". KSC 2023

Language: Python - Size: 279 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

arawxx/FSDP-Distributed-Training-of-ConvNextV2-on-CIFAR10

A script for training the ConvNextV2 on CIFAR10 dataset using the FSDP technique for a distributed training scheme.

Language: Python - Size: 9.77 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

pdefusco/Distributed_XGBoost_with_PySpark_CML

Project showcasing how to get started with Distributed XGBoost using PySpark in CML.

Language: Jupyter Notebook - Size: 955 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

john-fante/distributed_deep_learning_example

I tried to implement distributed deep learning on the Fashion-MNIST dataset

Language: Jupyter Notebook - Size: 26.4 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

SuperbTUM/Faster-Distributed-Training

Faster large mini-batch distributed training w/o. squeezing devices

Language: Python - Size: 601 KB - Last synced at: 4 days ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 1

nauyan/Multi-GPU-Training-Tensorflow

Training Using Multiple GPUs

Language: Python - Size: 9.77 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 1

majd-alhafi/Distributed-Training-Tensorflow

Language: Jupyter Notebook - Size: 20.5 KB - Last synced at: 12 months ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

lukeconibear/intro_ml

Short course: Introduction to Machine Learning

Size: 7.34 MB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

Hemantr05/distributed_training

This project contains scripts/modules for distributed training

Language: Python - Size: 31.3 KB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

m-ali-awan/sagemaker-learning-brad

Well-commented code for different types of training configurations

Language: Jupyter Notebook - Size: 31 MB - Last synced at: almost 2 years ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

sukumarh/distributed-training

Distributed training using PyTorch DDP & Suggestive resource allocation

Language: Jupyter Notebook - Size: 3.27 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

EunjuYang/DistributedPyTorch

Example of distributed PyTorch

Language: Python - Size: 6.84 KB - Last synced at: about 2 years ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 0

StefanoFioravanzo/dl-operator

General purpose Kubernetes operator for DL frameworks written in Python

Language: Python - Size: 11.7 KB - Last synced at: about 2 months ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

StefanoFioravanzo/mx-operator (fork of kubeflow/training-operator)

Tools for ML/MXNet on Kubernetes. Rework of original tf-operator to support MXNet framework.

Language: Go - Size: 40.5 MB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0