Topic: "distributed-training"
Hunterdii/TensorFlow-Advanced-Techniques-Solution
TensorFlow Advanced Techniques Specialization - Solutions
Language: Jupyter Notebook - Size: 39 MB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 8 - Forks: 4

aws-samples/end-2-end-3d-ml
This repository features Amazon SageMaker Ground Truth and explains how to ingest raw 3D point cloud data, label it, train a 3D object detection model using Amazon SageMaker, and deploy the model to an Amazon SageMaker Endpoint
Language: Jupyter Notebook - Size: 20.9 MB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 8 - Forks: 3

PinJhih/ddp-trainer
A simple package for distributed model training using Distributed Data Parallel (DDP) in PyTorch.
Language: Python - Size: 14.6 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 7 - Forks: 0

prabhatkc/ct-recon
Python Implementation of Forward & Inverse models for biomedical imaging
Language: Python - Size: 374 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 7 - Forks: 0

rosinality/meshfn
Framework for Human Alignment Learning
Language: Python - Size: 67.4 KB - Last synced at: 11 days ago - Pushed at: almost 2 years ago - Stars: 7 - Forks: 1

Shank2358/DCNv2
DCNv2_torch1.11
Language: Python - Size: 34.2 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 7 - Forks: 0

Shenggan/DeepCell-Keras 📦
Reimplement Deep Cell with Keras and Horovod.
Language: Python - Size: 2.38 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 7 - Forks: 6

bryanlimy/tf2-cyclegan
TensorFlow 2 implementation of CycleGAN with multi-GPU training.
Language: Python - Size: 16.1 MB - Last synced at: 2 days ago - Pushed at: almost 4 years ago - Stars: 7 - Forks: 4

cake-lab/Sync-Switch
The official repo for Sync-Switch (ICDCS'21)
Language: Python - Size: 4.82 MB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 7 - Forks: 2

aws-samples/sagemaker-distributed-training-digital-pathology-images
Distributed training of digital pathology tissue slide images using SageMaker and Horovod.
Language: Jupyter Notebook - Size: 279 KB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 7 - Forks: 4

saforem2/mmm
Multi-Modal Modeling
Language: Python - Size: 271 KB - Last synced at: 6 days ago - Pushed at: about 2 months ago - Stars: 6 - Forks: 0

jiankaiwang/distributed_training
This repository is a tutorial on training deep neural network models more efficiently. It focuses on two main frameworks: Keras and TensorFlow.
Language: Jupyter Notebook - Size: 58.6 KB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 6 - Forks: 4

ReyRen/k8sMLer-client-go
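Built on the kubernetes/client-go API; controls the GPU resource lifecycle for distributed training and supports continuous real-time redirection of multi-user, multi-task training logs over WebSocket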
Language: Go - Size: 729 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 5 - Forks: 0

asprenger/distributed-training-patterns
Experiments with low level communication patterns that are useful for distributed training.
Language: Python - Size: 8.79 KB - Last synced at: 6 months ago - Pushed at: over 6 years ago - Stars: 5 - Forks: 0

kirschte/sbdt
S-BDT: Distributed Differentially Private Boosted Decision Trees
Language: C++ - Size: 23.6 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 4 - Forks: 2

alex-snd/TRecover
📜 A python library for distributed training of a Transformer neural network across the Internet to solve the Running Key Cipher, widely known in the field of Cryptography.
Language: Python - Size: 41.8 MB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 4 - Forks: 0

AierLab/pytorch-rpc-tutorial 📦
A hands-on tutorial to dive deep into PyTorch's RPC (Remote Procedure Call) framework. This repository offers a comprehensive guide developed with the assistance of OpenAI's ChatGPT. Whether you're a beginner or an advanced user, this tutorial will provide insights and practices to effectively use PyTorch RPC in your projects.
Language: Python - Size: 39.1 KB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 4 - Forks: 3

SunilGolden/RecEngineMF
Recommendation Engine powered by Matrix Factorization.
Language: Python - Size: 37.1 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 4 - Forks: 0

amoudgl/distributed-dtp
Distributed implementation of our proposed DTP algorithm parallelizing feedback weight training across GPUs (ICML 2022)
Language: Python - Size: 408 KB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 4 - Forks: 1

aws-samples/deepfm-tensorflow-distributed-training-on-amazon-sagemaker
This demo shows two samples of DeepFM distributed training on Amazon SageMaker: one based on TensorFlow Parameter Server on CPU, and the other based on Horovod on GPU.
Language: Python - Size: 7.04 MB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 4 - Forks: 0

sdamadi/image-classification
Comprehensive image classification for training multilayer perceptron (MLP), LeNet, LeNet5, conv2, conv4, conv6, VGG11, VGG13, VGG16, VGG19 with batch normalization, ResNet18, ResNet34, ResNet50, MobileNetV2 on MNIST, CIFAR10, CIFAR100, and ImageNet1K.
Language: Python - Size: 46.1 MB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 4 - Forks: 0

cake-lab/CM-DARE
CM-DARE is a measurement infrastructure for monitoring distributed training in Google Cloud (ICDCS'20)
Language: Jupyter Notebook - Size: 5.44 MB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 4 - Forks: 1

harinik05/LettucifyAI
MLOps Pipeline & fine-tuned deep learning model to classify between various food items 🍎🚀
Language: Jupyter Notebook - Size: 26.2 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 1

TARTRL/TARTRL
A distributed reinforcement learning framework based on PyTorch
Language: Python - Size: 14.6 KB - Last synced at: 20 days ago - Pushed at: about 2 years ago - Stars: 3 - Forks: 0

LER0ever/HPGO
Development of Project HPGO | Hybrid Parallelism Global Orchestration
Size: 5.29 MB - Last synced at: 11 days ago - Pushed at: about 4 years ago - Stars: 3 - Forks: 0

erfannoury/cifar-tf
A simple model for image classification on the CIFAR datasets, demonstrating TF's new APIs in TF 1.4
Language: Python - Size: 11 MB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 3 - Forks: 0

simerusm/arceus
Train neural networks with Macbook clusters and get paid for it
Language: TypeScript - Size: 8.18 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

denpalrius/bft-federated-learning
Federated Learning with Byzantine Fault Tolerance
Language: Jupyter Notebook - Size: 11.1 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 2 - Forks: 0

AmanPriyanshu/FL-Interactive-Game
FL-Interactive-Game: Interactive web game that teaches basic components of Federated Learning
Language: HTML - Size: 18 MB - Last synced at: 6 days ago - Pushed at: 6 months ago - Stars: 2 - Forks: 0

yoniLc/AdaCons
Sample Implementation of the paper "Adaptive Consensus Gradients Aggregation for Scaled Distributed Training".
Language: Python - Size: 19.5 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 2 - Forks: 0

lancelee82/necklace
Distributed deep learning framework based on pytorch/numba/nccl and zeromq.
Language: Python - Size: 235 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 1

nwangfw/nerf_ddp
Language: Jupyter Notebook - Size: 682 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

tlatkowski/u-net-tpu
Tensorflow implementation of U-Net model with TPU Estimator support.
Language: Python - Size: 159 KB - Last synced at: about 1 month ago - Pushed at: over 6 years ago - Stars: 2 - Forks: 0

shikhar-srivastava/Meta-Iterative-MapReduce
Meta-Iterative Map-Reduce for massively parallel regression on a cluster, using MPI and CUDA to support both GPU and CPU nodes.
Language: Cuda - Size: 22.5 KB - Last synced at: over 1 year ago - Pushed at: almost 7 years ago - Stars: 2 - Forks: 0

BjornMelin/deep-learning-evolution
🧠 Deep-Learning Evolution: Unified collection of TensorFlow & PyTorch projects, featuring custom CUDA kernels, distributed training, memory‑efficient methods, and production‑ready pipelines. Showcases advanced GPU optimizations, from foundational models to cutting‑edge architectures. 🚀
Size: 7.81 KB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

tolgatasci/ai-farm
AI-Farm is a distributed deep learning training framework that enables efficient model training across multiple machines. It provides a scalable infrastructure with real-time monitoring through a web admin panel, adaptive task distribution, and support for both CPU and GPU training.
Language: Python - Size: 95.7 KB - Last synced at: about 2 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

sujaltv/ddpw
A lightweight wrapper that bootstraps PyTorch's Distributed (Data) Parallel.
Language: Python - Size: 10.4 MB - Last synced at: 20 days ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

walln/loadax
Dataloading for JAX
Language: Python - Size: 1.17 MB - Last synced at: 2 days ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

Sonny-Inkai/MACHINE-TRANSLATION-EN-HU
Distributed machine translation model training with Python and JAX.
Language: Jupyter Notebook - Size: 431 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 0

ChaosAdmStudent/DNN-Training-Acceleration
In this project, I implement and compare different distributed training techniques, covering data parallelism and model parallelism, from scratch using PyTorch.
Language: Python - Size: 47.9 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

BinFuPKU/pytorch-practice
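Personal implementations of advanced PyTorch programming, covering fundamentals, convolutional neural networks, recurrent neural networks, generative adversarial networks, model deployment, and distributed training (2022)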
Language: Jupyter Notebook - Size: 279 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

Subrahmanyajoshi/Distributed-Training-with-TensorFlow
This repository shows how to distribute training of large machine learning models to make it faster.
Language: Jupyter Notebook - Size: 55.7 KB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 0

valayDave/metaflow-kube-demo
Metaflow On Kubernetes
Language: Jupyter Notebook - Size: 465 KB - Last synced at: about 2 months ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 1

medtune/k8s-tf
Messing with Distributed TensorFlow and Kubernetes
Language: Python - Size: 42 KB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 0

SauravMaheshkar/distgym
simulated distributed training
Language: Python - Size: 0 Bytes - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

okita871/determined
Build customized JSON and HCL Unmarshaler with Determined hcl, json
Language: Go - Size: 119 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

HPDC25-SAFusion/SAFusion
SAFusion: Efficient Tensor Fusion with Sparsification Ahead for High-Performance Distributed DNN Training
Language: Python - Size: 1.19 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

RohanMenon/LipShiFT
This repo contains code to reproduce results for LipShiFT.
Language: Python - Size: 43 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

michaelyliu6/transformers
Educational and production-ready implementations of GPT-2
Language: Jupyter Notebook - Size: 17.8 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

seunboy1/Income-predictor
Quick intro into the world of distributed machine learning
Language: Jupyter Notebook - Size: 5.03 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

Muhammad-Shah/tensorflow-advance-techniques-and-projects-specializtions
Access programming assignments and labs from the TensorFlow Advanced Techniques and TensorFlow Developer Specializations by deeplearning.ai on Coursera. 🚀🧠
Language: Jupyter Notebook - Size: 37.9 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

kirschte/dphelmet
Distributed DP-Helmet: Scalable Differentially Private Non-interactive Averaging of Single Layers
Language: Python - Size: 32.2 KB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

Hz188/experiments
Everything is born from a simple experiment.
Language: Python - Size: 18.2 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

CactusQ/horovod_distributed_training
Distributed training of a CNN on the MNIST dataset using TensorFlow and Horovod
Language: Python - Size: 645 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

transiteration/scaling-ml
A GitHub repository showcasing the implementation of AI scaling techniques and integration with MLflow for streamlined experiment tracking and management in machine learning workflows.
Language: Python - Size: 41 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

meongju0o0/DistMHAug
Official DGL Implementation of "Distributed Graph Data Augmentation Technique for Graph Neural Network". KSC 2023
Language: Python - Size: 279 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

arawxx/FSDP-Distributed-Training-of-ConvNextV2-on-CIFAR10
A script for training the ConvNextV2 on CIFAR10 dataset using the FSDP technique for a distributed training scheme.
Language: Python - Size: 9.77 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

pdefusco/Distributed_XGBoost_with_PySpark_CML
Project showcasing how to get started with Distributed XGBoost using PySpark in CML.
Language: Jupyter Notebook - Size: 955 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

john-fante/distributed_deep_learning_example
I tried to implement distributed deep learning on the Fashion-MNIST dataset
Language: Jupyter Notebook - Size: 26.4 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

SuperbTUM/Faster-Distributed-Training
Faster large mini-batch distributed training w/o. squeezing devices
Language: Python - Size: 601 KB - Last synced at: 4 days ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 1

nauyan/Multi-GPU-Training-Tensorflow
Training Using Multiple GPUs
Language: Python - Size: 9.77 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 1

majd-alhafi/Distributed-Training-Tensorflow
Language: Jupyter Notebook - Size: 20.5 KB - Last synced at: 12 months ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

lukeconibear/intro_ml
Short course: Introduction to Machine Learning
Size: 7.34 MB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

Hemantr05/distributed_training
This project contains scripts/modules for distributed training
Language: Python - Size: 31.3 KB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

m-ali-awan/sagemaker-learning-brad
Well commented code for different types of training configurations
Language: Jupyter Notebook - Size: 31 MB - Last synced at: almost 2 years ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

sukumarh/distributed-training
Distributed training using PyTorch DDP & Suggestive resource allocation
Language: Jupyter Notebook - Size: 3.27 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

EunjuYang/DistributedPyTorch
Example of distributed PyTorch
Language: Python - Size: 6.84 KB - Last synced at: about 2 years ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 0

StefanoFioravanzo/dl-operator
General purpose Kubernetes operator for DL frameworks written in Python
Language: Python - Size: 11.7 KB - Last synced at: about 2 months ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

StefanoFioravanzo/mx-operator (fork of kubeflow/training-operator)
Tools for ML/MXNet on Kubernetes. Rework of original tf-operator to support MXNet framework.
Language: Go - Size: 40.5 MB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0
