An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: multi-gpu

lattice/quda

QUDA is a library for performing calculations in lattice QCD on GPUs.

Language: C++ - Size: 102 MB - Last synced at: about 11 hours ago - Pushed at: about 12 hours ago - Stars: 309 - Forks: 108

tensordiffeq/TensorDiffEq

Efficient and Scalable Physics-Informed Deep Learning and Scientific Machine Learning on top of Tensorflow for multi-worker distributed computing

Language: Python - Size: 1.28 MB - Last synced at: about 15 hours ago - Pushed at: about 3 years ago - Stars: 113 - Forks: 42

Shamrock-code/Shamrock

The Shamrock Framework, an open-source, multi-GPU hydrodynamics framework for astrophysics. Scales seamlessly from laptops to exascale supercomputers, supporting SPH, AMR, and more.

Language: C++ - Size: 12.8 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 18 - Forks: 5

helmholtz-analytics/heat

Distributed tensors and Machine Learning framework with GPU and MPI acceleration in Python

Language: Python - Size: 21 MB - Last synced at: about 16 hours ago - Pushed at: about 17 hours ago - Stars: 219 - Forks: 53

projectchrono/DEM-Engine

A dual-GPU DEM solver with complex grain geometry support

Language: C++ - Size: 27.6 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 82 - Forks: 17

18520339/ml-distributed-training

Reduce the training time of CNNs by leveraging multiple GPUs with two approaches, Multi-Worker and Parameter Server training, using TensorFlow 2

Language: Jupyter Notebook - Size: 8.05 MB - Last synced at: 8 days ago - Pushed at: over 2 years ago - Stars: 13 - Forks: 3

FZJ-JSC/tutorial-multi-gpu

Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial

Language: Cuda - Size: 197 MB - Last synced at: 10 days ago - Pushed at: about 1 month ago - Stars: 253 - Forks: 55

ConfettiFX/The-Forge

The Forge Cross-Platform Framework PC Windows, Steamdeck (native), Ray Tracing, macOS / iOS, Android, XBOX, PS4, PS5, Switch, Quest 2

Language: C++ - Size: 2.31 GB - Last synced at: 13 days ago - Pushed at: about 1 month ago - Stars: 5,049 - Forks: 528

omlins/ParallelStencil.jl

Package for writing high-level code for parallel high-performance stencil computations that can be deployed on both GPUs and CPUs

Language: Julia - Size: 40.9 MB - Last synced at: 11 days ago - Pushed at: 21 days ago - Stars: 338 - Forks: 38

darius513/MG-alphaGCD

Repository for ICS'25: MG-αGCD: Accelerating Graph Community Detection on Multi-GPU Platforms

Language: Cuda - Size: 40 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

AnimaVR/NeuroSync_Trainer_Lite

A multi GPU audio2face blendshape AI model trainer for your iPhone ARKit data.

Language: Python - Size: 7.43 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 19 - Forks: 9

rickiepark/deep-learning-with-python-2nd

<์ผ€๋ผ์Šค ์ฐฝ์‹œ์ž์—๊ฒŒ ๋ฐฐ์šฐ๋Š” ๋”ฅ๋Ÿฌ๋‹ 2ํŒ> ๋„์„œ์˜ ์ฝ”๋“œ ์ €์žฅ์†Œ

Language: Jupyter Notebook - Size: 48.4 MB - Last synced at: 13 days ago - Pushed at: about 1 year ago - Stars: 73 - Forks: 94

NeuralAditya/Neural_Network_C

Neural Network C is an advanced neural network implementation in pure C, optimized for high performance on CPUs and NVIDIA GPUs.

Language: C - Size: 64.5 KB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 0 - Forks: 0

NickLucche/stable-diffusion-nvidia-docker

GPU-ready Dockerfile to run Stability.AI stable-diffusion model v2 with a simple web interface. Includes multi-GPU support.

Language: Python - Size: 7.68 MB - Last synced at: 17 days ago - Pushed at: 10 months ago - Stars: 366 - Forks: 44

predsci/POT3D

POT3D: High Performance Potential Field Solver

Language: Fortran - Size: 23.9 MB - Last synced at: 26 days ago - Pushed at: 26 days ago - Stars: 45 - Forks: 25

celerity/celerity-runtime

High-level C++ for Accelerator Clusters

Language: C++ - Size: 9.54 MB - Last synced at: 6 days ago - Pushed at: 7 days ago - Stars: 146 - Forks: 20

NVIDIA/OpenSeq2Seq 📦

Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP

Language: Python - Size: 57.4 MB - Last synced at: about 1 month ago - Pushed at: almost 4 years ago - Stars: 1,558 - Forks: 369

rbbrdckybk/dream-factory

Multi-threaded GUI manager for mass creation of AI-generated art with support for multiple GPUs.

Language: Python - Size: 21.4 MB - Last synced at: 30 days ago - Pushed at: 9 months ago - Stars: 498 - Forks: 56

guotong1988/BERT-pre-training

multi-gpu pre-training in one machine for BERT without horovod (Data Parallelism)

Language: Python - Size: 201 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 172 - Forks: 54

seasonSH/DocFace

Face recognition system for ID photos

Language: Python - Size: 650 KB - Last synced at: 17 days ago - Pushed at: over 6 years ago - Stars: 374 - Forks: 124

bharatsingh430/py-R-FCN-multiGPU

Code for training py-faster-rcnn and py-R-FCN on multiple GPUs in caffe

Language: Jupyter Notebook - Size: 8.82 MB - Last synced at: 23 days ago - Pushed at: almost 8 years ago - Stars: 193 - Forks: 96

lupantech/dual-mfa-vqa

Co-attending Regions and Detections for VQA.

Language: Matlab - Size: 1.44 MB - Last synced at: 13 days ago - Pushed at: almost 7 years ago - Stars: 40 - Forks: 14

if0ne/multi-gpu-shadows

Experimental prototype for using multi-GPU in real-time rendering, in particular for rendering cascaded shadow maps

Language: Rust - Size: 37.8 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

predsci/multigpu-test-code

This code mimics the basic MPI+OpenACC tasks of PSI's MAS Solar MHD code, for use with testing multi-GPU multi-node clusters

Language: Fortran - Size: 36.1 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 3 - Forks: 0

andreped/GradientAccumulator

:dart: Accumulated Gradients for TensorFlow 2

Language: Python - Size: 5.3 MB - Last synced at: 1 day ago - Pushed at: about 1 year ago - Stars: 53 - Forks: 11

p-anastas/PARALiA-GEMMex

Language: C++ - Size: 1.58 MB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

miguelcarcamov/gpuvmem

GPU Framework for Radio Astronomical Image Synthesis

Language: Cuda - Size: 503 MB - Last synced at: 1 day ago - Pushed at: 5 months ago - Stars: 28 - Forks: 3

papuSpartan/stable-diffusion-webui-distributed

Chains stable-diffusion-webui instances together to facilitate faster image generation.

Language: Python - Size: 514 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 178 - Forks: 13

v-iashin/video_features

Extract video features from raw videos using multiple GPUs. We support RAFT flow frames as well as S3D, I3D, R(2+1)D, VGGish, CLIP, and TIMM models.

Language: Python - Size: 282 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 497 - Forks: 94

kentaroy47/pytorch-mgpu-cifar10

testing multi gpu for pytorch

Language: Python - Size: 14.6 KB - Last synced at: 19 days ago - Pushed at: almost 6 years ago - Stars: 26 - Forks: 9

eth-cscs/ImplicitGlobalGrid.jl

Almost trivial distributed parallelization of stencil-based GPU and CPU applications on a regular staggered grid

Language: Julia - Size: 6.81 MB - Last synced at: 12 months ago - Pushed at: about 1 year ago - Stars: 152 - Forks: 16

qcrbellor/CUDA-Q-Workshop

Hands-on workshop CUDA-Q NVIDIA in RWTH Aachen University & Technische Universität Berlin, June 2024.

Language: Jupyter Notebook - Size: 16 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0

tugrul512bit/libGPGPU

Multi-GPU & CPU OpenCL kernel executor with load-balancing as if there is one big GPU.

Language: C++ - Size: 2.09 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 9 - Forks: 2

ZJCV/facenet

[CVPR 2015] FaceNet: A Unified Embedding for Face Recognition and Clustering

Language: Python - Size: 74.2 KB - Last synced at: 8 days ago - Pushed at: over 3 years ago - Stars: 4 - Forks: 1

ParCoreLab/CPU-Free-model

Source code for the CPU-Free model - a fully autonomous execution model for multi-GPU applications that completely excludes the involvement of the CPU beyond the initial kernel launch.

Language: Cuda - Size: 18.2 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 14 - Forks: 3

YukeWang96/MGG_OSDI23

Artifact for OSDI'23: MGG: Accelerating Graph Neural Networks with Fine-grained intra-kernel Communication-Computation Pipelining on Multi-GPU Platforms.

Language: Cuda - Size: 1.1 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 28 - Forks: 3

dmarnerides/dlt

Deep Learning Toolbox for Torch

Language: Lua - Size: 60.5 KB - Last synced at: 5 months ago - Pushed at: over 7 years ago - Stars: 21 - Forks: 2

CS-406-Parallel-Programming/Sparse-Matrix-Cycle-Count

Sabanci University CS406 Parallel Computing group project: counting cycles of length k in a sparse matrix

Language: Cuda - Size: 4.91 MB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

GPUSPH/gpusph

The world's first CUDA implementation of Weakly-Compressible Smoothed Particle Hydrodynamics

Language: C++ - Size: 9.21 MB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 148 - Forks: 64

tamerthamoqa/facenet-pytorch-glint360k

A PyTorch implementation of the 'FaceNet' paper for training a facial recognition model with Triplet Loss using the glint360k dataset. A pre-trained model using Triplet Loss is available for download.

Language: Python - Size: 24.7 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 208 - Forks: 56

tugrul512bit/FastSimpleNeuralNetworkTrainer

GPU-accelerated neural network trainer that supports multiple GPUs with OpenCL.

Language: C++ - Size: 528 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 1

enfiskutensykkel/multi-gpu-bwtest

Measure bandwidth of multiple simultaneously started cudaMemcpyAsync

Language: Cuda - Size: 21.5 KB - Last synced at: over 1 year ago - Pushed at: almost 5 years ago - Stars: 7 - Forks: 3

ace19-dev/image-retrieval-tf

image retrieval with cosine metric learning

Language: Python - Size: 18.4 MB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 7 - Forks: 1

lebedov/cudamps

Python interface to CUDA Multi-Process Service

Language: Python - Size: 17.6 KB - Last synced at: 8 days ago - Pushed at: about 9 years ago - Stars: 7 - Forks: 2

madcato/pytorch-word2vec

word2vec implementation using PyTorch

Language: Python - Size: 16.6 KB - Last synced at: 24 days ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 0

RnoldR/multi_gpu

Testing speed of two GPUs vs. one GPU

Language: Python - Size: 783 KB - Last synced at: over 1 year ago - Pushed at: about 6 years ago - Stars: 1 - Forks: 0

previsionio/damavand

Damavand is a quantum circuit simulator. It can run on laptops or High Performance Computing architectures, such as CPU distributed architectures or multi-GPU distributed architectures.

Language: Rust - Size: 12.9 MB - Last synced at: 3 days ago - Pushed at: over 3 years ago - Stars: 8 - Forks: 4

hfxunlp/transformer 📦

Neutron: A pytorch based implementation of Transformer and its variants.

Language: Python - Size: 887 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 63 - Forks: 11

cywjava/chatglm-6b-fine-tuning

chatglm-6b-fine-tuning

Language: Python - Size: 77.9 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 66 - Forks: 8

JiahongChen/multiGPU

Test code for running PyTorch deep learning models using multiple GPUs.

Language: Python - Size: 29.3 KB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 5 - Forks: 2

tugrul512bit/Cekirdekler

Multi-device OpenCL kernel load balancer and pipeliner API for C#. Uses a shared-distributed memory model to keep GPUs updated fast while using the same kernel on all devices (for simplicity).

Language: C# - Size: 10.6 MB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 86 - Forks: 9

farkhor/WS-VR

A CUDA-based multi-GPU vertex-centric graph processing framework based on Warp Segmentation and Vertex Refinement techniques.

Language: Cuda - Size: 26.4 KB - Last synced at: over 1 year ago - Pushed at: about 8 years ago - Stars: 10 - Forks: 2

tamerthamoqa/CheXpert-multilabel-classification-tensorflow

Code repository for training multi-label classification models on the CheXpert Chest X-ray dataset.

Language: Python - Size: 23.9 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 2

jimth001/my-tf-framework-for-nlp-tasks

This project aims to help people implement tensorflow model pipelines quickly for different nlp tasks.

Language: Python - Size: 83 KB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 5 - Forks: 0

Erfan-Ahmadi/TheForgeExamples

Graphic Techniques Implemented on The Forge API, a cross-platform rendering framework on top of Vulkan, DirectX, Metal

Language: C++ - Size: 155 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 16 - Forks: 3

Project-MANAS/tfutils 📦

Utilities for making TensorFlow easier

Language: Python - Size: 17.6 KB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 5 - Forks: 1

muriloboratto/MC-SD03-II

2023 Summer Program - Santos Dumont School

Language: Jupyter Notebook - Size: 62.1 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

kuixu/keras_multi_gpu

Multi-GPU training for Keras

Language: Python - Size: 286 KB - Last synced at: 16 days ago - Pushed at: almost 8 years ago - Stars: 44 - Forks: 22

tamerthamoqa/3D-mri-brain-tumour-image-segmentation-medical-decathlon-tensorflow

Code repository for training a brain tumour U-Net 3D image segmentation model using the 'Task1 Brain Tumour' medical segmentation decathlon challenge dataset.

Language: Python - Size: 19.2 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 3 - Forks: 3

tugrul512bit/gpgpu-loadbalancerx

Simple load-balancing library for balancing GPGPU workloads between a GPU and a CPU or any number of devices in a computer or multiple computers.

Language: C++ - Size: 680 KB - Last synced at: almost 2 years ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 1

jigangkim/nvidia-gpu-scheduler

NVIDIA GPU compute task scheduling utility

Language: Python - Size: 430 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 6 - Forks: 1

noxouille/nvamdbench

Benchmark for both NVIDIA and AMD GPU

Language: Python - Size: 519 KB - Last synced at: over 1 year ago - Pushed at: about 6 years ago - Stars: 3 - Forks: 4

hongshibao/kubernetes (fork of kubernetes/kubernetes)

A fork of Kubernetes that supports NVIDIA GPU memory as a schedulable resource

Language: Go - Size: 554 MB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 2 - Forks: 0

zabir-nabil/darknet-multi-gpu-parallel

running multiple darknet models in parallel in multi-gpu setup

Language: Python - Size: 11.7 KB - Last synced at: 20 days ago - Pushed at: almost 5 years ago - Stars: 6 - Forks: 1

QuanLab/nvidia-cuda-docker

Language: Dockerfile - Size: 1000 Bytes - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

olk/mnist-performance

performance test of MNIST handwriting recognition using MXNet + TF

Language: Python - Size: 22.5 KB - Last synced at: 27 days ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 0

kuixu/frustum-pointnets (fork of charlesq34/frustum-pointnets)

Frustum PointNets for 3D Object Detection from RGB-D Data

Language: Python - Size: 3.98 MB - Last synced at: about 2 years ago - Pushed at: almost 7 years ago - Stars: 3 - Forks: 0

visionscaper/stateful_multi_gpu

Experimental utility to build stateful RNN models for multi GPU training.

Language: Python - Size: 5.86 KB - Last synced at: 3 days ago - Pushed at: about 7 years ago - Stars: 4 - Forks: 1

fengwang/cunn

Deep neural network with multi-GPU support in a minimal fashion

Language: C++ - Size: 48.8 KB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 1 - Forks: 0

naviocean/keras_experiments (fork of avolkov1/keras_experiments)

Experimental Keras libraries and examples.

Language: Python - Size: 670 KB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

ZhijianChan/tf_face

Language: Python - Size: 365 KB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 2 - Forks: 3

mebegu/MVAPICH2-Benchmarks (fork of cangumeli/mvapich-tests)

Benchmarks for Multi-GPU Communication with MVAPICH2

Language: C - Size: 6.84 KB - Last synced at: about 2 years ago - Pushed at: over 8 years ago - Stars: 0 - Forks: 0

farrajota/multi-gpu-torchnet

Train an object classifier using multiple GPUs in Torch7

Language: Lua - Size: 24.4 KB - Last synced at: about 1 month ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0