An open API service providing repository metadata for many open source software ecosystems.

Topic: "gpu-computing"

catboost/catboost

A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

Language: C++ - Size: 1.67 GB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 8,438 - Forks: 1,229

gyroflow/gyroflow

Video stabilization using gyroscope data

Language: Rust - Size: 81.2 MB - Last synced at: 9 days ago - Pushed at: 19 days ago - Stars: 7,467 - Forks: 336

NVIDIA/thrust 📦

[ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl

Language: C++ - Size: 17 MB - Last synced at: 5 days ago - Pushed at: over 1 year ago - Stars: 4,973 - Forks: 763

google/tf-quant-finance

High-performance TensorFlow library for quantitative finance.

Language: Python - Size: 16.9 MB - Last synced at: 23 days ago - Pushed at: 3 months ago - Stars: 4,880 - Forks: 622

ProjectPhysX/FluidX3D

The fastest and most memory efficient lattice Boltzmann CFD software, running on all GPUs and CPUs via OpenCL. Free for non-commercial use.

Language: C++ - Size: 21.2 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 4,464 - Forks: 389

tensorflow/lingvo

Lingvo

Language: Python - Size: 142 MB - Last synced at: about 8 hours ago - Pushed at: 2 days ago - Stars: 2,843 - Forks: 449

microsoft/pai 📦

Resource scheduling and cluster management for AI

Language: JavaScript - Size: 70.5 MB - Last synced at: 1 day ago - Pushed at: about 1 year ago - Stars: 2,665 - Forks: 548

KomputeProject/kompute

General-purpose GPU compute framework built on Vulkan to support 1000s of cross-vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). Blazing fast, mobile-enabled, asynchronous and optimized for advanced GPU data processing use cases. Backed by the Linux Foundation.

Language: C++ - Size: 25.3 MB - Last synced at: 6 days ago - Pushed at: 11 days ago - Stars: 2,235 - Forks: 172

jbush001/NyuziProcessor

GPGPU microprocessor architecture

Language: C - Size: 31.4 MB - Last synced at: 23 days ago - Pushed at: 8 months ago - Stars: 2,082 - Forks: 360

inducer/pycuda

CUDA integration for Python, plus shiny features

Language: Python - Size: 2.95 MB - Last synced at: 6 days ago - Pushed at: 12 days ago - Stars: 1,952 - Forks: 291

SciML/SciMLBook

Parallel Computing and Scientific Machine Learning (SciML): Methods and Applications (MIT 18.337J/6.338J)

Language: HTML - Size: 118 MB - Last synced at: 8 days ago - Pushed at: about 1 month ago - Stars: 1,910 - Forks: 348

coreylowman/dfdx

Deep learning in Rust, with shape checked tensors and neural networks

Language: Rust - Size: 2.6 MB - Last synced at: about 1 month ago - Pushed at: 11 months ago - Stars: 1,812 - Forks: 108

NVIDIA/cccl

CUDA Core Compute Libraries

Language: C++ - Size: 82.6 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1,690 - Forks: 224

mikbry/awesome-webgpu

😎 Curated list of awesome things around the WebGPU ecosystem.

Size: 99.6 KB - Last synced at: 10 days ago - Pushed at: 3 months ago - Stars: 1,657 - Forks: 70

AdaptiveCpp/AdaptiveCpp

Compiler for multiple programming models (SYCL, C++ standard parallelism, HIP/CUDA) for CPUs and GPUs from all vendors: The independent, community-driven compiler for C++-based heterogeneous programming models. Lets applications adapt themselves to all the hardware in the system - even at runtime!

Language: C++ - Size: 14.4 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 1,654 - Forks: 201

calebwin/emu

The write-once-run-anywhere GPGPU library for Rust

Language: Rust - Size: 342 MB - Last synced at: 23 days ago - Pushed at: over 2 years ago - Stars: 1,610 - Forks: 52

BindsNET/bindsnet

Simulation of spiking neural networks (SNNs) using PyTorch.

Language: Python - Size: 38.9 MB - Last synced at: 1 day ago - Pushed at: 8 days ago - Stars: 1,591 - Forks: 334

mratsim/Arraymancer

A fast, ergonomic and portable tensor library in Nim with a deep learning focus for CPU, GPU and embedded devices via OpenMP, CUDA and OpenCL backends

Language: Nim - Size: 3.8 MB - Last synced at: 6 days ago - Pushed at: 4 months ago - Stars: 1,368 - Forks: 96

NVIDIA/MatX

An efficient C++17 GPU numerical computing library with Python-like syntax

Language: C++ - Size: 21.5 MB - Last synced at: 3 days ago - Pushed at: 9 days ago - Stars: 1,329 - Forks: 98

beehive-lab/TornadoVM

TornadoVM: A practical and efficient heterogeneous programming framework for managed languages

Language: Java - Size: 152 MB - Last synced at: 6 days ago - Pushed at: 16 days ago - Stars: 1,261 - Forks: 119

LuxCoreRender/LuxCore

LuxCore source repository

Language: C++ - Size: 155 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 1,228 - Forks: 154

stotko/stdgpu

stdgpu: Efficient STL-like Data Structures on the GPU

Language: C++ - Size: 4.99 MB - Last synced at: 23 days ago - Pushed at: 2 months ago - Stars: 1,219 - Forks: 91

uncomplicate/neanderthal

Fast Clojure Matrix Library

Language: Clojure - Size: 3.99 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1,097 - Forks: 58

AccelerateHS/accelerate

Embedded language for high-performance array computations

Language: Haskell - Size: 15.4 MB - Last synced at: 5 days ago - Pushed at: 23 days ago - Stars: 921 - Forks: 123

eyalroz/cuda-api-wrappers

Thin, unified, C++-flavored wrappers for the CUDA APIs

Language: C++ - Size: 2.85 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 842 - Forks: 84

Langhalsdino/Kubernetes-GPU-Guide

This guide should help fellow researchers and hobbyists to easily automate and accelerate their deep learning training with their own Kubernetes GPU cluster.

Language: Shell - Size: 431 KB - Last synced at: 7 months ago - Pushed at: over 2 years ago - Stars: 816 - Forks: 115

LuxCoreRender/BlendLuxCore

Blender Integration for LuxCore

Language: Python - Size: 341 MB - Last synced at: 5 days ago - Pushed at: 9 days ago - Stars: 791 - Forks: 95

zszazi/Deep-learning-in-cloud

List of Deep Learning Cloud Providers

Size: 74.2 KB - Last synced at: 8 days ago - Pushed at: about 2 months ago - Stars: 784 - Forks: 94

iot-salzburg/gpu-jupyter

GPU-Jupyter: Your GPU-accelerated JupyterLab with a rich data science toolstack, TensorFlow and PyTorch for your reproducible deep learning experiments.

Language: Jupyter Notebook - Size: 1.21 MB - Last synced at: 23 days ago - Pushed at: 4 months ago - Stars: 743 - Forks: 237

ComputationalRadiationPhysics/picongpu

Performance-Portable Particle-in-Cell Simulations for the Exascale Era ✨

Language: C++ - Size: 59.3 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 741 - Forks: 220

googlefonts/compute-shader-101

Sample code for compute shader 101 training

Language: Rust - Size: 284 KB - Last synced at: 30 days ago - Pushed at: 2 months ago - Stars: 575 - Forks: 33

huiscliu/Tutorials

Parallel programming tutorials

Language: C - Size: 55 MB - Last synced at: over 1 year ago - Pushed at: about 4 years ago - Stars: 560 - Forks: 191

AmesingFlank/taichi.js

Modern GPU Compute and Rendering in Javascript

Language: TypeScript - Size: 220 MB - Last synced at: 7 days ago - Pushed at: 11 months ago - Stars: 500 - Forks: 20

ccsb-scripps/AutoDock-GPU

AutoDock for GPUs and other accelerators

Language: C++ - Size: 44.4 MB - Last synced at: 23 days ago - Pushed at: 5 months ago - Stars: 479 - Forks: 123

software-mansion/TypeGPU

TypeScript library that enhances the WebGPU API, allowing resource management in a type-safe, declarative way.

Language: TypeScript - Size: 86 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 477 - Forks: 11

smistad/FAST

A framework for high-performance medical image processing, neural network inference and visualization

Language: C++ - Size: 19.6 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 472 - Forks: 107

ginkgo-project/ginkgo

Numerical linear algebra software package

Language: C++ - Size: 156 MB - Last synced at: 5 days ago - Pushed at: 8 days ago - Stars: 471 - Forks: 97

triSYCL/triSYCL

Generic system-wide modern C++ for heterogeneous platforms with SYCL from Khronos Group

Language: C++ - Size: 382 MB - Last synced at: 21 days ago - Pushed at: 8 months ago - Stars: 443 - Forks: 98

JuliaGPU/KernelAbstractions.jl

Heterogeneous programming in Julia

Language: Julia - Size: 4.31 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 435 - Forks: 74

ProjectPhysX/OpenCL-Wrapper

OpenCL is the most powerful programming language ever created. Yet the OpenCL C++ bindings are cumbersome and the code overhead prevents many people from getting started. I created this lightweight OpenCL-Wrapper to greatly simplify OpenCL software development with C++ while keeping functionality and performance.

Language: C++ - Size: 344 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 406 - Forks: 40

tumaer/JAXFLUIDS

Differentiable Fluid Dynamics Package

Language: Python - Size: 12.5 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 402 - Forks: 72

kpet/clvk

Implementation of OpenCL 3.0 on Vulkan

Language: C++ - Size: 2.02 MB - Last synced at: 18 days ago - Pushed at: 19 days ago - Stars: 393 - Forks: 42

RRZE-HPC/gpu-benches

Collection of benchmarks to measure basic GPU capabilities

Language: C++ - Size: 1.78 MB - Last synced at: about 15 hours ago - Pushed at: 4 months ago - Stars: 385 - Forks: 55

uncomplicate/bayadera

High-performance Bayesian Data Analysis on the GPU in Clojure

Language: Clojure - Size: 1020 KB - Last synced at: about 1 month ago - Pushed at: almost 5 years ago - Stars: 365 - Forks: 23

andrewmilson/ministark

🏃‍♂️💨 GPU accelerated STARK prover built on @arkworks-rs

Language: Rust - Size: 1.65 MB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 357 - Forks: 36

KernelTuner/kernel_tuner

Kernel Tuner

Language: Python - Size: 41.1 MB - Last synced at: about 23 hours ago - Pushed at: about 24 hours ago - Stars: 345 - Forks: 56

Glavnokoman/vuh

Vulkan compute for people

Language: C++ - Size: 705 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 340 - Forks: 34

gpufit/Gpufit

GPU-accelerated Levenberg-Marquardt curve fitting in CUDA

Language: Cuda - Size: 1.16 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 319 - Forks: 96

favreau/Sol-R (fork of cyrillefavreau/Sol-R)

Open-Source CUDA/OpenCL Speed Of Light Ray-tracer

Language: C++ - Size: 22 MB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 306 - Forks: 12

brandondube/prysm

physical optics: integrated modeling, phase retrieval, segmented systems, polynomials and fitting, sequential raytracing...

Language: Python - Size: 12.2 MB - Last synced at: 5 days ago - Pushed at: 6 months ago - Stars: 301 - Forks: 48

fastflow/fastflow

FastFlow pattern-based parallel programming framework (formerly on sourceforge)

Language: C++ - Size: 136 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 292 - Forks: 70

uncomplicate/clojurecl

ClojureCL is a Clojure library for parallel computations with OpenCL.

Language: Clojure - Size: 874 KB - Last synced at: 28 days ago - Pushed at: about 1 year ago - Stars: 280 - Forks: 18

baggepinnen/MonteCarloMeasurements.jl

Propagation of distributions by Monte-Carlo sampling: Real number types with uncertainty represented by samples.

Language: Julia - Size: 4.91 MB - Last synced at: 12 days ago - Pushed at: about 2 months ago - Stars: 275 - Forks: 18

CodedK/CUDA-by-Example-source-code-for-the-book-s-examples-

CUDA by Example, written by two senior members of the CUDA software platform team, shows programmers how to employ this new technology. The authors introduce each area of CUDA development through working examples.

Language: C - Size: 1.07 MB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 272 - Forks: 108

mfem/PyMFEM

Python wrapper for MFEM

Language: SWIG - Size: 25.9 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 253 - Forks: 64

niessner/Opt

Opt DSL

Language: Terra - Size: 22.8 MB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 252 - Forks: 68

zjin-lcf/HeCBench

Language: C++ - Size: 296 MB - Last synced at: 11 days ago - Pushed at: 12 days ago - Stars: 247 - Forks: 89

ROCm/Tensile

[DEPRECATED] Moved to ROCm/rocm-libraries repo

Language: Python - Size: 95.2 MB - Last synced at: about 23 hours ago - Pushed at: 1 day ago - Stars: 245 - Forks: 166

denosaurs/netsaur

Powerful Machine Learning library with GPU, CPU and WASM backends

Language: Rust - Size: 146 MB - Last synced at: 1 day ago - Pushed at: 9 months ago - Stars: 244 - Forks: 4

cdeterman/gpuR

R interface to use GPUs

Language: R - Size: 12 MB - Last synced at: 29 days ago - Pushed at: about 5 years ago - Stars: 244 - Forks: 26

BasBuller/PySNN

Efficient Spiking Neural Network framework, built on top of PyTorch for GPU acceleration

Language: Python - Size: 12.8 MB - Last synced at: about 2 months ago - Pushed at: 11 months ago - Stars: 225 - Forks: 27

CaNS-World/CaNS

A code for fast, massively-parallel direct numerical simulations (DNS) of canonical flows

Language: Fortran - Size: 1.02 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 222 - Forks: 81

mikeroyal/GPU-Guide

Graphics Processing Unit (GPU) Architecture Guide

Language: Shell - Size: 815 KB - Last synced at: 3 days ago - Pushed at: over 3 years ago - Stars: 215 - Forks: 18

rsnemmen/OpenCL-examples

Simple OpenCL examples for exploiting GPU computing

Language: Objective-C++ - Size: 3.46 MB - Last synced at: about 1 month ago - Pushed at: 11 months ago - Stars: 213 - Forks: 73

ProjectPhysX/OpenCL-Benchmark

A small OpenCL benchmark program to measure peak GPU/CPU performance.

Language: C++ - Size: 233 KB - Last synced at: 28 days ago - Pushed at: about 1 month ago - Stars: 210 - Forks: 27

penn-graphics-research/claymore

Language: Cuda - Size: 30.7 MB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 209 - Forks: 31

shiinamiyuki/akari_render

High Performance CPU/GPU Physically Based Renderer in Rust

Language: Rust - Size: 150 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 205 - Forks: 11

lnstadrum/beatmup

Beatmup: image and signal processing library

Language: C++ - Size: 11.8 MB - Last synced at: 26 days ago - Pushed at: over 1 year ago - Stars: 203 - Forks: 15

zeam-vm/pelemay

Pelemay is a native compiler for Elixir that generates SIMD instructions. GPU code generation is planned.

Language: Elixir - Size: 410 KB - Last synced at: 26 days ago - Pushed at: over 4 years ago - Stars: 187 - Forks: 13

uncomplicate/clojurecuda

Clojure library for CUDA development

Language: Clojure - Size: 511 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 186 - Forks: 10

NumPower/numpower

PHP extension for efficient scientific computing and array manipulation with GPU support

Language: PHP - Size: 526 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 172 - Forks: 4

artyom-beilis/dlprimitives

Deep Learning Primitives and Mini-Framework for OpenCL

Language: C++ - Size: 58.2 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 169 - Forks: 16

nixon-voxell/GPUClothSimulationInUnity 📦

Trying to replicate what this legend did: https://youtu.be/kCGHXlLR3l8

Language: C# - Size: 201 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 165 - Forks: 16

AccelerateHS/accelerate-llvm

LLVM backend for Accelerate

Language: Haskell - Size: 3.68 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 164 - Forks: 56

preda/gpuowl

GPU Mersenne primality test.

Language: C++ - Size: 13.5 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 164 - Forks: 39

SamGinzburg/VectorVisor

VectorVisor is a vectorizing binary translator for GPUs, designed to make it easy to run many copies of a single-threaded WebAssembly program in parallel using GPUs

Language: WebAssembly - Size: 216 MB - Last synced at: 3 days ago - Pushed at: 9 months ago - Stars: 150 - Forks: 4

Ricks-Lab/gpu-utils

A set of utilities for monitoring and customizing GPU performance

Language: Python - Size: 3.98 MB - Last synced at: 12 days ago - Pushed at: about 1 year ago - Stars: 150 - Forks: 24

lachlan2k/phatcrack

Modern web-based distributed hash-cracking solution, built on hashcat

Language: Go - Size: 10.7 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 140 - Forks: 11

houkensjtu/taichi-fluid

A collection of CFD related resources for Taichi developers.

Size: 5.84 MB - Last synced at: 10 days ago - Pushed at: 3 months ago - Stars: 139 - Forks: 6

GooFit/GooFit

Code repository for the massively-parallel framework for maximum-likelihood fits, implemented in CUDA/OpenMP

Language: Cuda - Size: 98 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 135 - Forks: 41

Zydak/Vulkan-Path-Tracer

Physically based path tracer made in Vulkan.

Language: C++ - Size: 1.04 GB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 134 - Forks: 3

AnicetNgrt/jiro-nn

A Deep Learning and preprocessing framework in Rust with support for CPU and GPU.

Language: Rust - Size: 17.5 MB - Last synced at: 7 days ago - Pushed at: over 1 year ago - Stars: 131 - Forks: 3

PyOCL/OpenCLGA

A Python Library for Genetic Algorithms on OpenCL

Language: Python - Size: 17.4 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 117 - Forks: 32

tensordiffeq/TensorDiffEq

Efficient and Scalable Physics-Informed Deep Learning and Scientific Machine Learning on top of Tensorflow for multi-worker distributed computing

Language: Python - Size: 1.28 MB - Last synced at: 26 days ago - Pushed at: over 3 years ago - Stars: 115 - Forks: 42

IntelPython/dpctl

Python SYCL bindings and SYCL-based Python Array API library

Language: C++ - Size: 217 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 113 - Forks: 30

ComputationalRadiationPhysics/cuda_memtest

Fork of CUDA GPU memtest 👓

Language: C++ - Size: 275 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 110 - Forks: 31

barbagroup/PetIBM

PetIBM - toolbox and applications of the immersed-boundary method on distributed-memory architectures

Language: C++ - Size: 14.9 MB - Last synced at: 5 days ago - Pushed at: almost 3 years ago - Stars: 108 - Forks: 52

radiantone/entangle

A lightweight (serverless) native python parallel processing framework based on simple decorators and call graphs.

Language: Python - Size: 2.33 MB - Last synced at: 21 days ago - Pushed at: almost 3 years ago - Stars: 104 - Forks: 7

ROCm/hipBLASLt

hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionalities beyond a traditional BLAS library

Language: Assembly - Size: 1.04 GB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 103 - Forks: 141

slai-labs/get-beam

Run GPU inference and training jobs on serverless infrastructure that scales with you.

Language: Shell - Size: 5.96 MB - Last synced at: 2 months ago - Pushed at: about 1 year ago - Stars: 102 - Forks: 23

DeepMLNet/DeepNet

Deep.Net machine learning framework for F#

Language: F# - Size: 230 MB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 102 - Forks: 9

Heteroflow/Heteroflow

Concurrent CPU-GPU Programming using Task Models

Language: C++ - Size: 1.58 MB - Last synced at: 3 months ago - Pushed at: over 5 years ago - Stars: 101 - Forks: 13

ashvardanian/ParallelReductionsBenchmark

Thrust, CUB, TBB, AVX2, AVX-512, CUDA, OpenCL, OpenMP, Metal, and Rust - all it takes to sum a lot of numbers fast!

Language: C++ - Size: 17.4 MB - Last synced at: 9 days ago - Pushed at: 23 days ago - Stars: 99 - Forks: 9

wmmae/wmma_extension

An extension library of WMMA API (Tensor Core API)

Language: Cuda - Size: 698 KB - Last synced at: 6 days ago - Pushed at: 11 months ago - Stars: 99 - Forks: 15

RedBlight/RaytrAMP

Shooting and bouncing rays method for radar cross-section calculations, accelerated with BVH algorithm running on GPU (C++ AMP).

Language: C++ - Size: 51 MB - Last synced at: 9 months ago - Pushed at: over 6 years ago - Stars: 98 - Forks: 31

larsgeb/m1-gpu-cpp

Metal Shading Language on Apple M1's GPU for scientific C++.

Language: C++ - Size: 10.9 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 91 - Forks: 18

coldfunction/qCUDA

qCUDA: GPGPU Virtualization at a New API Remoting Method with Para-virtualization

Language: C - Size: 89.9 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 91 - Forks: 31

etaler/Etaler

A flexible HTM (Hierarchical Temporal Memory) framework with full GPU support.

Language: C++ - Size: 73.8 MB - Last synced at: 7 months ago - Pushed at: over 2 years ago - Stars: 89 - Forks: 14

opensbli/opensbli

A framework for the automated derivation and parallel execution of finite difference solvers on a range of computer architectures.

Language: Python - Size: 136 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 86 - Forks: 32

tugrul512bit/Cekirdekler

Multi-device OpenCL kernel load balancer and pipeliner API for C#. Uses a shared-distributed memory model to keep GPUs updated quickly while using the same kernel on all devices (for simplicity).

Language: C# - Size: 10.6 MB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 86 - Forks: 9