GitHub topics: cuda-cpp
MuGdxy/muda
μ-Cuda: cover the last mile of CUDA. Features: IntelliSense-friendly API, structured launch, and automatic CUDA graph generation and updating.
Language: C++ - Size: 14.7 MB - Last synced at: about 20 hours ago - Pushed at: about 22 hours ago - Stars: 189 - Forks: 11

xlite-dev/LeetCUDA
📚 LeetCUDA: Modern CUDA learning notes with PyTorch for beginners 🐑. 200+ CUDA kernels, Tensor Cores, HGEMM, FA-2 MMA. 🎉
Language: Cuda - Size: 263 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 6,891 - Forks: 713

facebookresearch/CUTracer
A dynamic binary instrumentation tool for tracing and analyzing CUDA kernel instructions.
Language: Cuda - Size: 271 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 0

NVIDIA/cccl
CUDA Core Compute Libraries
Language: C++ - Size: 107 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1,904 - Forks: 266

GPUEngineering/GPUtils
A C++ header-only library for parallel linear algebra on GPUs (CUDA/cuBLAS under the hood)
Language: Cuda - Size: 401 KB - Last synced at: about 7 hours ago - Pushed at: 6 months ago - Stars: 4 - Forks: 0

Shikha-code36/CUDA-Programming-Beginner-Guide
A beginner's guide to CUDA programming
Language: Cuda - Size: 108 KB - Last synced at: 6 months ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 0

Bhargavoza1/cuda_neural_network
Learning to develop a lightning-fast C++/CUDA neural network.
Language: C++ - Size: 145 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

zheliu137/GPU_Perf_UnitTest
Tests GPU performance on linear algebra operations and compares the results with C++/Fortran.
Language: Cuda - Size: 52.7 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

MohammadDallash/cuda-cpp-programming
This repo contains CUDA C++ code examples that demonstrate how to use GPUs for parallel computing, covering topics such as dynamic parallelism, optimization, etc.
Language: Cuda - Size: 17.6 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

rbga/CUDA-Merge-and-Bitonic-Sort
Efficient implementations of Merge Sort and Bitonic Sort algorithms using CUDA for GPU parallel processing, resulting in accelerated sorting of large arrays. Includes both CPU and GPU versions, along with a performance comparison.
Language: Cuda - Size: 12.7 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0
