GitHub topics: cuda-cpp
FlosMume/cpp-cuda-starter
CUDA C/C++ starter template for Windows 11 + WSL2 (RTX 4070 SUPER tested)
Language: Shell - Size: 3.34 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0
NVIDIA/cccl
CUDA Core Compute Libraries
Language: C++ - Size: 295 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 2,022 - Forks: 289
facebookresearch/CUTracer
A dynamic binary instrumentation tool for tracing and analyzing CUDA kernel instructions.
Language: Cuda - Size: 303 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 19 - Forks: 2
MuGdxy/muda
μ-Cuda, COVER THE LAST MILE OF CUDA. Features: IntelliSense-friendly, structured launch, automatic CUDA graph generation and updating.
Language: C++ - Size: 14.8 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 193 - Forks: 13
xlite-dev/LeetCUDA
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
Language: Cuda - Size: 263 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 8,013 - Forks: 796
GPUEngineering/GPUtils
A C++ header-only library for parallel linear algebra on GPUs (CUDA/cuBLAS under the hood)
Language: Cuda - Size: 401 KB - Last synced at: about 2 months ago - Pushed at: 8 months ago - Stars: 4 - Forks: 0
Shikha-code36/CUDA-Programming-Beginner-Guide
A beginner's guide to CUDA programming
Language: Cuda - Size: 108 KB - Last synced at: 8 months ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0
Bhargavoza1/cuda_neural_network
Learning to develop a lightning-fast C++/CUDA neural network
Language: C++ - Size: 145 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0
zheliu137/GPU_Perf_UnitTest
Tests GPU performance on linear algebra operations and compares the results with C++/Fortran.
Language: Cuda - Size: 52.7 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0
MohammadDallash/cuda-cpp-programming
This repo contains CUDA C++ code examples that demonstrate how to use GPUs for parallel computing, covering topics such as dynamic parallelism and optimization.
Language: Cuda - Size: 17.6 KB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0
rbga/CUDA-Merge-and-Bitonic-Sort
Efficient implementations of Merge Sort and Bitonic Sort algorithms using CUDA for GPU parallel processing, resulting in accelerated sorting of large arrays. Includes both CPU and GPU versions, along with a performance comparison.
Language: Cuda - Size: 12.7 KB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0