GitHub topics: cuda-kernel
teddykoker/torchsort
Fast, differentiable sorting and ranking in PyTorch
Language: Python - Size: 564 KB - Last synced at: 13 days ago - Pushed at: over 1 year ago - Stars: 805 - Forks: 37

ELS-RD/kernl
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
Language: Jupyter Notebook - Size: 2.28 MB - Last synced at: 7 days ago - Pushed at: about 1 year ago - Stars: 1,564 - Forks: 96

shreyansh26/MLSys-Experiments
A collection of scripts for experimenting with and implementing MLSys-related ideas
Language: Jupyter Notebook - Size: 78.4 MB - Last synced at: 25 days ago - Pushed at: 26 days ago - Stars: 2 - Forks: 0

webis-de/pytorch-window-matmul
A custom CUDA kernel for windowed matrix multiplication
Language: Python - Size: 73.2 KB - Last synced at: 15 days ago - Pushed at: over 1 year ago - Stars: 4 - Forks: 1

Shikha-code36/CUDA-Programming-Beginner-Guide
A beginner's guide to CUDA programming
Language: Cuda - Size: 108 KB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 2 - Forks: 0

tpoisonooo/how-to-optimize-gemm
Row-major matmul optimization
Language: C++ - Size: 12.5 MB - Last synced at: 6 months ago - Pushed at: over 1 year ago - Stars: 589 - Forks: 79

kachi-group/ichida-algo
Winning submission for StartHack 2024: HPC-optimized multi-GPU/CPU inference
Language: C - Size: 1.07 MB - Last synced at: 21 days ago - Pushed at: 8 months ago - Stars: 3 - Forks: 0

ckswjd99-at-snu/SHPC-2023-2
SNU CSE Scalable High Performance Computing (M1522.006700) - 2023 Autumn
Language: C - Size: 41.3 MB - Last synced at: 11 months ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 1

GithubRealFan/keccak256-blockchain-hash-opencl-kernel
Language: C - Size: 2.93 KB - Last synced at: 12 months ago - Pushed at: over 2 years ago - Stars: 10 - Forks: 1

GithubRealFan/Simple-Projects-CUDA
Language: Cuda - Size: 73.2 KB - Last synced at: 12 months ago - Pushed at: about 2 years ago - Stars: 9 - Forks: 0

GithubRealFan/Matrix-Multiply-CUDA
Language: Cuda - Size: 21.5 KB - Last synced at: 12 months ago - Pushed at: about 2 years ago - Stars: 8 - Forks: 1

ProgrammerGnome/CUDA-codes
Snippet repository for learning parallel GPU programming with CUDA.
Language: C++ - Size: 4.12 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

rbga/CPU-vs-GPU-Matrix-Operation
A performance comparison of standard matrix operations on CPU versus GPU, using NVIDIA CUDA and C++ in Visual Studio
Language: Cuda - Size: 1.52 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0
