An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: cuda-kernel

teddykoker/torchsort

Fast, differentiable sorting and ranking in PyTorch

Language: Python - Size: 564 KB - Last synced at: 13 days ago - Pushed at: over 1 year ago - Stars: 805 - Forks: 37

ELS-RD/kernl

Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.

Language: Jupyter Notebook - Size: 2.28 MB - Last synced at: 7 days ago - Pushed at: about 1 year ago - Stars: 1,564 - Forks: 96

shreyansh26/MLSys-Experiments

A collection of scripts for experimenting with and implementing MLSys-related techniques

Language: Jupyter Notebook - Size: 78.4 MB - Last synced at: 25 days ago - Pushed at: 26 days ago - Stars: 2 - Forks: 0

webis-de/pytorch-window-matmul

A custom CUDA kernel for windowed matrix multiplication

Language: Python - Size: 73.2 KB - Last synced at: 15 days ago - Pushed at: over 1 year ago - Stars: 4 - Forks: 1

Shikha-code36/CUDA-Programming-Beginner-Guide

A beginner's guide to CUDA programming

Language: Cuda - Size: 108 KB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 2 - Forks: 0

tpoisonooo/how-to-optimize-gemm

Row-major matmul optimization

Language: C++ - Size: 12.5 MB - Last synced at: 6 months ago - Pushed at: over 1 year ago - Stars: 589 - Forks: 79

kachi-group/ichida-algo

Winning submission for StartHack 2024: HPC-optimized multi-GPU/CPU inference

Language: C - Size: 1.07 MB - Last synced at: 21 days ago - Pushed at: 8 months ago - Stars: 3 - Forks: 0

ckswjd99-at-snu/SHPC-2023-2

SNU CSE Scalable High Performance Computing (M1522.006700) - 2023 Autumn

Language: C - Size: 41.3 MB - Last synced at: 11 months ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 1

GithubRealFan/keccak256-blockchain-hash-opencl-kernel

Language: C - Size: 2.93 KB - Last synced at: 12 months ago - Pushed at: over 2 years ago - Stars: 10 - Forks: 1

GithubRealFan/Simple-Projects-CUDA

Language: Cuda - Size: 73.2 KB - Last synced at: 12 months ago - Pushed at: about 2 years ago - Stars: 9 - Forks: 0

GithubRealFan/Matrix-Multiply-CUDA

Language: Cuda - Size: 21.5 KB - Last synced at: 12 months ago - Pushed at: about 2 years ago - Stars: 8 - Forks: 1

ProgrammerGnome/CUDA-codes

Snippet repository for learning parallel GPU programming with CUDA.

Language: C++ - Size: 4.12 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

rbga/CPU-vs-GPU-Matrix-Operation

A performance comparison of standard matrix functions on CPU and GPU, written in C++ with NVIDIA CUDA in Visual Studio

Language: Cuda - Size: 1.52 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0