GitHub topics: cuda-core
Bruce-Lee-LY/decoding_attention
Decoding Attention is specially optimized for MHA, MQA, GQA, and MLA, using CUDA cores for the decoding stage of LLM inference.
Language: C++ - Size: 884 KB - Last synced at: 10 days ago - Pushed at: about 1 month ago - Stars: 36 - Forks: 3
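A minimal sketch of what single-query (decode-step) attention on CUDA cores can look like; this is an illustration of the general technique, not code from the repository, and the kernel name, layouts, and launch configuration are assumptions. One thread block handles one head and computes softmax(q·Kᵀ/√d)·V with FP32 accumulation:

```cuda
// Illustrative decode-step attention for a single head (hypothetical kernel,
// not the repository's implementation). Assumes seq_len floats fit in shared memory.
#include <cuda_fp16.h>
#include <math.h>

__global__ void decode_attention_one_head(const half *q,   // [head_dim]
                                          const half *K,   // [seq_len, head_dim]
                                          const half *V,   // [seq_len, head_dim]
                                          half *out,       // [head_dim]
                                          int seq_len, int head_dim) {
    extern __shared__ float scores[];            // one score per key position
    const float scale = rsqrtf((float)head_dim);

    // 1) Attention scores: threads stride over key positions.
    for (int t = threadIdx.x; t < seq_len; t += blockDim.x) {
        float dot = 0.0f;
        for (int d = 0; d < head_dim; ++d)
            dot += __half2float(q[d]) * __half2float(K[t * head_dim + d]);
        scores[t] = dot * scale;
    }
    __syncthreads();

    // 2) Softmax over the scores (single thread for clarity; a tuned kernel
    //    would use a parallel max/sum reduction instead).
    if (threadIdx.x == 0) {
        float m = -1e30f, s = 0.0f;
        for (int t = 0; t < seq_len; ++t) m = fmaxf(m, scores[t]);
        for (int t = 0; t < seq_len; ++t) { scores[t] = expf(scores[t] - m); s += scores[t]; }
        for (int t = 0; t < seq_len; ++t) scores[t] /= s;
    }
    __syncthreads();

    // 3) Weighted sum of V: threads stride over output dimensions.
    for (int d = threadIdx.x; d < head_dim; d += blockDim.x) {
        float acc = 0.0f;
        for (int t = 0; t < seq_len; ++t)
            acc += scores[t] * __half2float(V[t * head_dim + d]);
        out[d] = __float2half(acc);
    }
}
// Example launch (assumed sizes):
// decode_attention_one_head<<<1, 128, seq_len * sizeof(float)>>>(dq, dK, dV, dout, seq_len, head_dim);
```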

Bruce-Lee-LY/cuda_hgemv
Several optimization methods for half-precision general matrix-vector multiplication (HGEMV) using CUDA cores.
Language: Cuda - Size: 459 KB - Last synced at: 2 days ago - Pushed at: 8 months ago - Stars: 61 - Forks: 5
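For context, a baseline HGEMV on CUDA cores typically computes y = A·x with half-precision inputs and FP32 accumulation. The sketch below is a simple warp-per-row variant, not taken from the repository; the kernel name and launch shape are assumptions:

```cuda
// Illustrative HGEMV (hypothetical baseline, not the repository's code):
// y[M] = A[M,N] * x[N], one warp per output row, FP32 accumulation.
#include <cuda_fp16.h>

__global__ void hgemv_warp_per_row(const half *A, const half *x, half *y,
                                   int M, int N) {
    const int row  = blockIdx.x;     // one 32-thread block (warp) per row
    const int lane = threadIdx.x;    // lane index within the warp
    if (row >= M) return;

    float acc = 0.0f;
    for (int col = lane; col < N; col += 32)
        acc += __half2float(A[row * N + col]) * __half2float(x[col]);

    // Warp-level tree reduction of the partial sums.
    for (int offset = 16; offset > 0; offset >>= 1)
        acc += __shfl_down_sync(0xffffffff, acc, offset);

    if (lane == 0)
        y[row] = __float2half(acc);
}
// Example launch: hgemv_warp_per_row<<<M, 32>>>(dA, dx, dy, M, N);
```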
