GitHub topics: flashinfer
sgl-project/whl
Kernel Library Wheel for SGLang
Language: HTML - Size: 51.8 KB - Pushed at: 11 days ago - Stars: 11 - Forks: 2

Bruce-Lee-LY/decoding_attention
Decoding Attention is specially optimized for MHA, MQA, GQA, and MLA using CUDA cores for the decoding stage of LLM inference (the single-query computation involved is sketched below).
Language: C++ - Size: 867 KB - Pushed at: 3 months ago - Stars: 40 - Forks: 4
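
During decoding, each step attends a single new query token against the full KV cache, so attention reduces to one scaled dot product per cached position followed by a softmax-weighted sum of values. Below is a minimal CPU reference of that computation for one head; all names are hypothetical and this is not the library's API, just a sketch of what the decode-stage kernel computes.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Decode-stage attention for one head: a single query vector q [head_dim]
// attends over the cached keys/values k, v [seq_len * head_dim], writing
// the result to out [head_dim]. Illustrative names, not the library's API.
void decode_attention(const std::vector<float>& q,
                      const std::vector<float>& k,
                      const std::vector<float>& v,
                      std::vector<float>& out,
                      int seq_len, int head_dim) {
    const float scale = 1.0f / std::sqrt(static_cast<float>(head_dim));
    std::vector<float> scores(seq_len);
    float max_score = -INFINITY;

    // Scaled dot product of the single query against every cached key.
    for (int t = 0; t < seq_len; ++t) {
        float dot = 0.0f;
        for (int d = 0; d < head_dim; ++d)
            dot += q[d] * k[t * head_dim + d];
        scores[t] = dot * scale;
        max_score = std::max(max_score, scores[t]);
    }

    // Numerically stable softmax over the attention scores.
    float denom = 0.0f;
    for (int t = 0; t < seq_len; ++t) {
        scores[t] = std::exp(scores[t] - max_score);
        denom += scores[t];
    }

    // Softmax-weighted sum of the cached values.
    std::fill(out.begin(), out.end(), 0.0f);
    for (int t = 0; t < seq_len; ++t) {
        const float w = scores[t] / denom;
        for (int d = 0; d < head_dim; ++d)
            out[d] += w * v[t * head_dim + d];
    }
}
```

MQA, GQA, and MLA differ from MHA in how queries share (or compress) the cached keys and values, but the per-head inner loop sketched above stays the same shape: memory-bound reads of the KV cache, which is why decode kernels target CUDA cores rather than tensor cores.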
