GitHub topics: flashinfer
sgl-project/whl
Kernel Library Wheel for SGLang
Language: HTML - Size: 51.8 KB - Pushed at: 11 days ago - Stars: 11 - Forks: 2

Bruce-Lee-LY/decoding_attention
Decoding Attention is specially optimized for MHA, MQA, GQA, and MLA using CUDA cores for the decoding stage of LLM inference (the single-query computation involved is sketched below).
Language: C++ - Size: 867 KB - Pushed at: 3 months ago - Stars: 40 - Forks: 4
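
During decoding, each step attends a single new query token against the full KV cache, so attention reduces to one scaled dot product per cached position followed by a softmax-weighted sum of values. Below is a minimal CPU reference of that computation for one head; all names are hypothetical and this is not the library's API, just a sketch of what the decode-stage kernel computes.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Decode-stage attention for one head: a single query vector q [head_dim]
// attends over the cached keys/values k, v [seq_len * head_dim], writing
// the result to out [head_dim]. Illustrative names, not the library's API.
void decode_attention(const std::vector<float>& q,
                      const std::vector<float>& k,
                      const std::vector<float>& v,
                      std::vector<float>& out,
                      int seq_len, int head_dim) {
    const float scale = 1.0f / std::sqrt(static_cast<float>(head_dim));
    std::vector<float> scores(seq_len);
    float max_score = -INFINITY;

    // Scaled dot product of the single query against every cached key.
    for (int t = 0; t < seq_len; ++t) {
        float dot = 0.0f;
        for (int d = 0; d < head_dim; ++d)
            dot += q[d] * k[t * head_dim + d];
        scores[t] = dot * scale;
        max_score = std::max(max_score, scores[t]);
    }

    // Numerically stable softmax over the attention scores.
    float denom = 0.0f;
    for (int t = 0; t < seq_len; ++t) {
        scores[t] = std::exp(scores[t] - max_score);
        denom += scores[t];
    }

    // Softmax-weighted sum of the cached values.
    std::fill(out.begin(), out.end(), 0.0f);
    for (int t = 0; t < seq_len; ++t) {
        const float w = scores[t] / denom;
        for (int d = 0; d < head_dim; ++d)
            out[d] += w * v[t * head_dim + d];
    }
}
```

MQA, GQA, and MLA differ from MHA in how queries share (or compress) the cached keys and values, but the per-head inner loop sketched above stays the same shape: memory-bound reads of the KV cache, which is why decode kernels target CUDA cores rather than tensor cores.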
