An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: decoding-attention

Bruce-Lee-LY/decoding_attention

Decoding Attention is specially optimized for MHA, MQA, GQA, and MLA, using CUDA cores for the decoding stage of LLM inference.

Language: C++ - Size: 867 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 37 - Forks: 4
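
For readers unfamiliar with the terms in the description, the following is a minimal CPU-side sketch of what attention in the decoding stage computes: one new query token attends over cached keys and values. It is not code from the repository and does not reflect its CUDA kernels. The function name `decode_attention`, the tensor layouts, and all parameters are hypothetical, chosen only for illustration. Setting `num_kv_heads == num_q_heads` gives MHA, `num_kv_heads == 1` gives MQA, and anything in between gives GQA; MLA is not covered here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical reference routine (illustration only, not the repository's API).
// Assumed layouts, all row-major:
//   q:   [num_q_heads, head_dim]            query of the single new token
//   k,v: [seq_len, num_kv_heads, head_dim]  cached keys and values
// Returns the attention output with shape [num_q_heads, head_dim].
std::vector<float> decode_attention(const std::vector<float>& q,
                                    const std::vector<float>& k,
                                    const std::vector<float>& v,
                                    std::size_t num_q_heads,
                                    std::size_t num_kv_heads,
                                    std::size_t head_dim,
                                    std::size_t seq_len) {
    // Assumption: query heads split evenly across KV heads.
    assert(num_kv_heads > 0 && num_q_heads % num_kv_heads == 0);
    const std::size_t group = num_q_heads / num_kv_heads;  // query heads per KV head
    const float scale = 1.0f / std::sqrt(static_cast<float>(head_dim));

    std::vector<float> out(num_q_heads * head_dim, 0.0f);
    std::vector<float> scores(seq_len);

    for (std::size_t h = 0; h < num_q_heads; ++h) {
        const std::size_t kv_h = h / group;  // KV head shared by this query head

        // Scaled dot product of the query with every cached key.
        float max_score = -std::numeric_limits<float>::infinity();
        for (std::size_t t = 0; t < seq_len; ++t) {
            float dot = 0.0f;
            for (std::size_t d = 0; d < head_dim; ++d) {
                dot += q[h * head_dim + d] *
                       k[(t * num_kv_heads + kv_h) * head_dim + d];
            }
            scores[t] = dot * scale;
            max_score = std::max(max_score, scores[t]);
        }

        // Numerically stable softmax over the sequence dimension.
        float denom = 0.0f;
        for (std::size_t t = 0; t < seq_len; ++t) {
            scores[t] = std::exp(scores[t] - max_score);
            denom += scores[t];
        }

        // Weighted sum of the cached values.
        for (std::size_t t = 0; t < seq_len; ++t) {
            const float w = scores[t] / denom;
            for (std::size_t d = 0; d < head_dim; ++d) {
                out[h * head_dim + d] +=
                    w * v[(t * num_kv_heads + kv_h) * head_dim + d];
            }
        }
    }
    return out;
}
```

Because the query has length one during decoding, the work is dominated by reading the key/value cache rather than by matrix multiplication, which is presumably why an implementation would target this stage separately from prefill.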