Topic: "decoding-attention"
Bruce-Lee-LY/decoding_attention
Decoding Attention is specially optimized for multi-head attention (MHA), multi-query attention (MQA), grouped-query attention (GQA), and multi-head latent attention (MLA) using CUDA cores for the decoding stage of LLM inference.
Language: C++ - Size: 884 KB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 35 - Forks: 2
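For context, below is a minimal CPU reference sketch of the decode-stage attention pattern such a kernel accelerates: a single query token attending over a cached K/V sequence. All names and signatures here are illustrative assumptions, not the repository's API.

```cpp
// Decode-stage attention, CPU reference (illustrative only, not the repo's API):
// one query vector attends over a cached K/V of length seq_len.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// q: [head_dim], k/v: [seq_len][head_dim], out: [head_dim]
void decode_attention(const std::vector<float>& q,
                      const std::vector<std::vector<float>>& k,
                      const std::vector<std::vector<float>>& v,
                      std::vector<float>& out) {
    const size_t seq_len = k.size();
    const size_t head_dim = q.size();
    const float scale = 1.0f / std::sqrt(static_cast<float>(head_dim));

    // Scaled dot-product scores; track the max for a numerically stable softmax.
    std::vector<float> scores(seq_len);
    float max_score = -INFINITY;
    for (size_t i = 0; i < seq_len; ++i) {
        float s = 0.0f;
        for (size_t d = 0; d < head_dim; ++d) s += q[d] * k[i][d];
        scores[i] = s * scale;
        max_score = std::max(max_score, scores[i]);
    }

    // Softmax over scores, then weighted sum of V rows.
    float denom = 0.0f;
    for (size_t i = 0; i < seq_len; ++i) {
        scores[i] = std::exp(scores[i] - max_score);
        denom += scores[i];
    }
    out.assign(head_dim, 0.0f);
    for (size_t i = 0; i < seq_len; ++i) {
        const float w = scores[i] / denom;
        for (size_t d = 0; d < head_dim; ++d) out[d] += w * v[i][d];
    }
}

int main() {
    std::vector<float> q{1, 0, 0, 0};
    std::vector<std::vector<float>> k{{1, 0, 0, 0}, {0, 1, 0, 0}};
    std::vector<std::vector<float>> v{{1, 2, 3, 4}, {5, 6, 7, 8}};
    std::vector<float> out;
    decode_attention(q, k, v, out);
    for (float x : out) std::printf("%f ", x);
    std::printf("\n");
    return 0;
}
```

Because the decode step has a single query row, the computation is dominated by memory-bound dot products over the KV cache rather than large matrix multiplies, which is why CUDA cores (rather than Tensor Cores) are a reasonable fit here.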
