Topic: "flash-mla"
xlite-dev/Awesome-LLM-Inference
📚A curated list of Awesome LLM/VLM Inference Papers with code: WINT8/4, FlashAttention, PagedAttention, MLA, Parallelism, etc.
Language: Python - Size: 115 MB - Last synced: 9 days ago - Pushed: 9 days ago - Stars: 3,900 - Forks: 275
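
The MLA in this description is Multi-head Latent Attention, the KV-cache compression scheme that the kernels under the "flash-mla" topic accelerate. Below is a minimal PyTorch sketch of the core idea, assuming a deliberately simplified design: hidden states are down-projected to a small latent that is cached, and per-head keys/values are re-expanded from it at attention time. The module names, dimensions, and the omission of RoPE decoupling are illustrative choices, not taken from any of the listed repos.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLASketch(nn.Module):
    """Toy Multi-head Latent Attention: cache a small latent, not full K/V."""

    def __init__(self, d_model=512, n_heads=8, d_latent=64):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.kv_down = nn.Linear(d_model, d_latent)   # compression; this output is what gets cached
        self.k_up = nn.Linear(d_latent, d_model)      # expand latent back to per-head keys
        self.v_up = nn.Linear(d_latent, d_model)      # expand latent back to per-head values
        self.o_proj = nn.Linear(d_model, d_model)

    def forward(self, x, latent_cache=None):
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        latent = self.kv_down(x)                       # (b, t, d_latent)
        if latent_cache is not None:                   # decoding: append to the cached latents
            latent = torch.cat([latent_cache, latent], dim=1)
        k = self.k_up(latent).view(b, -1, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_up(latent).view(b, -1, self.n_heads, self.d_head).transpose(1, 2)
        # Causal mask only during prefill; with a cache we assume a single new token per step.
        out = F.scaled_dot_product_attention(q, k, v, is_causal=latent_cache is None)
        out = out.transpose(1, 2).reshape(b, t, -1)
        return self.o_proj(out), latent                # the latent doubles as the (small) KV cache
```

Caching the (batch, seq, d_latent) latent rather than full per-head K/V is where the memory saving comes from; optimized MLA kernels avoid ever materializing the expanded K/V, which this sketch does explicitly for readability.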

xlite-dev/LeetCUDA
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA/Tensor Cores kernels, HGEMM, FA-2 MMA, etc.🔥
Language: Cuda - Size: 262 MB - Last synced: 9 days ago - Pushed: 9 days ago - Stars: 3,627 - Forks: 393
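
The FA-2 (FlashAttention-2) kernels catalogued here compute attention block-by-block with an online softmax, so the full score matrix is never materialized. The following plain-PyTorch reference shows that recurrence for a single head; it is a readability sketch under assumed shapes and block size, not one of the repo's CUDA/Tensor Core kernels.

```python
import torch

def flash_attention_reference(q, k, v, block_size=128):
    """q, k, v: (seq_len, d_head). Returns softmax(q @ k^T / sqrt(d)) @ v,
    computed one key/value block at a time with a running max and running sum."""
    seq_len, d = q.shape
    scale = d ** -0.5
    out = torch.zeros_like(q)
    row_max = torch.full((seq_len, 1), float("-inf"))
    row_sum = torch.zeros(seq_len, 1)
    for start in range(0, seq_len, block_size):
        kb = k[start:start + block_size]
        vb = v[start:start + block_size]
        scores = (q @ kb.T) * scale                          # (seq_len, block)
        block_max = scores.max(dim=-1, keepdim=True).values
        new_max = torch.maximum(row_max, block_max)
        # Rescale the previous accumulator to the new running max, then add this block.
        correction = torch.exp(row_max - new_max)
        p = torch.exp(scores - new_max)
        row_sum = row_sum * correction + p.sum(dim=-1, keepdim=True)
        out = out * correction + p @ vb
        row_max = new_max
    return out / row_sum

# Sanity check against the naive formulation:
# q, k, v = (torch.randn(1024, 64) for _ in range(3))
# assert torch.allclose(flash_attention_reference(q, k, v),
#                       torch.softmax(q @ k.T / 64 ** 0.5, dim=-1) @ v, atol=1e-5)
```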

xlite-dev/ffpa-attn-mma
📚FFPA(Split-D): Yet another Faster Flash Prefill Attention with O(1) SRAM complexity for large headdim (D > 256), ~2x↑🎉 vs SDPA EA.
Language: Cuda - Size: 4.21 MB - Last synced: 29 days ago - Pushed: about 1 month ago - Stars: 161 - Forks: 7
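
The O(1) SRAM claim in the description comes from splitting work over the head dimension D, so only a fixed-size slice of Q and K has to sit in fast memory at once. The snippet below illustrates only the mathematical core of that idea in plain PyTorch, accumulating q @ k^T over chunks of D; it is a conceptual sketch with an arbitrary chunk size, not the repo's kernel or its full prefill pipeline.

```python
import torch

def split_d_scores(q, k, d_chunk=64):
    """q: (seq_q, D), k: (seq_k, D) with large D. Accumulates q @ k^T chunk-by-chunk
    along D, which is mathematically identical to the single full-D matmul."""
    seq_q, D = q.shape
    scores = torch.zeros(seq_q, k.shape[0])
    for start in range(0, D, d_chunk):
        scores += q[:, start:start + d_chunk] @ k[:, start:start + d_chunk].T
    return scores * D ** -0.5

# Sanity check for a headdim well above 256:
# q, k = torch.randn(512, 320), torch.randn(1024, 320)
# assert torch.allclose(split_d_scores(q, k), (q @ k.T) * 320 ** -0.5, atol=1e-4)
```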
