An open API service providing repository metadata for many open source software ecosystems.

Topic: "flash-mla"

xlite-dev/Awesome-LLM-Inference

📚A curated list of Awesome LLM/VLM Inference Papers with code: WINT8/4, FlashAttention, PagedAttention, MLA, Parallelism, etc.

Language: Python - Size: 115 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 3,900 - Forks: 275

xlite-dev/LeetCUDA

📚LeetCUDA: Modern CUDA Learning Notes with PyTorch for Beginners🐑, 200+ CUDA/Tensor Cores Kernels, HGEMM, FA-2 MMA, etc.🔥

Language: Cuda - Size: 262 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 3,627 - Forks: 393

xlite-dev/ffpa-attn-mma

📚FFPA (Split-D): Yet another Faster Flash Prefill Attention with O(1) SRAM complexity for large headdim (D > 256), ~2x↑🎉 vs. SDPA EA.

Language: Cuda - Size: 4.21 MB - Last synced at: 29 days ago - Pushed at: about 1 month ago - Stars: 161 - Forks: 7