An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: flash-mla

xlite-dev/Awesome-LLM-Inference

📚 A curated list of awesome LLM/VLM inference papers with code: Flash-Attention, Paged-Attention, WINT8/4, parallelism, etc. 🎉

Language: Python - Size: 115 MB - Last synced at: 6 days ago - Pushed at: 13 days ago - Stars: 4,260 - Forks: 294
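The Flash-Attention technique named in this repo's tagline streams K/V in blocks and keeps only running softmax statistics on-chip. A minimal numpy sketch of that online-softmax tiling idea (an illustrative reconstruction, not code from the listed repo):

```python
import numpy as np

def flash_attention(Q, K, V, block=32):
    """Tiled attention with online softmax (the core Flash-Attention idea):
    K/V are processed one block at a time, so only the current block plus
    per-row running statistics need to live in fast memory."""
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros((n, V.shape[1]))
    m = np.full(n, -np.inf)   # running row-wise max of the logits
    l = np.zeros(n)           # running softmax denominator
    for start in range(0, K.shape[0], block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        S = Q @ Kb.T * scale                  # logits for this K/V block
        m_new = np.maximum(m, S.max(axis=1))
        alpha = np.exp(m - m_new)             # rescale old accumulators
        P = np.exp(S - m_new[:, None])
        l = l * alpha + P.sum(axis=1)
        out = out * alpha[:, None] + P @ Vb
        m = m_new
    return out / l[:, None]
```

Despite never materializing the full n×n score matrix, this returns exactly `softmax(QKᵀ/√d)·V`, which is why the blocked kernel can trade SRAM for recomputation without changing the result.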

xlite-dev/ffpa-attn

⚡️FFPA: extends FlashAttention-2 with Split-D, achieving ~O(1) SRAM complexity for large headdim; 1.8x–3x faster than SDPA. 🎉

Language: Cuda - Size: 4.21 MB - Last synced at: 17 days ago - Pushed at: 3 months ago - Stars: 190 - Forks: 8
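One way to read the "Split-D" in the FFPA tagline (an assumption on my part; consult the repo for the actual kernel design) is that the QKᵀ logits are accumulated over tiles of the head dimension, so per-step on-chip storage stays fixed even as headdim grows. A hedged numpy sketch of that decomposition:

```python
import numpy as np

def logits_split_d(Q, K, d_tile=16):
    """Hypothetical 'Split-D'-style sketch: accumulate S = Q @ K.T over
    head-dim tiles. Each tile of Q/K contributes a partial product, so
    the working set per step is independent of the full headdim."""
    n, d = Q.shape
    S = np.zeros((n, K.shape[0]))
    for start in range(0, d, d_tile):
        S += Q[:, start:start + d_tile] @ K[:, start:start + d_tile].T
    return S / np.sqrt(d)
```

Because matrix multiplication is a sum over the inner dimension, the tiled accumulation is exact, which is what would let a kernel keep SRAM usage roughly constant for large headdim.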