ecosyste.ms

An open API service providing repository metadata for many open source software ecosystems.

Topic: "flash-mla"

xlite-dev/Awesome-LLM-Inference

📚 A curated list of awesome LLM/VLM inference papers with code: WINT8/4, FlashAttention, PagedAttention, parallelism, MLA, etc.

Language: Python - Size: 115 MB - Last synced at: about 1 hour ago - Pushed at: about 2 hours ago - Stars: 4,100 - Forks: 283

xlite-dev/ffpa-attn

📚 FFPA (Split-D): Extends FlashAttention with Split-D for large head dimensions (headdim), with O(1) GPU SRAM complexity, running 1.8x to 3x faster than SDPA EA 🎉.

Language: Cuda - Size: 4.21 MB - Last synced at: 7 days ago - Pushed at: 30 days ago - Stars: 183 - Forks: 8

Related Topics

deepseek (2), deepseek-r1 (2), deepseek-v3 (2), flash-attention (2), mla (2), tensor-cores (1), sdpa (1), mlsys (1), fused-mla (1), cuda (1), attention (1), vllm (1), tensorrt-llm (1), qwen3 (1), paged-attention (1), minimax-01 (1), llm-inference (1), flash-attention-3 (1), awesome-llm (1)