GitHub topics: sparse-attention
SHI-Labs/NATTEN
Neighborhood Attention Extension. Bringing attention to a neighborhood near you!
Language: C++ - Size: 17.9 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 510 - Forks: 41
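Neighborhood attention restricts each query to a fixed-size window of neighboring tokens. Below is a minimal PyTorch sketch of that masking idea for the 1D case; it is not NATTEN's API (NATTEN ships fused CUDA/C++ kernels and keeps the window a fixed size near sequence boundaries), and the window size and tensor shapes are illustrative.

```python
import torch

def neighborhood_attention_1d(q, k, v, window: int = 7):
    """Naive 1D neighborhood attention via a dense mask. Illustrative only;
    NATTEN fuses this into custom kernels and handles edges differently
    (this simplification clips the window at the boundaries)."""
    # q, k, v: (batch, heads, seq_len, dim)
    seq_len = q.shape[-2]
    idx = torch.arange(seq_len, device=q.device)
    # Query i may attend to keys j with |i - j| <= window // 2.
    allowed = (idx[:, None] - idx[None, :]).abs() <= window // 2
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    scores = scores.masked_fill(~allowed, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

# Toy usage: 1 batch, 4 heads, 64 tokens, head dim 32
q = k = v = torch.randn(1, 4, 64, 32)
out = neighborhood_attention_1d(q, k, v, window=7)
```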

HanzhiZhang-Ulrica/DAM
Dynamic Attention Mask (DAM) generates adaptive sparse attention masks per layer and head for Transformer models, enabling long-context inference with lower compute and memory overhead, without fine-tuning.
Language: Python - Size: 9.77 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0
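The general idea of a per-layer, per-head adaptive mask can be sketched roughly as follows; the top-k scoring rule and `keep_ratio` are placeholders for illustration, not DAM's actual mask-construction criterion.

```python
import torch

def per_head_adaptive_mask(q, k, keep_ratio: float = 0.1):
    """Illustrative per-head sparse mask: for each head and query, keep only
    the top-scoring fraction of key positions. A stand-in for DAM's mask
    construction, which the paper defines differently."""
    # q, k: (batch, heads, seq_len, dim)
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    k_keep = max(1, int(scores.shape[-1] * keep_ratio))
    topk = scores.topk(k_keep, dim=-1).indices
    mask = torch.zeros_like(scores, dtype=torch.bool)
    mask.scatter_(-1, topk, True)
    return mask  # True = attend; differs per batch, head, and query row
```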

thu-ml/SpargeAttn
SpargeAttention: a training-free sparse attention method that can accelerate inference for any model.
Language: Cuda - Size: 55.4 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 581 - Forks: 40

ByteDance-Seed/ShadowKV
[ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
Language: Python - Size: 20.5 MB - Last synced at: 17 days ago - Pushed at: about 1 month ago - Stars: 184 - Forks: 12

XunhaoLai/native-sparse-attention-triton
Efficient Triton implementation of Native Sparse Attention.
Language: Python - Size: 266 KB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 148 - Forks: 6

lucidrains/native-sparse-attention-pytorch
Implementation of the sparse attention pattern proposed by the DeepSeek team in their "Native Sparse Attention" paper.
Language: Python - Size: 34.6 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 623 - Forks: 32
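The NSA pattern combines attention over compressed KV blocks, a few selected full-resolution KV blocks, and a local sliding window, with the branch outputs gated per query. Below is a hedged sketch of just the block-selection step; the block size, mean pooling, and top-k are illustrative, and the compression, sliding-window, and gating branches are omitted.

```python
import torch

def select_kv_blocks(q, k, block_size: int = 64, top_k: int = 4):
    """Illustrative block selection in the spirit of NSA: score each KV block
    by its mean-pooled key against the query, then keep the top-k blocks per
    query. Assumes kv_len >= block_size * top_k for the toy shapes below."""
    # q: (batch, heads, q_len, dim), k: (batch, heads, kv_len, dim)
    b, h, kv_len, d = k.shape
    n_blocks = kv_len // block_size
    k_blocks = k[:, :, : n_blocks * block_size].reshape(b, h, n_blocks, block_size, d)
    k_pooled = k_blocks.mean(dim=-2)                    # (b, h, n_blocks, d)
    block_scores = q @ k_pooled.transpose(-2, -1)       # (b, h, q_len, n_blocks)
    return block_scores.topk(top_k, dim=-1).indices     # blocks each query keeps

# Toy usage: 1024 keys -> 16 blocks of 64, keep 4 blocks per query
q = torch.randn(1, 8, 1024, 64)
k = torch.randn(1, 8, 1024, 64)
kept = select_kv_blocks(q, k, block_size=64, top_k=4)  # (1, 8, 1024, 4)
```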

ByteDance-Seed/FlexPrefill
Code for the paper: [ICLR 2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference
Language: Python - Size: 641 KB - Last synced at: 26 days ago - Pushed at: 26 days ago - Stars: 103 - Forks: 6
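One way to read "context-aware" is that the per-head sparsity budget is estimated from the input itself at prefill time. The sketch below probes attention with the last few queries and measures how many keys are needed to cover a target fraction of the attention mass per head; the probing scheme, `gamma`, and `n_probe` are assumptions for illustration, not FlexPrefill's actual pattern search.

```python
import torch

def per_head_key_budget(q, k, gamma: float = 0.95, n_probe: int = 16):
    """Illustrative context-dependent budget estimate: attend with only the
    last n_probe queries, then count how many top keys cover a gamma fraction
    of attention mass, averaged per head. Placeholder for FlexPrefill."""
    # q, k: (batch, heads, seq_len, dim)
    probe = q[:, :, -n_probe:]                                    # (b, h, n_probe, d)
    attn = torch.softmax(probe @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
    sorted_attn = attn.sort(dim=-1, descending=True).values
    cum = sorted_attn.cumsum(dim=-1)                              # cumulative mass
    # Smallest key count whose mass reaches gamma, averaged over probe queries
    budget = (cum < gamma).sum(dim=-1).float().mean(dim=-1) + 1
    return budget                                                 # (batch, heads)
```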

noahsyntax/native-sparse-attention-pytorch
Implementation of the sparse attention pattern proposed by the DeepSeek team in their "Native Sparse Attention" paper.
Language: Python - Size: 34.4 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

thu-nics/MoA
The official implementation of the paper "MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression"
Language: Python - Size: 532 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 103 - Forks: 6
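MoA assigns heterogeneous sparse-attention spans across heads and layers rather than one uniform pattern. A minimal sketch of applying a different causal sliding-window span to each head; the spans themselves would come from MoA's offline search, and the values below are made up.

```python
import torch

def heterogeneous_window_attention(q, k, v, windows):
    """Illustrative heterogeneous sliding-window attention: each head gets its
    own causal window span, e.g. found by offline profiling as in MoA.
    `windows` has one span per head; len(windows) must equal the head count."""
    # q, k, v: (batch, heads, seq_len, dim)
    seq_len = q.shape[-2]
    idx = torch.arange(seq_len, device=q.device)
    rel = idx[:, None] - idx[None, :]                   # i - j (causal distance)
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    for h, w in enumerate(windows):
        allowed = (rel >= 0) & (rel < w)                # causal window of span w
        scores[:, h] = scores[:, h].masked_fill(~allowed, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

# Toy usage: 4 heads with increasing spans (made-up values)
out = heterogeneous_window_attention(
    torch.randn(1, 4, 128, 32), torch.randn(1, 4, 128, 32),
    torch.randn(1, 4, 128, 32), windows=[32, 64, 96, 128])
```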

eezkni/SSIU
PyTorch implementation of "Structural Similarity-Inspired Unfolding for Lightweight Image Super-Resolution"
Size: 4.88 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 2 - Forks: 0

moon23k/Efficient_Summarization
Text summarization modeling with three different attention types
Language: Python - Size: 43.9 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

lim142857/Sparsifiner
Official codebase for the CVPR 2023 paper "Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers"
Language: Python - Size: 46.9 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0
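Sparsifiner learns instance-dependent connectivity masks with a lightweight predictor rather than using a fixed sparse pattern. A rough sketch of that idea, with an assumed low-rank projection and top-k rule standing in for the paper's actual predictor and hyperparameters.

```python
import torch
import torch.nn as nn

class InstanceDependentMaskPredictor(nn.Module):
    """Illustrative mask predictor in the spirit of Sparsifiner: a small
    low-rank bilinear head scores token-to-token connectivity and keeps the
    top-k connections per query token. `rank` and `top_k` are made up."""

    def __init__(self, dim: int, rank: int = 16, top_k: int = 32):
        super().__init__()
        self.proj_q = nn.Linear(dim, rank, bias=False)
        self.proj_k = nn.Linear(dim, rank, bias=False)
        self.top_k = top_k

    def forward(self, x):
        # x: (batch, tokens, dim) -> boolean connectivity mask (batch, tokens, tokens)
        scores = self.proj_q(x) @ self.proj_k(x).transpose(-2, -1)
        idx = scores.topk(min(self.top_k, x.shape[1]), dim=-1).indices
        mask = torch.zeros_like(scores, dtype=torch.bool)
        mask.scatter_(-1, idx, True)
        return mask  # feed into attention as an additive or boolean mask

# Toy usage: 197 ViT tokens, embedding dim 384
mask = InstanceDependentMaskPredictor(dim=384)(torch.randn(2, 197, 384))
```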
