Topic: "sparse-attention"
lucidrains/native-sparse-attention-pytorch
Implementation of the sparse attention pattern proposed by the Deepseek team in their "Native Sparse Attention" paper
Language: Python - Size: 34.6 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 645 - Forks: 34
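The NSA pattern described in the paper combines a compressed-token branch, a selected-block branch, and a sliding-window branch, mixed per token by a learned gate. Below is a minimal, assumption-level sketch of the sliding-window branch and the gated mix in plain PyTorch; it illustrates the idea only and is not this library's API.

import torch
import torch.nn.functional as F

def sliding_window_attention(q, k, v, window=64):
    # q, k, v: (batch, heads, seq, dim); causal attention restricted to the last `window` keys
    b, h, n, d = q.shape
    scores = q @ k.transpose(-2, -1) / d ** 0.5
    idx = torch.arange(n, device=q.device)
    causal = idx[None, :] <= idx[:, None]           # key position <= query position
    local = idx[:, None] - idx[None, :] < window    # key within the sliding window
    scores = scores.masked_fill(~(causal & local), float('-inf'))
    return F.softmax(scores, dim=-1) @ v

def gated_combine(branch_outputs, gates):
    # branch_outputs: list of (batch, heads, seq, dim) tensors from the three branches
    # gates: (batch, heads, seq, num_branches), e.g. produced by a small sigmoid MLP
    stacked = torch.stack(branch_outputs, dim=-1)   # (batch, heads, seq, dim, branches)
    return (stacked * gates.unsqueeze(-2)).sum(dim=-1)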

thu-ml/SpargeAttn
SpargeAttention: A training-free sparse attention method that can accelerate inference for any model.
Language: Cuda - Size: 55.4 MB - Last synced at: 7 days ago - Pushed at: 8 days ago - Stars: 603 - Forks: 44

SHI-Labs/NATTEN
Neighborhood Attention Extension. Bringing attention to a neighborhood near you!
Language: C++ - Size: 18.5 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 525 - Forks: 41
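Neighborhood attention restricts each query to a fixed-size window of keys centered on its own position. A rough 1D version in plain PyTorch follows as an illustration only; NATTEN itself ships fused C++/CUDA kernels and supports 1D/2D/3D cases.

import torch
import torch.nn.functional as F

def neighborhood_attention_1d(q, k, v, kernel_size=7):
    # q, k, v: (batch, heads, seq, dim); kernel_size odd, seq >= kernel_size assumed
    b, h, n, d = q.shape
    half = kernel_size // 2
    pos = torch.arange(n, device=q.device)
    # Clamp window centers near the edges so every query sees exactly kernel_size keys
    centers = pos.clamp(half, n - 1 - half)
    neighbors = centers[:, None] + torch.arange(-half, half + 1, device=q.device)  # (seq, kernel)
    k_win = k[:, :, neighbors]                                  # (batch, heads, seq, kernel, dim)
    v_win = v[:, :, neighbors]
    scores = (q.unsqueeze(-2) * k_win).sum(-1) / d ** 0.5       # (batch, heads, seq, kernel)
    attn = F.softmax(scores, dim=-1)
    return (attn.unsqueeze(-1) * v_win).sum(-2)                 # (batch, heads, seq, dim)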

ByteDance-Seed/ShadowKV
[ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
Language: Python - Size: 20.5 MB - Last synced at: 13 days ago - Pushed at: about 2 months ago - Stars: 197 - Forks: 14

XunhaoLai/native-sparse-attention-triton
Efficient Triton implementation of Native Sparse Attention.
Language: Python - Size: 266 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 148 - Forks: 6

ByteDance-Seed/FlexPrefill
Code for paper: [ICLR2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference
Language: Python - Size: 654 KB - Last synced at: 12 days ago - Pushed at: about 1 month ago - Stars: 112 - Forks: 7
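FlexPrefill's core idea is to decide, per head and per input, which key blocks the prefill attention actually needs. The sketch below shows one way such context-aware block selection could look (pooled block scores kept up to a cumulative-probability budget); it is a hedged illustration of the idea, not the paper's algorithm.

import torch
import torch.nn.functional as F

def select_key_blocks(q, k, block_size=64, budget=0.95):
    # q, k: (heads, seq, dim) -> boolean keep-mask over key blocks, shape (heads, num_blocks)
    h, n, d = q.shape
    nb = n // block_size
    q_pool = q[:, :nb * block_size].reshape(h, nb, block_size, d).mean(2)   # pooled query blocks
    k_pool = k[:, :nb * block_size].reshape(h, nb, block_size, d).mean(2)   # pooled key blocks
    est = F.softmax((q_pool @ k_pool.transpose(-2, -1)).mean(1) / d ** 0.5, dim=-1)  # (heads, nb)
    probs, order = est.sort(dim=-1, descending=True)
    keep_sorted = probs.cumsum(-1) - probs < budget     # smallest block set covering the budget
    mask = torch.zeros(h, nb, dtype=torch.bool, device=q.device)
    return mask.scatter(1, order, keep_sorted)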

thu-nics/MoA
The official implementation of the paper "MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression"
Language: Python - Size: 532 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 103 - Forks: 6
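MoA's premise is that different heads tolerate different amounts of sparsity, so each head can get its own attention span. A minimal sketch of heterogeneous per-head sliding windows appears below; it illustrates the idea only and is not the official implementation.

import torch
import torch.nn.functional as F

def mixed_window_attention(q, k, v, windows):
    # q, k, v: (batch, heads, seq, dim); windows: per-head window sizes, len(windows) == heads
    b, h, n, d = q.shape
    scores = q @ k.transpose(-2, -1) / d ** 0.5
    idx = torch.arange(n, device=q.device)
    causal = idx[None, :] <= idx[:, None]                           # (seq, seq)
    span = torch.tensor(windows, device=q.device)[:, None, None]    # (heads, 1, 1)
    local = (idx[:, None] - idx[None, :]) < span                    # (heads, seq, seq)
    scores = scores.masked_fill(~(causal & local), float('-inf'))
    return F.softmax(scores, dim=-1) @ v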

eezkni/SSIU
[TIP-2025] PyTorch implementation of "Structural Similarity-Inspired Unfolding for Lightweight Image Super-Resolution"
Language: Python - Size: 27 MB - Last synced at: about 19 hours ago - Pushed at: about 20 hours ago - Stars: 10 - Forks: 1

lim142857/Sparsifiner
Official Codebase for CVPR2023 paper "Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers"
Language: Python - Size: 46.9 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

vleonel-junior/TabNSA_CCP
Binary classification with a sparse attention architecture for tabular data. Automatic hyperparameter optimization via Optuna. Tested on telecom and banking churn datasets.
Language: Python - Size: 422 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0
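A hedged sketch of the Optuna search loop described above, with a scikit-learn classifier standing in for the repository's sparse-attention model; the search space, metric, and synthetic data are assumptions made for illustration.

import optuna
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Synthetic imbalanced data standing in for a churn dataset
X, y = make_classification(n_samples=2000, n_features=30, weights=[0.8, 0.2], random_state=0)

def objective(trial):
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "n_estimators": trial.suggest_int("n_estimators", 50, 400),
        "max_depth": trial.suggest_int("max_depth", 2, 6),
    }
    model = GradientBoostingClassifier(random_state=0, **params)
    # ROC-AUC is a reasonable objective for imbalanced binary churn classification
    return cross_val_score(model, X, y, cv=3, scoring="roc_auc").mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print(study.best_params, study.best_value)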

DoQuantum/r1.7-planck-pioneers
Integrating quantum computing (QC) techniques into sparse attention for Transformers
Language: Python - Size: 7.81 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

HanzhiZhang-Ulrica/DAM
Dynamic Attention Mask (DAM) generates adaptive sparse attention masks per layer and head for Transformer models, enabling long-context inference with lower compute and memory overhead, without fine-tuning.
Language: Python - Size: 9.77 KB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 0 - Forks: 0
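Below is a hedged sketch of a per-query, per-head adaptive mask (top-k keys under a causal constraint). The actual DAM construction differs, and a real kernel would gather only the selected keys and values rather than masking dense scores as done here for clarity.

import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, top_k=32):
    # q, k, v: (batch, heads, seq, dim)
    b, h, n, d = q.shape
    scores = q @ k.transpose(-2, -1) / d ** 0.5
    idx = torch.arange(n, device=q.device)
    scores = scores.masked_fill(idx[None, :] > idx[:, None], float('-inf'))   # causal mask
    kth = scores.topk(min(top_k, n), dim=-1).values[..., -1:]                 # per-query threshold
    mask = scores >= kth                                                      # adaptive, per head and query
    attn = F.softmax(scores.masked_fill(~mask, float('-inf')), dim=-1)
    return attn @ v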

noahsyntax/native-sparse-attention-pytorch
Implementation of the sparse attention pattern proposed by the Deepseek team in their "Native Sparse Attention" paper
Language: Python - Size: 34.4 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

moon23k/Efficient_Summarization
Text Summarization Modeling with three different Attention Types
Language: Python - Size: 43.9 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0
