GitHub topics: sparse-attention
SHI-Labs/NATTEN
Neighborhood Attention Extension. Bringing attention to a neighborhood near you!
Language: C++ - Size: 17.9 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 510 - Forks: 41
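Neighborhood attention restricts each query to a fixed-size window of neighboring tokens. Below is a minimal PyTorch sketch of that masking idea for the 1D case; it is not NATTEN's API (NATTEN ships fused CUDA/C++ kernels and keeps the window a fixed size near sequence boundaries), and the window size and tensor shapes are illustrative.

```python
import torch

def neighborhood_attention_1d(q, k, v, window: int = 7):
    """Naive 1D neighborhood attention via a dense mask. Illustrative only;
    NATTEN fuses this into custom kernels and handles edges differently
    (this simplification clips the window at the boundaries)."""
    # q, k, v: (batch, heads, seq_len, dim)
    seq_len = q.shape[-2]
    idx = torch.arange(seq_len, device=q.device)
    # Query i may attend to keys j with |i - j| <= window // 2.
    allowed = (idx[:, None] - idx[None, :]).abs() <= window // 2
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    scores = scores.masked_fill(~allowed, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

# Toy usage: 1 batch, 4 heads, 64 tokens, head dim 32
q = k = v = torch.randn(1, 4, 64, 32)
out = neighborhood_attention_1d(q, k, v, window=7)
```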

HanzhiZhang-Ulrica/DAM
Dynamic Attention Mask (DAM) generates adaptive sparse attention masks per layer and head for Transformer models, enabling long-context inference with lower compute and memory overhead, without fine-tuning.
Language: Python - Size: 9.77 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0
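The general idea of a per-layer, per-head adaptive mask can be sketched roughly as follows; the top-k scoring rule and `keep_ratio` are placeholders for illustration, not DAM's actual mask-construction criterion.

```python
import torch

def per_head_adaptive_mask(q, k, keep_ratio: float = 0.1):
    """Illustrative per-head sparse mask: for each head and query, keep only
    the top-scoring fraction of key positions. A stand-in for DAM's mask
    construction, which the paper defines differently."""
    # q, k: (batch, heads, seq_len, dim)
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    k_keep = max(1, int(scores.shape[-1] * keep_ratio))
    topk = scores.topk(k_keep, dim=-1).indices
    mask = torch.zeros_like(scores, dtype=torch.bool)
    mask.scatter_(-1, topk, True)
    return mask  # True = attend; differs per batch, head, and query row
```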

thu-ml/SpargeAttn
SpargeAttention: a training-free sparse attention method that can accelerate inference for any model.
Language: Cuda - Size: 55.4 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 581 - Forks: 40

ByteDance-Seed/ShadowKV
[ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
Language: Python - Size: 20.5 MB - Last synced at: 17 days ago - Pushed at: about 1 month ago - Stars: 184 - Forks: 12

XunhaoLai/native-sparse-attention-triton
Efficient Triton implementation of Native Sparse Attention.
Language: Python - Size: 266 KB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 148 - Forks: 6

lucidrains/native-sparse-attention-pytorch
Implementation of the sparse attention pattern proposed by the DeepSeek team in their "Native Sparse Attention" paper.
Language: Python - Size: 34.6 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 623 - Forks: 32
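The NSA pattern combines attention over compressed KV blocks, a few selected full-resolution KV blocks, and a local sliding window, with the branch outputs gated per query. Below is a hedged sketch of just the block-selection step; the block size, mean pooling, and top-k are illustrative, and the compression, sliding-window, and gating branches are omitted.

```python
import torch

def select_kv_blocks(q, k, block_size: int = 64, top_k: int = 4):
    """Illustrative block selection in the spirit of NSA: score each KV block
    by its mean-pooled key against the query, then keep the top-k blocks per
    query. Assumes kv_len >= block_size * top_k for the toy shapes below."""
    # q: (batch, heads, q_len, dim), k: (batch, heads, kv_len, dim)
    b, h, kv_len, d = k.shape
    n_blocks = kv_len // block_size
    k_blocks = k[:, :, : n_blocks * block_size].reshape(b, h, n_blocks, block_size, d)
    k_pooled = k_blocks.mean(dim=-2)                    # (b, h, n_blocks, d)
    block_scores = q @ k_pooled.transpose(-2, -1)       # (b, h, q_len, n_blocks)
    return block_scores.topk(top_k, dim=-1).indices     # blocks each query keeps

# Toy usage: 1024 keys -> 16 blocks of 64, keep 4 blocks per query
q = torch.randn(1, 8, 1024, 64)
k = torch.randn(1, 8, 1024, 64)
kept = select_kv_blocks(q, k, block_size=64, top_k=4)  # (1, 8, 1024, 4)
```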

ByteDance-Seed/FlexPrefill
Code for the paper: [ICLR 2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference
Language: Python - Size: 641 KB - Last synced at: 26 days ago - Pushed at: 26 days ago - Stars: 103 - Forks: 6
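One way to read "context-aware" is that the per-head sparsity budget is estimated from the input itself at prefill time. The sketch below probes attention with the last few queries and measures how many keys are needed to cover a target fraction of the attention mass per head; the probing scheme, `gamma`, and `n_probe` are assumptions for illustration, not FlexPrefill's actual pattern search.

```python
import torch

def per_head_key_budget(q, k, gamma: float = 0.95, n_probe: int = 16):
    """Illustrative context-dependent budget estimate: attend with only the
    last n_probe queries, then count how many top keys cover a gamma fraction
    of attention mass, averaged per head. Placeholder for FlexPrefill."""
    # q, k: (batch, heads, seq_len, dim)
    probe = q[:, :, -n_probe:]                                    # (b, h, n_probe, d)
    attn = torch.softmax(probe @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
    sorted_attn = attn.sort(dim=-1, descending=True).values
    cum = sorted_attn.cumsum(dim=-1)                              # cumulative mass
    # Smallest key count whose mass reaches gamma, averaged over probe queries
    budget = (cum < gamma).sum(dim=-1).float().mean(dim=-1) + 1
    return budget                                                 # (batch, heads)
```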

noahsyntax/native-sparse-attention-pytorch
Implementation of the sparse attention pattern proposed by the DeepSeek team in their "Native Sparse Attention" paper.
Language: Python - Size: 34.4 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

thu-nics/MoA
The official implementation of the paper "MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression"
Language: Python - Size: 532 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 103 - Forks: 6
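MoA assigns heterogeneous sparse-attention spans across heads and layers rather than one uniform pattern. A minimal sketch of applying a different causal sliding-window span to each head; the spans themselves would come from MoA's offline search, and the values below are made up.

```python
import torch

def heterogeneous_window_attention(q, k, v, windows):
    """Illustrative heterogeneous sliding-window attention: each head gets its
    own causal window span, e.g. found by offline profiling as in MoA.
    `windows` has one span per head; len(windows) must equal the head count."""
    # q, k, v: (batch, heads, seq_len, dim)
    seq_len = q.shape[-2]
    idx = torch.arange(seq_len, device=q.device)
    rel = idx[:, None] - idx[None, :]                   # i - j (causal distance)
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    for h, w in enumerate(windows):
        allowed = (rel >= 0) & (rel < w)                # causal window of span w
        scores[:, h] = scores[:, h].masked_fill(~allowed, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

# Toy usage: 4 heads with increasing spans (made-up values)
out = heterogeneous_window_attention(
    torch.randn(1, 4, 128, 32), torch.randn(1, 4, 128, 32),
    torch.randn(1, 4, 128, 32), windows=[32, 64, 96, 128])
```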

eezkni/SSIU
PyTorch implementation of "Structural Similarity-Inspired Unfolding for Lightweight Image Super-Resolution"
Size: 4.88 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 2 - Forks: 0

moon23k/Efficient_Summarization
Text summarization modeling with three different attention types
Language: Python - Size: 43.9 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

lim142857/Sparsifiner
Official codebase for the CVPR 2023 paper "Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers"
Language: Python - Size: 46.9 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0
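Sparsifiner learns instance-dependent connectivity masks with a lightweight predictor rather than using a fixed sparse pattern. A rough sketch of that idea, with an assumed low-rank projection and top-k rule standing in for the paper's actual predictor and hyperparameters.

```python
import torch
import torch.nn as nn

class InstanceDependentMaskPredictor(nn.Module):
    """Illustrative mask predictor in the spirit of Sparsifiner: a small
    low-rank bilinear head scores token-to-token connectivity and keeps the
    top-k connections per query token. `rank` and `top_k` are made up."""

    def __init__(self, dim: int, rank: int = 16, top_k: int = 32):
        super().__init__()
        self.proj_q = nn.Linear(dim, rank, bias=False)
        self.proj_k = nn.Linear(dim, rank, bias=False)
        self.top_k = top_k

    def forward(self, x):
        # x: (batch, tokens, dim) -> boolean connectivity mask (batch, tokens, tokens)
        scores = self.proj_q(x) @ self.proj_k(x).transpose(-2, -1)
        idx = scores.topk(min(self.top_k, x.shape[1]), dim=-1).indices
        mask = torch.zeros_like(scores, dtype=torch.bool)
        mask.scatter_(-1, idx, True)
        return mask  # feed into attention as an additive or boolean mask

# Toy usage: 197 ViT tokens, embedding dim 384
mask = InstanceDependentMaskPredictor(dim=384)(torch.randn(2, 197, 384))
```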
