GitHub topics: kv-cache-compression

Repositories

snu-mllab/KVzip

Query-agnostic KV cache eviction: 3–4× reduction in memory and 2× decrease in latency (Qwen3/2.5, Gemma3, LLaMA3)

Language: Python - Size: 1.88 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 92 - Forks: 2

NVIDIA/kvpress

LLM KV cache compression made easy

Language: Python - Size: 6.09 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 556 - Forks: 49

shadowpa0327/Palu

[ICLR 2025] Palu: Compressing KV-Cache with Low-Rank Projection

Language: Python - Size: 337 KB - Last synced at: 7 days ago - Pushed at: 5 months ago - Stars: 126 - Forks: 6

dvlab-research/Q-LLM

This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models"

Language: Python - Size: 6.84 MB - Last synced at: 3 days ago - Pushed at: about 1 year ago - Stars: 53 - Forks: 4

abdelfattah-lab/xKV

xKV: Cross-Layer SVD for KV-Cache Compression

Language: Python - Size: 30.9 MB - Last synced at: 7 days ago - Pushed at: about 1 month ago - Stars: 27 - Forks: 1

Linking-ai/SCOPE

SCOPE: Optimizing KV Cache Compression in Long-context Generation

Language: Jupyter Notebook - Size: 6.21 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 23 - Forks: 2

Zefan-Cai/KVCache-Factory

Unified KV Cache Compression Methods for Auto-Regressive Models

Language: Python - Size: 191 MB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 1,044 - Forks: 136

itsnamgyu/block-transformer

Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)

Language: Python - Size: 515 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 151 - Forks: 8

Zefan-Cai/Awesome-LLM-KV-Cache

Awesome-LLM-KV-Cache: A curated list of 📙Awesome LLM KV Cache Papers with Codes.

Size: 56.6 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 221 - Forks: 13

snu-mllab/Context-Memory

PyTorch implementation for "Compressed Context Memory For Online Language Model Interaction" (ICLR'24)

Language: Python - Size: 2.1 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 43 - Forks: 1

Related Keywords

kv-cache-compression (10), kv-cache (4), llm (4), long-context (4), large-language-models (3), llm-inference (2), mla (2), kv-cache-quantization (2), deepseek (2), transformers (1), pytorch (1), python (1), fast-inference (1), inference-acceleration (1), inter-layer (1), inference (1), low-rank (1), kvcache (1), llm-architecture (1), context-compression (1), efficient-llm-inference (1)
