GitHub topics: kv-cache-compression
abdelfattah-lab/xKV
xKV: Cross-Layer SVD for KV-Cache Compression
Language: Python - Size: 30.9 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 24 - Forks: 1
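
As a rough illustration of what a cross-layer SVD scheme does (a hedged sketch, not xKV's implementation): key caches from a group of layers are stacked and approximated with a single truncated SVD so the layers share one token-side basis. All shapes, the rank, and the function names below are assumptions made for the example.

```python
# Illustrative cross-layer low-rank compression sketch; NOT the xKV code.
import torch

def cross_layer_svd_compress(key_caches, rank):
    """key_caches: list of [seq_len, head_dim] tensors, one per layer in the group."""
    stacked = torch.cat(key_caches, dim=-1)             # [seq_len, n_layers * head_dim]
    U, S, Vh = torch.linalg.svd(stacked, full_matrices=False)
    # Keep only the top-`rank` singular triplets as the compressed representation.
    return U[:, :rank], S[:rank].unsqueeze(-1) * Vh[:rank, :]

def cross_layer_svd_decompress(U_r, SVh_r, head_dim):
    """Rebuild the per-layer key caches from the shared low-rank factors."""
    stacked = U_r @ SVh_r                                # [seq_len, n_layers * head_dim]
    return list(stacked.split(head_dim, dim=-1))

# Toy usage: 4 layers, 128 cached tokens, head_dim 64, compressed to rank 32.
caches = [torch.randn(128, 64) for _ in range(4)]
U_r, SVh_r = cross_layer_svd_compress(caches, rank=32)
reconstructed = cross_layer_svd_decompress(U_r, SVh_r, head_dim=64)
```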

Linking-ai/SCOPE
SCOPE: Optimizing KV Cache Compression in Long-context Generation
Language: Jupyter Notebook - Size: 6.21 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 23 - Forks: 2

NVIDIA/kvpress
LLM KV cache compression made easy
Language: Python - Size: 5.55 MB - Last synced at: 4 days ago - Pushed at: 13 days ago - Stars: 476 - Forks: 36
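
The sketch below follows the usage pattern documented in the kvpress README: a custom Hugging Face pipeline plus a "press" object that prunes the KV cache during prefill. The pipeline tag, the `ExpectedAttentionPress` class, and the `compression_ratio` argument are taken from that pattern and may differ across versions.

```python
# Sketch of kvpress usage per its documented README pattern; details may vary
# across kvpress releases.
from transformers import pipeline
from kvpress import ExpectedAttentionPress

pipe = pipeline(
    "kv-press-text-generation",
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    device="cuda",
)

context = "A long document whose KV cache we want to compress ..."
question = "What is the document about?"

# The press removes a fraction of the cached key/value pairs during prefill.
press = ExpectedAttentionPress(compression_ratio=0.5)
answer = pipe(context, question=question, press=press)["answer"]
```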

Zefan-Cai/KVCache-Factory
Unified KV Cache Compression Methods for Auto-Regressive Models
Language: Python - Size: 191 MB - Last synced at: 11 days ago - Pushed at: 5 months ago - Stars: 1,044 - Forks: 136
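
For readers new to the topic, the sketch below shows generic attention-score-based token eviction, the family of methods (H2O/SnapKV-style) that frameworks like this unify. It is not KVCache-Factory's API; the shapes and budget are illustrative assumptions.

```python
# Minimal sketch of score-based KV cache eviction; not KVCache-Factory's API.
import torch

def evict_kv(keys, values, attn_weights, budget):
    """keys/values: [seq_len, head_dim]; attn_weights: [num_queries, seq_len].
    Keep only the `budget` cached tokens with the highest accumulated attention mass."""
    scores = attn_weights.sum(dim=0)                     # importance per cached token
    keep = scores.topk(budget).indices.sort().values     # preserve positional order
    return keys[keep], values[keep]

keys = torch.randn(1024, 64)
values = torch.randn(1024, 64)
attn = torch.rand(32, 1024).softmax(dim=-1)              # toy attention from recent queries
k_small, v_small = evict_kv(keys, values, attn, budget=256)
```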

itsnamgyu/block-transformer
Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)
Language: Python - Size: 515 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 151 - Forks: 8

Zefan-Cai/Awesome-LLM-KV-Cache
A curated list of 📙 awesome LLM KV cache papers with code.
Size: 56.6 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 221 - Forks: 13

shadowpa0327/Palu
Code for Palu: Compressing KV-Cache with Low-Rank Projection
Language: Python - Size: 337 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 76 - Forks: 4
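
Palu's title points at low-rank projection of the KV cache; the sketch below shows the general idea under assumed shapes (not Palu's code): factor a value projection W_v ≈ A·B offline via truncated SVD, cache only the rank-r latent h·A per token, and reconstruct values on the fly at attention time.

```python
# Illustrative low-rank-projection sketch; NOT Palu's implementation.
import torch

def factorize(W, rank):
    """Offline: factor a [d_model, d_head] projection into A [d_model, r] and B [r, d_head]."""
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]      # absorb singular values into A
    B = Vh[:rank, :]
    return A, B

d_model, d_head, rank = 512, 64, 16
W_v = torch.randn(d_model, d_head)
A, B = factorize(W_v, rank)

h = torch.randn(10, d_model)        # hidden states for 10 tokens
latent = h @ A                      # cache this [10, rank] latent instead of h @ W_v
v_reconstructed = latent @ B        # rebuild full values when attention needs them
```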

dvlab-research/Q-LLM
Official repository for "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models"
Language: Python - Size: 6.84 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 29 - Forks: 0

snu-mllab/Context-Memory
PyTorch implementation of "Compressed Context Memory for Online Language Model Interaction" (ICLR'24)
Language: Python - Size: 2.1 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 43 - Forks: 1
