GitHub topics: kv-cache-compression

NVIDIA/kvpress

LLM KV cache compression made easy

Language: Python - Size: 5.54 MB - Last synced at: about 3 hours ago - Pushed at: about 17 hours ago - Stars: 509 - Forks: 40

snu-mllab/KVzip

Query-agnostic KV cache eviction: 3–4× reduction in memory and 2× decrease in latency (Qwen3/2.5, Gemma3, LLaMA3)

Language: Python - Size: 1.46 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 21 - Forks: 1

abdelfattah-lab/xKV

xKV: Cross-Layer SVD for KV-Cache Compression

Language: Python - Size: 30.9 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 24 - Forks: 1

Linking-ai/SCOPE

SCOPE: Optimizing KV Cache Compression in Long-context Generation

Language: Jupyter Notebook - Size: 6.21 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 23 - Forks: 2

Zefan-Cai/KVCache-Factory

Unified KV Cache Compression Methods for Auto-Regressive Models

Language: Python - Size: 191 MB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 1,044 - Forks: 136

itsnamgyu/block-transformer

Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)

Language: Python - Size: 515 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 151 - Forks: 8

Zefan-Cai/Awesome-LLM-KV-Cache

Awesome-LLM-KV-Cache: A curated list of 📙 Awesome LLM KV Cache Papers with Codes.

Size: 56.6 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 221 - Forks: 13

shadowpa0327/Palu

Code for Palu: Compressing KV-Cache with Low-Rank Projection

Language: Python - Size: 337 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 76 - Forks: 4

dvlab-research/Q-LLM

This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models"

Language: Python - Size: 6.84 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 29 - Forks: 0

snu-mllab/Context-Memory

PyTorch implementation for "Compressed Context Memory For Online Language Model Interaction" (ICLR'24)

Language: Python - Size: 2.1 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 43 - Forks: 1