An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: kv-cache-compression

abdelfattah-lab/xKV

xKV: Cross-Layer SVD for KV-Cache Compression

Language: Python - Size: 30.9 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 24 - Forks: 1

Linking-ai/SCOPE

SCOPE: Optimizing KV Cache Compression in Long-context Generation

Language: Jupyter Notebook - Size: 6.21 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 23 - Forks: 2

NVIDIA/kvpress

LLM KV cache compression made easy

Language: Python - Size: 5.55 MB - Last synced at: 4 days ago - Pushed at: 13 days ago - Stars: 476 - Forks: 36

Zefan-Cai/KVCache-Factory

Unified KV Cache Compression Methods for Auto-Regressive Models

Language: Python - Size: 191 MB - Last synced at: 11 days ago - Pushed at: 5 months ago - Stars: 1,044 - Forks: 136

itsnamgyu/block-transformer

Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)

Language: Python - Size: 515 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 151 - Forks: 8

Zefan-Cai/Awesome-LLM-KV-Cache

Awesome-LLM-KV-Cache: A curated list of 📙Awesome LLM KV Cache Papers with Codes.

Size: 56.6 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 221 - Forks: 13

shadowpa0327/Palu

Code for Palu: Compressing KV-Cache with Low-Rank Projection

Language: Python - Size: 337 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 76 - Forks: 4

dvlab-research/Q-LLM

This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models"

Language: Python - Size: 6.84 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 29 - Forks: 0

snu-mllab/Context-Memory

PyTorch implementation for "Compressed Context Memory For Online Language Model Interaction" (ICLR'24)

Language: Python - Size: 2.1 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 43 - Forks: 1