GitHub topics: token-pruning
xuyang-liu16/Awesome-Token-level-Model-Compression
📚 Collection of token-level model compression resources.
Size: 2.03 MB - Last synced at: 5 days ago - Pushed at: 12 days ago - Stars: 155 - Forks: 5

sameerjan-dev/FlashAttention-pytorch
🚀 Streamline attention processes with FlashAttention, a fast and memory-efficient PyTorch implementation that optimizes GPU usage for better performance.
Language: Jupyter Notebook - Size: 430 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0
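The core trick behind FlashAttention is to compute exact attention block by block with an online softmax, so the full seq_len x seq_len score matrix is never materialized. Below is a minimal single-head PyTorch sketch of that online-softmax accumulation; it illustrates the algorithmic idea only and is not this repository's code (the function name tiled_attention is made up here).

    import torch

    def tiled_attention(q, k, v, block_size=64):
        # q, k, v: (seq_len, head_dim). Single head, no masking, purely illustrative.
        scale = q.shape[-1] ** -0.5
        out = torch.zeros_like(q)
        row_max = torch.full((q.shape[0], 1), float("-inf"))
        row_sum = torch.zeros(q.shape[0], 1)
        for start in range(0, k.shape[0], block_size):
            kb = k[start:start + block_size]
            vb = v[start:start + block_size]
            scores = (q @ kb.T) * scale                            # (seq_len, block_size)
            new_max = torch.maximum(row_max, scores.max(-1, keepdim=True).values)
            correction = torch.exp(row_max - new_max)              # rescale previous accumulators
            p = torch.exp(scores - new_max)
            row_sum = row_sum * correction + p.sum(-1, keepdim=True)
            out = out * correction + p @ vb
            row_max = new_max
        return out / row_sum

    # Matches ordinary softmax attention on random inputs.
    q, k, v = (torch.randn(128, 64) for _ in range(3))
    ref = torch.softmax((q @ k.T) * 64 ** -0.5, dim=-1) @ v
    assert torch.allclose(tiled_attention(q, k, v), ref, atol=1e-5)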

ModelTC/LightCompress
A powerful toolkit for compressing large models, including LLMs, VLMs, and video generation models.
Language: Python - Size: 30.5 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 547 - Forks: 61

cokeshao/HoliTom
HoliTom: Holistic Token Merging for Fast Video Large Language Models
Language: Python - Size: 8.02 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 39 - Forks: 0
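As a rough illustration of the temporal-redundancy idea behind video token merging (a generic sketch, not HoliTom's actual holistic merging pipeline), one can drop per-frame tokens that barely change from the previous frame:

    import torch

    def prune_temporal_redundancy(frames, threshold=0.9):
        # frames: (T, N, D) visual tokens per video frame (names and shapes assumed for illustration).
        # Keep every token of frame 0; for later frames keep only tokens whose embedding
        # differs enough from the same spatial position in the previous frame.
        kept = [frames[0]]
        for t in range(1, frames.shape[0]):
            sim = torch.nn.functional.cosine_similarity(frames[t - 1], frames[t], dim=-1)  # (N,)
            kept.append(frames[t][sim < threshold])
        return torch.cat(kept, dim=0)  # (M, D) with M <= T * N

    tokens = prune_temporal_redundancy(torch.randn(16, 196, 768))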

MILVLG/twigvlm
Implementation of the ICCV 2025 paper "Growing a Twig to Accelerate Large Vision-Language Models".
Language: Python - Size: 3.81 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 1

vbdi/divprune
[CVPR 2025] DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models
Language: Python - Size: 11 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 18 - Forks: 0
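Diversity-based pruning selects a subset of visual tokens that covers the feature space rather than the tokens with the highest importance scores. A greedy max-min (farthest-point) selection is a common approximation of that objective; the sketch below is illustrative only, not DivPrune's exact formulation or code.

    import torch

    def diversity_prune(tokens, keep):
        # tokens: (n, d) visual token embeddings; keep: number of tokens to retain.
        # Greedy max-min selection on cosine distance: repeatedly pick the token
        # that is farthest from everything already selected.
        feats = torch.nn.functional.normalize(tokens, dim=-1)
        selected = [0]                                    # start from an arbitrary token
        min_dist = 1.0 - feats @ feats[0]                 # distance of every token to the selection
        for _ in range(keep - 1):
            idx = int(min_dist.argmax())                  # farthest token from the current subset
            selected.append(idx)
            min_dist = torch.minimum(min_dist, 1.0 - feats @ feats[idx])
        return tokens[sorted(selected)]

    pruned = diversity_prune(torch.randn(576, 1024), keep=144)   # e.g. keep 25% of the tokens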

microsoft/Moonlit
This is a collection of our research on efficient AI, covering hardware-aware NAS and model compression.
Language: Python - Size: 12 MB - Last synced at: 5 days ago - Pushed at: 11 months ago - Stars: 83 - Forks: 6

sangminwoo/awesome-token-redundancy-reduction
😎 Awesome papers on token redundancy reduction
Size: 80.1 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 3 - Forks: 0

Adam-Mazur/Lazy-Llama
An implementation of LazyLLM token pruning for the LLaMa 2 model family.
Language: Python - Size: 27.3 KB - Last synced at: 5 months ago - Pushed at: 8 months ago - Stars: 11 - Forks: 0
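LazyLLM-style pruning ranks prompt tokens by how much attention later positions pay to them and defers computation for the rest. The sketch below shows that selection step in isolation, assuming attention weights from one prefill layer are available; it is an illustration, not this repository's implementation.

    import torch

    def select_tokens_by_attention(attn, keep_ratio=0.5):
        # attn: (heads, seq_len, seq_len) attention weights from one prefill layer
        # (shape and keep_ratio are assumptions made for this sketch).
        scores = attn[:, -1, :].mean(dim=0)               # attention the last token pays to each position
        keep = max(1, int(scores.numel() * keep_ratio))
        kept_idx = scores.topk(keep).indices.sort().values
        return kept_idx                                   # positions to carry into later layers

    attn = torch.softmax(torch.randn(8, 256, 256), dim=-1)
    print(select_tokens_by_attention(attn, keep_ratio=0.25).shape)   # torch.Size([64])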

mlvlab/vid-TLDR
Official implementation of the CVPR 2024 paper "vid-TLDR: Training Free Token merging for Light-weight Video Transformer".
Language: Python - Size: 1.04 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 18 - Forks: 0
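Training-free token merging typically follows the ToMe recipe: split the tokens into two sets, match each token in one set to its most similar counterpart in the other, and average the most similar pairs. The sketch below shows that bipartite matching step in plain PyTorch; vid-TLDR adds saliency-aware weighting on top, so treat this as background rather than the paper's method.

    import torch

    def merge_tokens(x, r):
        # x: (n, d) tokens; r: number of tokens to remove by merging (names are illustrative).
        feats = torch.nn.functional.normalize(x, dim=-1)
        a, b = x[0::2], x[1::2].clone()                   # bipartite split into sets A and B
        sim = feats[0::2] @ feats[1::2].T                 # cosine similarity between the two sets
        best_sim, best_idx = sim.max(dim=-1)              # most similar B partner for each A token
        merge_a = best_sim.topk(min(r, a.shape[0])).indices
        keep_a = torch.ones(a.shape[0], dtype=torch.bool)
        keep_a[merge_a] = False
        # Average merged A tokens into their B partners (duplicate matches simplified: last write wins).
        b[best_idx[merge_a]] = (b[best_idx[merge_a]] + a[merge_a]) / 2
        return torch.cat([a[keep_a], b], dim=0)           # (n - r, d), order not preserved

    out = merge_tokens(torch.randn(197, 768), r=50)       # 197 tokens -> 147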

Jungmin-YUN-0/Attention_Lightweight
Language: Python - Size: 1.8 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0
