GitHub topics: model-compression
Picovoice/picollm
On-device LLM Inference Powered by X-Bit Quantization
Language: Python - Size: 98 MB - Last synced at: 1 day ago - Pushed at: 13 days ago - Stars: 250 - Forks: 14

huawei-noah/Pretrained-Language-Model
Pretrained language models and related optimization techniques developed by Huawei Noah's Ark Lab.
Language: Python - Size: 29 MB - Last synced at: 3 days ago - Pushed at: over 1 year ago - Stars: 3,103 - Forks: 637

EnricoSimionato/Alternative-Model-Architectures
Research-oriented project focusing on implementing and evaluating novel compression techniques for large language models (LLMs).
Language: Python - Size: 19.7 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

tensorflow/model-optimization
A toolkit for optimizing ML models built with Keras and TensorFlow for deployment, including quantization and pruning.
Language: Python - Size: 2.22 MB - Last synced at: 4 days ago - Pushed at: 4 months ago - Stars: 1,534 - Forks: 327
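For orientation, a minimal magnitude-pruning sketch using the toolkit's Keras API; the layer sizes and pruning schedule below are illustrative placeholders, not project defaults.

```python
# Minimal magnitude-pruning sketch with tensorflow_model_optimization (illustrative values).
import tensorflow as tf
import tensorflow_model_optimization as tfmot

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(10),
])

# Wrap the model so low-magnitude weights are gradually zeroed during fine-tuning.
pruned = tfmot.sparsity.keras.prune_low_magnitude(
    model,
    pruning_schedule=tfmot.sparsity.keras.PolynomialDecay(
        initial_sparsity=0.0, final_sparsity=0.5, begin_step=0, end_step=1000
    ),
)
pruned.compile(optimizer="adam",
               loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
# Fine-tune with the pruning callback (data omitted here), then strip wrappers for export:
# pruned.fit(x_train, y_train, callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])
final_model = tfmot.sparsity.keras.strip_pruning(pruned)
```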

dwiaskor99/contrastive-distillation
CAST is a method for semi-supervised instance segmentation that efficiently trains a compact model using both labeled and unlabeled data. This repository contains the implementation of our three-stage pipeline, showcasing contrastive adaptation and distillation techniques. 🐙🌟
Size: 3.2 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

microsoft/nni 📦
An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
Language: Python - Size: 127 MB - Last synced at: 5 days ago - Pushed at: 12 months ago - Stars: 14,205 - Forks: 1,822

pvti/Awesome-Tensor-Decomposition
😎 A curated list of tensor decomposition resources for model compression.
Size: 488 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 68 - Forks: 8

lpalbou/model-quantizer
Effortlessly quantize, benchmark, and publish Hugging Face models with cross-platform support for CPU/GPU. Reduce model size by 75% while maintaining performance.
Language: Python - Size: 165 KB - Last synced at: 3 days ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

SqueezeAILab/KVQuant
[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
Language: Python - Size: 19.8 MB - Last synced at: 1 day ago - Pushed at: 11 months ago - Stars: 359 - Forks: 31

dkozlov/awesome-knowledge-distillation
Awesome Knowledge Distillation
Size: 215 KB - Last synced at: 6 days ago - Pushed at: 14 days ago - Stars: 3,686 - Forks: 512

d0tTino/DeepThought-ReThought
A refactored version of the DeepThought Discord bot, focusing on improved architecture, performance, and AI agent capabilities.
Language: Python - Size: 18.6 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 1 - Forks: 0

eezkni/SVRF
[TIP-2025] Pytorch implementation of "Shell-guided Compression of Voxel Radiance Fields"
Language: Python - Size: 1.17 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 4 - Forks: 0

FLHonker/Awesome-Knowledge-Distillation
Awesome Knowledge-Distillation. Knowledge distillation papers (2014–2021), organized by category.
Size: 457 KB - Last synced at: 7 days ago - Pushed at: about 2 years ago - Stars: 2,601 - Forks: 338

datawhalechina/awesome-compression
A beginner-friendly introduction to model compression; PDF download: https://github.com/datawhalechina/awesome-compression/releases
Size: 311 MB - Last synced at: 6 days ago - Pushed at: 11 days ago - Stars: 295 - Forks: 36

zhang-fengdi/ControlGS
Official reference implementation of "Consistent Quantity-Quality Control across Scenes for Deployment-Aware Gaussian Splatting"
Language: C++ - Size: 15.1 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 6 - Forks: 0

deadlykitten4/ResSVD
ResSVD: Residual Compensated SVD for Large Language Model Compression
Size: 9.77 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

pratyushasharma/laser
The Truth Is In There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction
Language: Python - Size: 2.25 MB - Last synced at: 1 day ago - Pushed at: 12 months ago - Stars: 388 - Forks: 34
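The core idea, swapping a chosen weight matrix for a low-rank approximation, can be sketched generically with a truncated SVD in PyTorch; this illustrates the technique, not the repository's code, and the rank is an arbitrary example.

```python
# Generic truncated-SVD rank reduction of one linear layer (illustrative, not the repo's code).
import torch

def rank_reduce_(linear: torch.nn.Linear, keep_rank: int) -> None:
    """Replace linear.weight in place with its best rank-`keep_rank` approximation."""
    W = linear.weight.data
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    W_low = U[:, :keep_rank] @ torch.diag(S[:keep_rank]) @ Vh[:keep_rank, :]
    linear.weight.data.copy_(W_low)

layer = torch.nn.Linear(512, 512)
rank_reduce_(layer, keep_rank=64)  # keep only the top 64 singular directions
```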

Efficient-ML/Awesome-Model-Quantization
A list of papers, docs, and code about model quantization. The repo aims to collect resources for model quantization research and is continuously being improved; PRs adding works (papers, repositories) it has missed are welcome.
Size: 61.5 MB - Last synced at: 11 days ago - Pushed at: 4 months ago - Stars: 2,131 - Forks: 224

SqueezeAILab/SqueezeLLM
[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
Language: Python - Size: 1.5 MB - Last synced at: 1 day ago - Pushed at: 11 months ago - Stars: 692 - Forks: 46

ardaerendogru/dinov2_distillation
This project implements knowledge distillation from DINOv2 (Vision Transformer) to convolutional networks, enabling efficient visual representation learning with reduced computational requirements.
Language: Python - Size: 92.8 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 5 - Forks: 0

VainF/Torch-Pruning
[CVPR 2023] DepGraph: Towards Any Structural Pruning; LLMs, Vision Foundation Models, etc.
Language: Python - Size: 10 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 3,038 - Forks: 351

xuyang-liu16/Awesome-Token-level-Model-Compression
📚 Collection of token-level model compression resources.
Size: 1.72 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 117 - Forks: 4

huawei-noah/Efficient-AI-Backbones
Efficient AI Backbones including GhostNet, TNT and MLP, developed by Huawei Noah's Ark Lab.
Language: Python - Size: 98.4 MB - Last synced at: 12 days ago - Pushed at: 3 months ago - Stars: 4,240 - Forks: 723

666DZY666/micronet
micronet, a model compression and deployment library. Compression: (1) quantization: quantization-aware training (QAT) at high bit-widths (>2 bits: DoReFa, integer-arithmetic-only inference) and low bit-widths (≤2 bits: ternary and binary, i.e. TWN/BNN/XNOR-Net), plus 8-bit post-training quantization (PTQ) via TensorRT; (2) pruning: normal, regular, and group convolutional channel pruning; (3) group convolution structure; (4) batch-normalization fusion for quantization. Deployment: TensorRT with fp32/fp16/int8 (PTQ calibration), op adaptation (upsample), and dynamic shapes.
Language: Python - Size: 6.68 MB - Last synced at: 16 days ago - Pushed at: about 2 months ago - Stars: 2,250 - Forks: 476
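As a quick illustration of the quantization-aware-training part, here is a generic sketch of symmetric fake quantization with a straight-through estimator; it is not micronet's implementation, and the bit-width is an example.

```python
# Symmetric fake quantization with a straight-through estimator (generic sketch).
import torch

def fake_quantize(x: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    qmax = 2 ** (num_bits - 1) - 1
    scale = x.detach().abs().max().clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)
    # Forward uses the quantized value; gradients pass straight through to x.
    return x + (q * scale - x).detach()

w = torch.randn(64, 64, requires_grad=True)
loss = fake_quantize(w, num_bits=4).pow(2).sum()
loss.backward()  # gradients reach the full-precision weights despite the rounding
```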

Inpyo-Hong/Model-Compression-Paper-List
Model Compression Paper List (Focusing on Quantization, Particularly Zero-Shot Quantization)
Size: 42 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 1 - Forks: 0

Efficient-ML/Awesome-Efficient-AIGC
A list of papers, docs, and code about efficient AIGC, covering both language and vision. The repo aims to collect resources for efficient-AIGC research and is continuously being improved; PRs adding works (papers, repositories) it has missed are welcome.
Size: 63.5 KB - Last synced at: 11 days ago - Pushed at: 4 months ago - Stars: 183 - Forks: 11

merantix-momentum/acip
🗜️Codebase of the ACIP algorithm 🗜️
Language: Python - Size: 259 KB - Last synced at: 9 days ago - Pushed at: 15 days ago - Stars: 9 - Forks: 0

THU-MIG/torch-model-compression
A toolkit for automated structural analysis and modification of PyTorch models, including a model compression algorithm library that automatically analyzes model structure.
Language: Python - Size: 132 KB - Last synced at: about 9 hours ago - Pushed at: about 2 years ago - Stars: 249 - Forks: 41

vanhai1231/autoquant-infer
A tool that shrinks models via quantization, combined with an AI agent that automatically picks the optimal quantization level to speed up inference and cut its cost.
Language: Python - Size: 54.7 KB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 0 - Forks: 0

hnuzhy/CV_DL_Gather
Gathers research papers, corresponding code (where available), reading notes, and other related materials about hot 🔥 fields in deep-learning-based computer vision.
Size: 37.6 MB - Last synced at: 21 days ago - Pushed at: 22 days ago - Stars: 74 - Forks: 6

BaiTheBest/SparseLLM
Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024)
Language: Python - Size: 145 KB - Last synced at: 11 days ago - Pushed at: 3 months ago - Stars: 61 - Forks: 9

Sharpiless/Yolov5-distillation-train-inference
YOLOv5 distillation training | YOLOv5 knowledge distillation training, with support for training on your own data.
Language: Python - Size: 2.36 MB - Last synced at: 16 days ago - Pushed at: over 2 years ago - Stars: 220 - Forks: 33

wangxb96/Awesome-EdgeAI
Resources of our survey paper "Optimizing Edge AI: A Comprehensive Survey on Data, Model, and System Strategies"
Size: 3.64 MB - Last synced at: 11 days ago - Pushed at: 6 months ago - Stars: 87 - Forks: 8

Tencent/PocketFlow
An Automatic Model Compression (AutoMC) framework for developing smaller and faster AI applications.
Language: Python - Size: 1.13 MB - Last synced at: 27 days ago - Pushed at: about 2 years ago - Stars: 2,884 - Forks: 490

alibaba/TinyNeuralNetwork
TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework.
Language: Python - Size: 25.1 MB - Last synced at: 30 days ago - Pushed at: 30 days ago - Stars: 825 - Forks: 126

vinhkhuc/JFastText
Java interface for fastText
Language: Java - Size: 57.6 KB - Last synced at: 7 days ago - Pushed at: about 2 years ago - Stars: 237 - Forks: 98

horseee/DeepCache
[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free
Language: Python - Size: 102 MB - Last synced at: 27 days ago - Pushed at: 12 months ago - Stars: 893 - Forks: 43

cedrickchee/awesome-ml-model-compression
Awesome machine learning model compression research papers, quantization, tools, and learning material.
Size: 213 KB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 523 - Forks: 60

tianyic/only_train_once_personal_footprint
OTOv1-v3, NeurIPS, ICLR, TMLR, DNN Training, Compression, Structured Pruning, Erasing Operators, CNN, Diffusion, LLM
Language: Python - Size: 2.94 MB - Last synced at: 19 days ago - Pushed at: 9 months ago - Stars: 302 - Forks: 48

guan-yuan/Awesome-AutoML-and-Lightweight-Models
A list of high-quality (newest) AutoML works and lightweight models including 1.) Neural Architecture Search, 2.) Lightweight Structures, 3.) Model Compression, Quantization and Acceleration, 4.) Hyperparameter Optimization, 5.) Automated Feature Engineering.
Size: 150 KB - Last synced at: 4 days ago - Pushed at: about 4 years ago - Stars: 854 - Forks: 160

SforAiDl/KD_Lib
A Pytorch Knowledge Distillation library for benchmarking and extending works in the domains of Knowledge Distillation, Pruning, and Quantization.
Language: Python - Size: 22.2 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 630 - Forks: 60

haitongli/knowledge-distillation-pytorch
A PyTorch implementation for exploring deep and shallow knowledge distillation (KD) experiments with flexibility
Language: Python - Size: 22.1 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 1,938 - Forks: 352
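The classic soft-target distillation loss such experiments build on looks roughly like this; the temperature and weighting are illustrative, not the repository's settings.

```python
# Soft-target knowledge distillation loss (Hinton-style), illustrative hyperparameters.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # scale by T^2 to keep gradient magnitudes comparable
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

loss = distillation_loss(torch.randn(8, 10), torch.randn(8, 10), torch.randint(0, 10, (8,)))
```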

huawei-noah/Efficient-Computing
Efficient computing methods developed by Huawei Noah's Ark Lab
Language: Jupyter Notebook - Size: 100 MB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 1,273 - Forks: 218

he-y/Awesome-Pruning
A curated list of neural network pruning resources.
Size: 605 KB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 2,446 - Forks: 330

Xiuyu-Li/q-diffusion
[ICCV 2023] Q-Diffusion: Quantizing Diffusion Models.
Language: Python - Size: 5.97 MB - Last synced at: 30 days ago - Pushed at: over 1 year ago - Stars: 347 - Forks: 24

jim-schwoebel/allie
🤖 An automated machine learning framework for audio, text, image, video, or .CSV files (50+ featurizers and 15+ model trainers). Python 3.6 required.
Language: Python - Size: 275 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 141 - Forks: 35

1duo/awesome-ai-infrastructures
Infrastructures™ for Machine Learning Training/Inference in Production.
Size: 11.8 MB - Last synced at: about 1 month ago - Pushed at: about 6 years ago - Stars: 416 - Forks: 74

xuyang-liu16/GlobalCom2
Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models
Language: Python - Size: 6.23 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 21 - Forks: 0

microsoft/archai
Accelerate your Neural Architecture Search (NAS) through fast, reproducible and modular research.
Language: Python - Size: 48.3 MB - Last synced at: 5 days ago - Pushed at: 8 months ago - Stars: 477 - Forks: 89

he-y/filter-pruning-geometric-median
Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration (CVPR 2019 Oral)
Language: Python - Size: 2.17 MB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 614 - Forks: 114
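In practice the criterion amounts to pruning the filters whose summed distance to all other filters in the layer is smallest, i.e. those closest to the geometric median; a simplified selection sketch (not the authors' code, and the ratio is an example):

```python
# FPGM-style filter selection: flag filters closest to the layer's geometric median
# (approximated by the smallest summed distance to all other filters).
import torch

def filters_to_prune(conv: torch.nn.Conv2d, prune_ratio: float = 0.3) -> torch.Tensor:
    filters = conv.weight.data.flatten(1)             # (out_channels, in_channels*k*k)
    dist = torch.cdist(filters, filters, p=2).sum(1)  # summed distance to the other filters
    num_prune = int(prune_ratio * filters.size(0))
    return torch.argsort(dist)[:num_prune]            # indices of the most redundant filters

conv = torch.nn.Conv2d(16, 64, kernel_size=3)
print(filters_to_prune(conv, prune_ratio=0.25))
```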

microsoft/NeuronBlocks
NLP DNN Toolkit - Building Your NLP DNN Models Like Playing Lego
Language: Python - Size: 14.9 MB - Last synced at: 5 days ago - Pushed at: almost 2 years ago - Stars: 1,454 - Forks: 195

CASE-Lab-UMD/Unified-MoE-Compression
The official implementation of the paper "Towards Efficient Mixture of Experts: A Holistic Study of Compression Techniques (TMLR)".
Language: Python - Size: 47.1 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 67 - Forks: 5

kssteven418/I-BERT
[ICML'21 Oral] I-BERT: Integer-only BERT Quantization
Language: Python - Size: 6.38 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 246 - Forks: 36

TCLResearchEurope/ptdeco
ptdeco is a library for model optimization by matrix decomposition built on top of PyTorch
Language: Python - Size: 324 KB - Last synced at: 16 days ago - Pushed at: about 2 months ago - Stars: 9 - Forks: 1

musco-ai/musco-pytorch
MUSCO: MUlti-Stage COmpression of neural networks
Language: Jupyter Notebook - Size: 681 KB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 72 - Forks: 16

he-y/soft-filter-pruning
Soft Filter Pruning for Accelerating Deep Convolutional Neural Networks
Language: Python - Size: 59.6 KB - Last synced at: 30 days ago - Pushed at: over 5 years ago - Stars: 380 - Forks: 74

onnx/neural-compressor
Model compression for ONNX
Language: Python - Size: 2.35 MB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 92 - Forks: 9

HanXinzi-AI/awesome-computer-vision-resources
A collection of computer vision projects & tools.
Size: 49.8 MB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 246 - Forks: 33

MingSun-Tse/Efficient-Deep-Learning
Collection of recent methods on (deep) neural network compression and acceleration.
Size: 700 KB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 945 - Forks: 131

microsoft/Moonlit
This is a collection of our research on efficient AI, covering hardware-aware NAS and model compression.
Language: Python - Size: 12 MB - Last synced at: 5 days ago - Pushed at: 8 months ago - Stars: 83 - Forks: 7

mit-han-lab/amc-models
[ECCV 2018] AMC: AutoML for Model Compression and Acceleration on Mobile Devices
Language: Python - Size: 37.1 KB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 167 - Forks: 27

sujin-1013/task-aware-DMO
Task-Aware Dynamic Model Optimization for Multi-Task Learning (IEEE Access 2023)
Size: 1.47 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

mit-han-lab/amc
[ECCV 2018] AMC: AutoML for Model Compression and Acceleration on Mobile Devices
Language: Python - Size: 17.6 KB - Last synced at: 23 days ago - Pushed at: over 1 year ago - Stars: 441 - Forks: 115

archsyscall/aquvitae
Knowledge Distillation Toolkit
Language: Python - Size: 170 MB - Last synced at: 26 days ago - Pushed at: almost 5 years ago - Stars: 88 - Forks: 10

ethanhe42/channel-pruning
Channel Pruning for Accelerating Very Deep Neural Networks (ICCV'17)
Language: Python - Size: 548 KB - Last synced at: 4 days ago - Pushed at: about 1 year ago - Stars: 1,082 - Forks: 310

18520339/unstructured-local-search-pruning
Applies simulated annealing and genetic algorithms to neural network pruning without prior assumptions about weight importance.
Language: Jupyter Notebook - Size: 2.28 MB - Last synced at: 2 days ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

minseok0809/awesome-ai-paper
A curated list of awesome papers on NLP, computer vision, model compression, XAI, reinforcement learning, security, etc.
Language: Jupyter Notebook - Size: 38.3 MB - Last synced at: 10 days ago - Pushed at: 3 months ago - Stars: 6 - Forks: 0

VainF/Diff-Pruning
[NeurIPS 2023] Structural Pruning for Diffusion Models
Language: Python - Size: 25.2 MB - Last synced at: 3 months ago - Pushed at: 12 months ago - Stars: 185 - Forks: 12

SKKU-ESLAB/Auto-Compression
Automatic DNN compression tool with various model compression and neural architecture search techniques
Language: C - Size: 106 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 21 - Forks: 18

vtsouval/FedCode
Communication-Efficient Federated Learning via Transferring Codebooks
Language: Python - Size: 338 KB - Last synced at: 29 days ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0

ChanChiChoi/awesome-model-compression
papers about model compression
Size: 504 KB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 166 - Forks: 38

princeton-nlp/CoFiPruning
[ACL 2022] Structured Pruning Learns Compact and Accurate Models https://arxiv.org/abs/2204.00408
Language: Python - Size: 1.79 MB - Last synced at: about 2 months ago - Pushed at: about 2 years ago - Stars: 195 - Forks: 32

Peterisfar/YOLOV3
YOLOv3 in PyTorch.
Language: Python - Size: 17.3 MB - Last synced at: 2 months ago - Pushed at: about 3 years ago - Stars: 195 - Forks: 53

Won-Seong/lightweight-resnet
Compressing ResNet50 with iterative pruning & distillation to maintain high accuracy on CIFAR-100.
Language: Python - Size: 115 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

VainF/Data-Free-Adversarial-Distillation
Code and pretrained models for paper: Data-Free Adversarial Distillation
Language: Python - Size: 1.53 MB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 96 - Forks: 18

CASE-Lab-UMD/LLM-Drop
The official implementation of the paper "What Matters in Transformers? Not All Attention is Needed".
Language: Python - Size: 90.3 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 165 - Forks: 19

Stonesjtu/basis-embedding
basis embedding: a product quantization based model compression method for language models.
Language: Python - Size: 45.7 MB - Last synced at: about 18 hours ago - Pushed at: 8 months ago - Stars: 5 - Forks: 0
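Product quantization compresses an embedding table by splitting each vector into sub-vectors and storing, per sub-space, only the index of the nearest centroid in a small codebook. A generic sketch with NumPy and scikit-learn (an illustration of the idea, not this repository's method):

```python
# Product quantization of an embedding table (generic sketch).
import numpy as np
from sklearn.cluster import KMeans

def pq_compress(emb: np.ndarray, num_subspaces: int = 4, codebook_size: int = 256):
    vocab, dim = emb.shape
    sub_dim = dim // num_subspaces
    codebooks, codes = [], []
    for s in range(num_subspaces):
        sub = emb[:, s * sub_dim:(s + 1) * sub_dim]
        km = KMeans(n_clusters=codebook_size).fit(sub)
        codebooks.append(km.cluster_centers_)          # (codebook_size, sub_dim) floats
        codes.append(km.labels_.astype(np.uint8))      # 1 byte per token per sub-space
    return codebooks, np.stack(codes, axis=1)          # codes: (vocab, num_subspaces)

emb = np.random.randn(1000, 64).astype(np.float32)
codebooks, codes = pq_compress(emb)  # 4 bytes/token of codes vs 256 bytes/token in fp32
```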

mlzxy/qsparse
Train neural networks with joint quantization and pruning on both weights and activations using any pytorch modules
Language: Python - Size: 293 KB - Last synced at: about 1 month ago - Pushed at: almost 3 years ago - Stars: 41 - Forks: 2

lhyfst/knowledge-distillation-papers
knowledge distillation papers
Size: 321 KB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 753 - Forks: 87

bupt-ai-club/awesomeProject
Sharing high-quality AI projects.
Language: Python - Size: 129 MB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 17 - Forks: 5

IPL-sharif/KD_Survey
A Comprehensive Survey on Knowledge Distillation
Size: 877 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 4 - Forks: 0

AIoT-MLSys-Lab/SVD-LLM
Official Code for "SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression"
Language: Python - Size: 744 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 135 - Forks: 10

JetRunner/BERT-of-Theseus
⛵️The official PyTorch implementation for "BERT-of-Theseus: Compressing BERT by Progressive Module Replacing" (EMNLP 2020).
Language: Python - Size: 1.04 MB - Last synced at: 3 months ago - Pushed at: about 2 years ago - Stars: 310 - Forks: 38

bloomberg/minilmv2.bb
Our open source implementation of MiniLMv2 (https://aclanthology.org/2021.findings-acl.188)
Language: Python - Size: 30.3 KB - Last synced at: about 2 months ago - Pushed at: about 2 years ago - Stars: 61 - Forks: 5

changwoolee/BLAST
[NeurIPS 2024] BLAST: Block Level Adaptive Structured Matrix for Efficient Deep Neural Network Inference
Language: Python - Size: 1.43 MB - Last synced at: 2 months ago - Pushed at: 8 months ago - Stars: 10 - Forks: 0

asahi417/lm-vocab-trimmer
Vocabulary Trimming (VT) is a model compression technique that reduces a multilingual LM's vocabulary to a target language by deleting irrelevant tokens. This repository contains the Python library vocabtrimmer, which removes tokens irrelevant to the target language from a multilingual LM's vocabulary.
Language: Python - Size: 17.4 MB - Last synced at: 3 months ago - Pushed at: 8 months ago - Stars: 35 - Forks: 1
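Conceptually, trimming keeps only the embedding rows for tokens that occur in target-language text and remaps the token ids. A rough sketch with Hugging Face Transformers (this is not the vocabtrimmer API; the output head and tokenizer remapping are omitted, and the model and corpus are placeholders):

```python
# Conceptual vocabulary-trimming sketch (not the vocabtrimmer API).
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForMaskedLM.from_pretrained("xlm-roberta-base")

corpus = ["Ein kleines Beispiel auf Deutsch.", "Noch ein Satz."]  # target-language sample
keep = set(tokenizer.all_special_ids)
for text in corpus:
    keep.update(tokenizer(text)["input_ids"])
keep = sorted(keep)

# Build a smaller input embedding containing only the kept rows.
old_emb = model.get_input_embeddings().weight.data
new_emb = torch.nn.Embedding(len(keep), old_emb.size(1))
new_emb.weight.data.copy_(old_emb[keep])
model.set_input_embeddings(new_emb)
```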

msadeqsirjani/adaptive_edge_ai
Optimizing deep learning models for edge devices through intelligent compression and knowledge distillation. Achieve up to 90% model size reduction while maintaining performance, enabling efficient AI deployment on resource-constrained devices.
Language: Python - Size: 395 KB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 2 - Forks: 0

ksm26/Quantization-in-Depth
Dive into advanced quantization techniques. Learn to implement and customize linear quantization functions, measure quantization error, and compress model weights using PyTorch for efficient and accessible AI models.
Language: Jupyter Notebook - Size: 5.79 MB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 5
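A hedged example of the kind of function the course builds up: per-tensor asymmetric linear quantization to 8 bits and its reconstruction error (illustrative, not the course's exact code):

```python
# Per-tensor asymmetric linear quantization plus reconstruction error (illustrative).
import torch

def linear_quantize(x: torch.Tensor, num_bits: int = 8):
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()).clamp(min=1e-8) / (qmax - qmin)
    zero_point = torch.round(qmin - x.min() / scale).clamp(qmin, qmax)
    q = torch.clamp(torch.round(x / scale + zero_point), qmin, qmax)
    return q.to(torch.uint8), scale, zero_point

def dequantize(q, scale, zero_point):
    return scale * (q.float() - zero_point)

w = torch.randn(256, 256)
q, scale, zp = linear_quantize(w)
print("mean squared quantization error:", (w - dequantize(q, scale, zp)).pow(2).mean().item())
```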

czg1225/SlimSAM
[NeurIPS 2024] SlimSAM: 0.1% Data Makes Segment Anything Slim
Language: Python - Size: 36 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 323 - Forks: 17

jaicdev/QDPStudio
QDP Studio is a unified framework for deep learning model compression. It combines quantization, pruning, and decomposition to reduce model size, improve inference speed, and maintain accuracy. Its streamlined pipeline for training, compressing, and evaluating models optimizes deployments in resource-constrained environments.
Language: Python - Size: 35.2 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

bhllx/On-Efficient-Variants-of-Segment-Anything-Model
On Efficient Variants of Segment Anything Model
Size: 18.6 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

VITA-Group/SViTE
[NeurIPS'21] "Chasing Sparsity in Vision Transformers: An End-to-End Exploration" by Tianlong Chen, Yu Cheng, Zhe Gan, Lu Yuan, Lei Zhang, Zhangyang Wang
Language: Python - Size: 615 KB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 89 - Forks: 12

chouaib-629/quantileRegression
Quantile regression for delivery-time prediction and related scenarios.
Language: Jupyter Notebook - Size: 589 KB - Last synced at: 3 months ago - Pushed at: 6 months ago - Stars: 2 - Forks: 0

DwangoMediaVillage/keras_compressor
Model Compression CLI Tool for Keras.
Language: Python - Size: 19.5 KB - Last synced at: 1 day ago - Pushed at: about 6 years ago - Stars: 156 - Forks: 37

r-papso/torch-optimizer
Optimization of PyTorch models via neural network pruning.
Language: Python - Size: 55.3 MB - Last synced at: about 1 month ago - Pushed at: about 3 years ago - Stars: 2 - Forks: 1

LiyuanLucasLiu/LD-Net
Efficient Contextualized Representation: Language Model Pruning for Sequence Labeling
Language: Python - Size: 599 KB - Last synced at: 2 months ago - Pushed at: over 5 years ago - Stars: 146 - Forks: 13

kssteven418/LTP
[KDD'22] Learned Token Pruning for Transformers
Language: Python - Size: 40.1 MB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 96 - Forks: 18

HKUDS/LightGNN
[WSDM'25] "LightGNN: Simple Graph Neural Network for Recommendation"
Language: Python - Size: 20.9 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 8 - Forks: 2

ksm26/Quantization-Fundamentals-with-Hugging-Face
Learn linear quantization techniques using the Quanto library and downcasting methods with the Transformers library to compress and optimize generative AI models effectively.
Language: Jupyter Notebook - Size: 205 KB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 9

kssteven418/Q-ASR
[ICASSP'22] Integer-only Zero-shot Quantization for Efficient Speech Recognition
Language: Jupyter Notebook - Size: 41.9 MB - Last synced at: 3 months ago - Pushed at: over 3 years ago - Stars: 31 - Forks: 2
