An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: model-compression

Picovoice/picollm

On-device LLM Inference Powered by X-Bit Quantization

Language: Python - Size: 98 MB - Last synced at: 1 day ago - Pushed at: 13 days ago - Stars: 250 - Forks: 14

huawei-noah/Pretrained-Language-Model

Pretrained language models and related optimization techniques developed by Huawei Noah's Ark Lab.

Language: Python - Size: 29 MB - Last synced at: 3 days ago - Pushed at: over 1 year ago - Stars: 3,103 - Forks: 637

EnricoSimionato/Alternative-Model-Architectures

Research-oriented project focusing on implementing and evaluating novel compression techniques for large language models (LLMs).

Language: Python - Size: 19.7 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

tensorflow/model-optimization

A toolkit for optimizing Keras and TensorFlow models for deployment, including quantization and pruning.

Language: Python - Size: 2.22 MB - Last synced at: 4 days ago - Pushed at: 4 months ago - Stars: 1,534 - Forks: 327
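
A minimal usage sketch of the toolkit's two main techniques, magnitude pruning during training and quantization-aware training; the toy model, sparsity target, and hyperparameters below are illustrative assumptions, not taken from the project's docs:

```python
# Hedged sketch of the TensorFlow Model Optimization Toolkit (tfmot) workflow.
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Toy Keras model standing in for a real one.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(10),
])

# Magnitude pruning: wrap the model so 50% of the lowest-magnitude weights are zeroed.
pruned = tfmot.sparsity.keras.prune_low_magnitude(
    model,
    pruning_schedule=tfmot.sparsity.keras.ConstantSparsity(0.5, begin_step=0),
)
pruned.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
# pruned.fit(...) would follow, with tfmot.sparsity.keras.UpdatePruningStep() as a callback.

# Quantization-aware training: annotate layers with fake-quantization ops.
qat_model = tfmot.quantization.keras.quantize_model(model)
```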

dwiaskor99/contrastive-distillation

CAST is a method for semi-supervised instance segmentation that efficiently trains a compact model using both labeled and unlabeled data. This repository contains the implementation of our three-stage pipeline, showcasing contrastive adaptation and distillation techniques. 🐙🌟

Size: 3.2 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

microsoft/nni 📦

An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.

Language: Python - Size: 127 MB - Last synced at: 5 days ago - Pushed at: 12 months ago - Stars: 14,205 - Forks: 1,822

pvti/Awesome-Tensor-Decomposition

😎 A curated list of tensor decomposition resources for model compression.

Size: 488 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 68 - Forks: 8

lpalbou/model-quantizer

Effortlessly quantize, benchmark, and publish Hugging Face models with cross-platform support for CPU/GPU. Reduce model size by 75% while maintaining performance.

Language: Python - Size: 165 KB - Last synced at: 3 days ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0
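
This is not the model-quantizer CLI itself; as a rough, hedged illustration of the quantize-then-publish workflow it describes, the sketch below loads a Hugging Face model with 4-bit weights via transformers and bitsandbytes (the model id and compute dtype are placeholders):

```python
# Hedged sketch: 4-bit weight quantization at load time with transformers + bitsandbytes.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "facebook/opt-125m"  # small placeholder model
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize weights to 4 bits on load
    bnb_4bit_compute_dtype=torch.bfloat16,  # run matmuls in bf16
)

model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Benchmarking and model.push_to_hub(...) would then roughly match the
# quantize -> benchmark -> publish workflow the repo describes.
```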

SqueezeAILab/KVQuant

[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

Language: Python - Size: 19.8 MB - Last synced at: 1 day ago - Pushed at: 11 months ago - Stars: 359 - Forks: 31

dkozlov/awesome-knowledge-distillation

Awesome Knowledge Distillation

Size: 215 KB - Last synced at: 6 days ago - Pushed at: 14 days ago - Stars: 3,686 - Forks: 512

d0tTino/DeepThought-ReThought

A refactored version of the DeepThought Discord bot, focusing on improved architecture, performance, and AI agent capabilities.

Language: Python - Size: 18.6 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 1 - Forks: 0

eezkni/SVRF

[TIP-2025] Pytorch implementation of "Shell-guided Compression of Voxel Radiance Fields"

Language: Python - Size: 1.17 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 4 - Forks: 0

FLHonker/Awesome-Knowledge-Distillation

Awesome Knowledge-Distillation. A categorized collection of knowledge distillation papers (2014-2021).

Size: 457 KB - Last synced at: 7 days ago - Pushed at: about 2 years ago - Stars: 2,601 - Forks: 338

datawhalechina/awesome-compression

A beginner-friendly tutorial on model compression; PDF download: https://github.com/datawhalechina/awesome-compression/releases

Size: 311 MB - Last synced at: 6 days ago - Pushed at: 11 days ago - Stars: 295 - Forks: 36

zhang-fengdi/ControlGS

Official reference implementation of "Consistent Quantity-Quality Control across Scenes for Deployment-Aware Gaussian Splatting"

Language: C++ - Size: 15.1 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 6 - Forks: 0

deadlykitten4/ResSVD

ResSVD: Residual Compensated SVD for Large Language Model Compression

Size: 9.77 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

pratyushasharma/laser

The Truth Is In There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction

Language: Python - Size: 2.25 MB - Last synced at: 1 day ago - Pushed at: 12 months ago - Stars: 388 - Forks: 34

Efficient-ML/Awesome-Model-Quantization

A list of papers, docs, and code about model quantization. This repo aims to provide information for model quantization research and is continuously improved; PRs adding works (papers, repositories) missed by the repo are welcome.

Size: 61.5 MB - Last synced at: 11 days ago - Pushed at: 4 months ago - Stars: 2,131 - Forks: 224

SqueezeAILab/SqueezeLLM

[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization

Language: Python - Size: 1.5 MB - Last synced at: 1 day ago - Pushed at: 11 months ago - Stars: 692 - Forks: 46

ardaerendogru/dinov2_distillation

This project implements knowledge distillation from DINOv2 (Vision Transformer) to convolutional networks, enabling efficient visual representation learning with reduced computational requirements.

Language: Python - Size: 92.8 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 5 - Forks: 0

VainF/Torch-Pruning

[CVPR 2023] DepGraph: Towards Any Structural Pruning; LLMs, Vision Foundation Models, etc.

Language: Python - Size: 10 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 3,038 - Forks: 351
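
Torch-Pruning's own API builds a dependency graph (DepGraph) so whole channels can be removed consistently across coupled layers; as a simpler point of comparison, the hedged sketch below shows magnitude pruning with PyTorch's built-in torch.nn.utils.prune, which only zeroes weights rather than physically removing structures:

```python
# Generic magnitude pruning sketch (not Torch-Pruning's DepGraph API).
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Conv2d(16, 32, 3))

# Zero out 30% of the smallest-magnitude weights in each conv layer.
for module in model.modules():
    if isinstance(module, nn.Conv2d):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # bake the mask into the weights permanently

zeros = sum((m.weight == 0).sum().item() for m in model.modules() if isinstance(m, nn.Conv2d))
total = sum(m.weight.numel() for m in model.modules() if isinstance(m, nn.Conv2d))
print(f"overall conv weight sparsity: {zeros / total:.2%}")
```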

xuyang-liu16/Awesome-Token-level-Model-Compression

📚 Collection of token-level model compression resources.

Size: 1.72 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 117 - Forks: 4

huawei-noah/Efficient-AI-Backbones

Efficient AI Backbones including GhostNet, TNT and MLP, developed by Huawei Noah's Ark Lab.

Language: Python - Size: 98.4 MB - Last synced at: 12 days ago - Pushed at: 3 months ago - Stars: 4,240 - Forks: 723

666DZY666/micronet

micronet, a model compression and deployment library. Compression: (1) quantization: quantization-aware training (QAT) with high-bit (>2b) schemes (DoReFa; "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference") and low-bit (≤2b)/ternary and binary schemes (TWN/BNN/XNOR-Net), plus post-training quantization (PTQ), 8-bit (TensorRT); (2) pruning: normal, regular, and group convolutional channel pruning; (3) group convolution structure; (4) batch-normalization fusion for quantization. Deployment: TensorRT, fp32/fp16/int8 (PTQ calibration), op adaptation (upsample), dynamic_shape.

Language: Python - Size: 6.68 MB - Last synced at: 16 days ago - Pushed at: about 2 months ago - Stars: 2,250 - Forks: 476

Inpyo-Hong/Model-Compression-Paper-List

Model Compression Paper List (Focusing on Quantization, Particularly Zero-Shot Quantization)

Size: 42 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 1 - Forks: 0

Efficient-ML/Awesome-Efficient-AIGC

A list of papers, docs, and code about efficient AIGC, covering both language and vision. This repo aims to provide information for efficient AIGC research and is continuously improved; PRs adding works (papers, repositories) missed by the repo are welcome.

Size: 63.5 KB - Last synced at: 11 days ago - Pushed at: 4 months ago - Stars: 183 - Forks: 11

merantix-momentum/acip

🗜️Codebase of the ACIP algorithm 🗜️

Language: Python - Size: 259 KB - Last synced at: 9 days ago - Pushed at: 15 days ago - Stars: 9 - Forks: 0

THU-MIG/torch-model-compression

An automated toolset for analyzing and modifying the structure of PyTorch models, including a model compression algorithm library based on automatic model structure analysis.

Language: Python - Size: 132 KB - Last synced at: about 9 hours ago - Pushed at: about 2 years ago - Stars: 249 - Forks: 41

vanhai1231/autoquant-infer

A tool for reducing model size via quantization, combined with an AI agent that automatically selects the optimal quantization level to speed up inference and cut inference costs.

Language: Python - Size: 54.7 KB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 0 - Forks: 0

hnuzhy/CV_DL_Gather

Gathers research papers, corresponding code (where available), reading notes, and other related materials on hot 🔥 fields in deep-learning-based computer vision.

Size: 37.6 MB - Last synced at: 21 days ago - Pushed at: 22 days ago - Stars: 74 - Forks: 6

BaiTheBest/SparseLLM

Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024)

Language: Python - Size: 145 KB - Last synced at: 11 days ago - Pushed at: 3 months ago - Stars: 61 - Forks: 9

Sharpiless/Yolov5-distillation-train-inference

Yolov5 distillation training | YOLOv5 knowledge distillation training, with support for training on your own data.

Language: Python - Size: 2.36 MB - Last synced at: 16 days ago - Pushed at: over 2 years ago - Stars: 220 - Forks: 33

wangxb96/Awesome-EdgeAI

Resources of our survey paper "Optimizing Edge AI: A Comprehensive Survey on Data, Model, and System Strategies"

Size: 3.64 MB - Last synced at: 11 days ago - Pushed at: 6 months ago - Stars: 87 - Forks: 8

Tencent/PocketFlow

An Automatic Model Compression (AutoMC) framework for developing smaller and faster AI applications.

Language: Python - Size: 1.13 MB - Last synced at: 27 days ago - Pushed at: about 2 years ago - Stars: 2,884 - Forks: 490

alibaba/TinyNeuralNetwork

TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework.

Language: Python - Size: 25.1 MB - Last synced at: 30 days ago - Pushed at: 30 days ago - Stars: 825 - Forks: 126

vinhkhuc/JFastText

Java interface for fastText

Language: Java - Size: 57.6 KB - Last synced at: 7 days ago - Pushed at: about 2 years ago - Stars: 237 - Forks: 98

horseee/DeepCache

[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free

Language: Python - Size: 102 MB - Last synced at: 27 days ago - Pushed at: 12 months ago - Stars: 893 - Forks: 43

cedrickchee/awesome-ml-model-compression

Awesome machine learning model compression research papers, quantization, tools, and learning material.

Size: 213 KB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 523 - Forks: 60

tianyic/only_train_once_personal_footprint

OTOv1-v3, NeurIPS, ICLR, TMLR, DNN Training, Compression, Structured Pruning, Erasing Operators, CNN, Diffusion, LLM

Language: Python - Size: 2.94 MB - Last synced at: 19 days ago - Pushed at: 9 months ago - Stars: 302 - Forks: 48

guan-yuan/Awesome-AutoML-and-Lightweight-Models

A list of high-quality (newest) AutoML works and lightweight models including 1.) Neural Architecture Search, 2.) Lightweight Structures, 3.) Model Compression, Quantization and Acceleration, 4.) Hyperparameter Optimization, 5.) Automated Feature Engineering.

Size: 150 KB - Last synced at: 4 days ago - Pushed at: about 4 years ago - Stars: 854 - Forks: 160

SforAiDl/KD_Lib

A Pytorch Knowledge Distillation library for benchmarking and extending works in the domains of Knowledge Distillation, Pruning, and Quantization.

Language: Python - Size: 22.2 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 630 - Forks: 60

haitongli/knowledge-distillation-pytorch

A PyTorch implementation for exploring deep and shallow knowledge distillation (KD) experiments with flexibility

Language: Python - Size: 22.1 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 1,938 - Forks: 352
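
For context, a minimal sketch of the classic Hinton-style distillation loss that such KD experiments build on: a temperature-softened KL term plus a hard-label cross-entropy term. The temperature and weighting below are illustrative, not this repository's defaults:

```python
# Generic knowledge distillation loss (softened-logit KL + hard-label CE).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Alpha-weighted mix of soft-target KL (scaled by T^2) and hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Inside a training step (teacher frozen):
#   with torch.no_grad():
#       teacher_logits = teacher(x)
#   loss = distillation_loss(student(x), teacher_logits, y)
```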

huawei-noah/Efficient-Computing

Efficient computing methods developed by Huawei Noah's Ark Lab

Language: Jupyter Notebook - Size: 100 MB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 1,273 - Forks: 218

he-y/Awesome-Pruning

A curated list of neural network pruning resources.

Size: 605 KB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 2,446 - Forks: 330

Xiuyu-Li/q-diffusion

[ICCV 2023] Q-Diffusion: Quantizing Diffusion Models.

Language: Python - Size: 5.97 MB - Last synced at: 30 days ago - Pushed at: over 1 year ago - Stars: 347 - Forks: 24

jim-schwoebel/allie

🤖 An automated machine learning framework for audio, text, image, video, or .CSV files (50+ featurizers and 15+ model trainers). Python 3.6 required.

Language: Python - Size: 275 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 141 - Forks: 35

1duo/awesome-ai-infrastructures

Infrastructures™ for Machine Learning Training/Inference in Production.

Size: 11.8 MB - Last synced at: about 1 month ago - Pushed at: about 6 years ago - Stars: 416 - Forks: 74

xuyang-liu16/GlobalCom2

Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models

Language: Python - Size: 6.23 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 21 - Forks: 0

microsoft/archai

Accelerate your Neural Architecture Search (NAS) through fast, reproducible and modular research.

Language: Python - Size: 48.3 MB - Last synced at: 5 days ago - Pushed at: 8 months ago - Stars: 477 - Forks: 89

he-y/filter-pruning-geometric-median

Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration (CVPR 2019 Oral)

Language: Python - Size: 2.17 MB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 614 - Forks: 114

microsoft/NeuronBlocks

NLP DNN Toolkit - Building Your NLP DNN Models Like Playing Lego

Language: Python - Size: 14.9 MB - Last synced at: 5 days ago - Pushed at: almost 2 years ago - Stars: 1,454 - Forks: 195

CASE-Lab-UMD/Unified-MoE-Compression

The official implementation of the paper "Towards Efficient Mixture of Experts: A Holistic Study of Compression Techniques (TMLR)".

Language: Python - Size: 47.1 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 67 - Forks: 5

kssteven418/I-BERT

[ICML'21 Oral] I-BERT: Integer-only BERT Quantization

Language: Python - Size: 6.38 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 246 - Forks: 36

TCLResearchEurope/ptdeco

ptdeco is a library for model optimization by matrix decomposition built on top of PyTorch

Language: Python - Size: 324 KB - Last synced at: 16 days ago - Pushed at: about 2 months ago - Stars: 9 - Forks: 1

musco-ai/musco-pytorch

MUSCO: MUlti-Stage COmpression of neural networks

Language: Jupyter Notebook - Size: 681 KB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 72 - Forks: 16

he-y/soft-filter-pruning

Soft Filter Pruning for Accelerating Deep Convolutional Neural Networks

Language: Python - Size: 59.6 KB - Last synced at: 30 days ago - Pushed at: over 5 years ago - Stars: 380 - Forks: 74

onnx/neural-compressor

Model compression for ONNX

Language: Python - Size: 2.35 MB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 92 - Forks: 9

HanXinzi-AI/awesome-computer-vision-resources

A collection of computer vision projects and tools.

Size: 49.8 MB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 246 - Forks: 33

MingSun-Tse/Efficient-Deep-Learning

Collection of recent methods on (deep) neural network compression and acceleration.

Size: 700 KB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 945 - Forks: 131

microsoft/Moonlit

This is a collection of our research on efficient AI, covering hardware-aware NAS and model compression.

Language: Python - Size: 12 MB - Last synced at: 5 days ago - Pushed at: 8 months ago - Stars: 83 - Forks: 7

mit-han-lab/amc-models

[ECCV 2018] AMC: AutoML for Model Compression and Acceleration on Mobile Devices

Language: Python - Size: 37.1 KB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 167 - Forks: 27

sujin-1013/task-aware-DMO

Task-Aware Dynamic Model Optimization for Multi-Task Learning (IEEE Access 2023)

Size: 1.47 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

mit-han-lab/amc

[ECCV 2018] AMC: AutoML for Model Compression and Acceleration on Mobile Devices

Language: Python - Size: 17.6 KB - Last synced at: 23 days ago - Pushed at: over 1 year ago - Stars: 441 - Forks: 115

archsyscall/aquvitae

Knowledge Distillation Toolkit

Language: Python - Size: 170 MB - Last synced at: 26 days ago - Pushed at: almost 5 years ago - Stars: 88 - Forks: 10

ethanhe42/channel-pruning

Channel Pruning for Accelerating Very Deep Neural Networks (ICCV'17)

Language: Python - Size: 548 KB - Last synced at: 4 days ago - Pushed at: about 1 year ago - Stars: 1,082 - Forks: 310

18520339/unstructured-local-search-pruning

Apply Simulated Annealing and Genetic Algorithm to solve the problem of Neural Network pruning without prior assumptions of weight importance

Language: Jupyter Notebook - Size: 2.28 MB - Last synced at: 2 days ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

minseok0809/awesome-ai-paper

A curated list of awesome papers on NLP, computer vision, model compression, XAI, reinforcement learning, security, etc.

Language: Jupyter Notebook - Size: 38.3 MB - Last synced at: 10 days ago - Pushed at: 3 months ago - Stars: 6 - Forks: 0

VainF/Diff-Pruning

[NeurIPS 2023] Structural Pruning for Diffusion Models

Language: Python - Size: 25.2 MB - Last synced at: 3 months ago - Pushed at: 12 months ago - Stars: 185 - Forks: 12

SKKU-ESLAB/Auto-Compression

Automatic DNN compression tool with various model compression and neural architecture search techniques

Language: C - Size: 106 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 21 - Forks: 18

vtsouval/FedCode

Communication-Efficient Federated Learning via Transferring Codebooks

Language: Python - Size: 338 KB - Last synced at: 29 days ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0

ChanChiChoi/awesome-model-compression

papers about model compression

Size: 504 KB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 166 - Forks: 38

princeton-nlp/CoFiPruning

[ACL 2022] Structured Pruning Learns Compact and Accurate Models https://arxiv.org/abs/2204.00408

Language: Python - Size: 1.79 MB - Last synced at: about 2 months ago - Pushed at: about 2 years ago - Stars: 195 - Forks: 32

Peterisfar/YOLOV3

YOLOv3 implemented in PyTorch.

Language: Python - Size: 17.3 MB - Last synced at: 2 months ago - Pushed at: about 3 years ago - Stars: 195 - Forks: 53

Won-Seong/lightweight-resnet

Compressing ResNet50 with iterative pruning & distillation to maintain high accuracy on CIFAR-100.

Language: Python - Size: 115 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

VainF/Data-Free-Adversarial-Distillation

Code and pretrained models for paper: Data-Free Adversarial Distillation

Language: Python - Size: 1.53 MB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 96 - Forks: 18

CASE-Lab-UMD/LLM-Drop

The official implementation of the paper "What Matters in Transformers? Not All Attention is Needed".

Language: Python - Size: 90.3 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 165 - Forks: 19

Stonesjtu/basis-embedding

basis embedding: a product quantization based model compression method for language models.

Language: Python - Size: 45.7 MB - Last synced at: about 18 hours ago - Pushed at: 8 months ago - Stars: 5 - Forks: 0

mlzxy/qsparse

Train neural networks with joint quantization and pruning on both weights and activations using any pytorch modules

Language: Python - Size: 293 KB - Last synced at: about 1 month ago - Pushed at: almost 3 years ago - Stars: 41 - Forks: 2

lhyfst/knowledge-distillation-papers

knowledge distillation papers

Size: 321 KB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 753 - Forks: 87

bupt-ai-club/awesomeProject

Sharing high-quality AI projects.

Language: Python - Size: 129 MB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 17 - Forks: 5

IPL-sharif/KD_Survey

A Comprehensive Survey on Knowledge Distillation

Size: 877 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 4 - Forks: 0

AIoT-MLSys-Lab/SVD-LLM

Official Code for "SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression"

Language: Python - Size: 744 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 135 - Forks: 10
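
As background for SVD-based compression, here is a generic sketch of replacing a linear layer's weight matrix with a truncated-SVD factorization; SVD-LLM's truncation-aware whitening is not reproduced, and the layer sizes and rank are illustrative assumptions:

```python
# Generic low-rank (truncated SVD) factorization of a linear layer.
import torch
import torch.nn as nn

def low_rank_factorize(linear: nn.Linear, rank: int) -> nn.Sequential:
    """Replace W (out x in) with two smaller factors B (out x r) @ A (r x in)."""
    W = linear.weight.data
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    A = torch.diag(S[:rank].sqrt()) @ Vh[:rank]       # shape (r, in)
    B = U[:, :rank] @ torch.diag(S[:rank].sqrt())     # shape (out, r)
    first = nn.Linear(linear.in_features, rank, bias=False)
    second = nn.Linear(rank, linear.out_features, bias=linear.bias is not None)
    first.weight.data.copy_(A)
    second.weight.data.copy_(B)
    if linear.bias is not None:
        second.bias.data.copy_(linear.bias.data)
    return nn.Sequential(first, second)

layer = nn.Linear(4096, 4096)
compressed = low_rank_factorize(layer, rank=256)  # roughly 8x fewer weight parameters
```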

JetRunner/BERT-of-Theseus

⛵️The official PyTorch implementation for "BERT-of-Theseus: Compressing BERT by Progressive Module Replacing" (EMNLP 2020).

Language: Python - Size: 1.04 MB - Last synced at: 3 months ago - Pushed at: about 2 years ago - Stars: 310 - Forks: 38

bloomberg/minilmv2.bb

Our open source implementation of MiniLMv2 (https://aclanthology.org/2021.findings-acl.188)

Language: Python - Size: 30.3 KB - Last synced at: about 2 months ago - Pushed at: about 2 years ago - Stars: 61 - Forks: 5

changwoolee/BLAST

[NeurIPS 2024] BLAST: Block Level Adaptive Structured Matrix for Efficient Deep Neural Network Inference

Language: Python - Size: 1.43 MB - Last synced at: 2 months ago - Pushed at: 8 months ago - Stars: 10 - Forks: 0

asahi417/lm-vocab-trimmer

Vocabulary Trimming (VT) is a model compression technique that reduces a multilingual LM's vocabulary to a target language by deleting irrelevant tokens. This repository contains a Python library, vocabtrimmer, that removes irrelevant tokens from a multilingual LM vocabulary for the target language.

Language: Python - Size: 17.4 MB - Last synced at: 3 months ago - Pushed at: 8 months ago - Stars: 35 - Forks: 1

msadeqsirjani/adaptive_edge_ai

Optimizing deep learning models for edge devices through intelligent compression and knowledge distillation. Achieve up to 90% model size reduction while maintaining performance, enabling efficient AI deployment on resource-constrained devices.

Language: Python - Size: 395 KB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 2 - Forks: 0

ksm26/Quantization-in-Depth

Dive into advanced quantization techniques. Learn to implement and customize linear quantization functions, measure quantization error, and compress model weights using PyTorch for efficient and accessible AI models.

Language: Jupyter Notebook - Size: 5.79 MB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 5
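
A minimal sketch of the linear (affine) quantization idea the course covers: derive a scale and zero point from the tensor's range, round to int8, dequantize, and measure the resulting error. This is generic PyTorch, not code from the course's own notebooks:

```python
# Generic per-tensor asymmetric linear quantization to int8, with error measurement.
import torch

def linear_quantize(x: torch.Tensor, num_bits: int = 8):
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(torch.round(qmin - x.min() / scale))
    q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax).to(torch.int8)
    return q, scale, zero_point

def linear_dequantize(q, scale, zero_point):
    return scale * (q.float() - zero_point)

w = torch.randn(256, 256)
q, scale, zp = linear_quantize(w)
w_hat = linear_dequantize(q, scale, zp)
print("mean squared quantization error:", torch.mean((w - w_hat) ** 2).item())
```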

czg1225/SlimSAM

[NeurIPS 2024] SlimSAM: 0.1% Data Makes Segment Anything Slim

Language: Python - Size: 36 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 323 - Forks: 17

jaicdev/QDPStudio

QDP Studio is a unified framework for deep learning model compression. It combines quantization, pruning, and decomposition to reduce model size, improve inference speed, and maintain accuracy. Its streamlined pipeline for training, compressing, and evaluating models optimizes deployments in resource-constrained environments.

Language: Python - Size: 35.2 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

bhllx/On-Efficient-Variants-of-Segment-Anything-Model

On Efficient Variants of Segment Anything Model

Size: 18.6 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

VITA-Group/SViTE

[NeurIPS'21] "Chasing Sparsity in Vision Transformers: An End-to-End Exploration" by Tianlong Chen, Yu Cheng, Zhe Gan, Lu Yuan, Lei Zhang, Zhangyang Wang

Language: Python - Size: 615 KB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 89 - Forks: 12

chouaib-629/quantileRegression

Quantile regression applied to delivery time and other scenarios.

Language: Jupyter Notebook - Size: 589 KB - Last synced at: 3 months ago - Pushed at: 6 months ago - Stars: 2 - Forks: 0

DwangoMediaVillage/keras_compressor

Model Compression CLI Tool for Keras.

Language: Python - Size: 19.5 KB - Last synced at: 1 day ago - Pushed at: about 6 years ago - Stars: 156 - Forks: 37

r-papso/torch-optimizer

PyTorch model optimization via neural network pruning.

Language: Python - Size: 55.3 MB - Last synced at: about 1 month ago - Pushed at: about 3 years ago - Stars: 2 - Forks: 1

LiyuanLucasLiu/LD-Net

Efficient Contextualized Representation: Language Model Pruning for Sequence Labeling

Language: Python - Size: 599 KB - Last synced at: 2 months ago - Pushed at: over 5 years ago - Stars: 146 - Forks: 13

kssteven418/LTP

[KDD'22] Learned Token Pruning for Transformers

Language: Python - Size: 40.1 MB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 96 - Forks: 18

HKUDS/LightGNN

[WSDM'25] "LightGNN: Simple Graph Neural Network for Recommendation"

Language: Python - Size: 20.9 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 8 - Forks: 2

ksm26/Quantization-Fundamentals-with-Hugging-Face

Learn linear quantization techniques using the Quanto library and downcasting methods with the Transformers library to compress and optimize generative AI models effectively.

Language: Jupyter Notebook - Size: 205 KB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 9
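
A minimal sketch of the downcasting approach mentioned above, loading weights in bfloat16 via the Transformers library; the model name is only a placeholder:

```python
# Hedged sketch: downcast model weights at load time with Transformers.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2", torch_dtype=torch.bfloat16)
print(model.get_memory_footprint() / 1e6, "MB")  # roughly half the fp32 footprint
```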

kssteven418/Q-ASR

[ICASSP'22] Integer-only Zero-shot Quantization for Efficient Speech Recognition

Language: Jupyter Notebook - Size: 41.9 MB - Last synced at: 3 months ago - Pushed at: over 3 years ago - Stars: 31 - Forks: 2

Related Keywords
model-compression 270, deep-learning 81, pytorch 63, pruning 62, quantization 49, knowledge-distillation 47, machine-learning 32, python 19, computer-vision 18, deep-neural-networks 16, distillation 16, model-pruning 15, large-language-models 15, tensorflow 14, natural-language-processing 14, model-acceleration 13, efficient-deep-learning 12, bert 12, neural-architecture-search 12, neural-network 11, llm 11, network-pruning 11, nlp 11, automl 10, convolutional-neural-networks 10, efficient-inference 10, compression 9, awesome-list 8, model-optimization 8, channel-pruning 8, transformers 8, keras 7, transformer 7, neural-network-pruning 7, language-model 7, knowledge-transfer 6, kd 6, data-science 6, efficient-model 6, model-quantization 6, diffusion-models 6, neural-networks 6, object-detection 6, quantization-aware-training 6, cnn 6, sparsity 6, nas 5, optimization 5, hyperparameter-optimization 5, artificial-intelligence 5, llama 5, image-classification 5, efficient-neural-networks 5, post-training-quantization 5, filter-pruning 5, neural-network-compression 5, federated-learning 5, structured-pruning 5, onnx 4, papers 4, unstructured-pruning 4, super-resolution 4, vision-transformer 4, edge-computing 4, model-distillation 4, teacher-student 4, ai 4, model-deployment 4, text-classification 4, natural-language-understanding 4, transfer-learning 4, awesome 4, data-visualization 4, binary-neural-networks 4, generative-ai 4, feature-engineering 4, weight-pruning 4, sparsification 4, classification 3, eda 3, micronet-challenge 3, dnn 3, edge-ai 3, data-free 3, model-comparison 3, vision-transformers 3, face-recognition 3, mlops 3, llms 3, domain-adaptation 3, acceleration 3, quantized-neural-networks 3, speech 3, svd 3, neurips-2019 3, inference 3, ensemble-learning 3, recurrent-neural-networks 3, benchmark 3, tensorrt 3