An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: model-compression

Nike1-1/automl

⚙️ Automate machine learning tasks in-browser or with Node.js, utilizing efficient algorithms for regression and classification with minimal setup.

Language: JavaScript - Size: 3.62 MB - Last synced at: about 4 hours ago - Pushed at: about 6 hours ago - Stars: 0 - Forks: 0

FLHonker/Awesome-Knowledge-Distillation

Awesome Knowledge-Distillation. A categorized collection of knowledge distillation papers (2014-2021).

Size: 457 KB - Last synced at: about 2 hours ago - Pushed at: over 2 years ago - Stars: 2,634 - Forks: 336

Viswajith-Coder/automl

🤖 Automate machine learning tasks locally in the browser or on a server with this easy-to-use, zero-setup library for regression and classification.

Language: JavaScript - Size: 3.62 MB - Last synced at: about 18 hours ago - Pushed at: about 20 hours ago - Stars: 0 - Forks: 0

beingdhruvv/ImageSharpening-KD-Restormer-UNet

This repository features an image sharpening pipeline using Knowledge Distillation. A high-capacity Restormer acts as the teacher model, while a lightweight Mini-UNet is trained as the student to mimic its performance.

Language: Jupyter Notebook - Size: 3.76 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 2 - Forks: 1
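
As a point of reference for how such teacher-student training typically works (a minimal PyTorch sketch, not this repository's actual code; the loss weighting and the assumption of image-to-image models are illustrative), one distillation step might look like:

    import torch
    import torch.nn.functional as F

    def distillation_step(teacher, student, images, optimizer, alpha=0.8):
        """One step where the student mimics the teacher's restored output.

        `teacher` and `student` are assumed to be image-to-image models
        (e.g. a Restormer-like teacher and a small UNet-like student);
        `alpha` is an illustrative loss weight, not taken from the repo.
        """
        teacher.eval()
        with torch.no_grad():
            teacher_out = teacher(images)      # target produced by the large model

        student_out = student(images)          # prediction of the compact model
        # Distillation loss: match the teacher's output pixel-wise.
        loss = alpha * F.l1_loss(student_out, teacher_out)
        # A ground-truth term can be added here if sharp reference images exist.

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()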

suryaS2801/token-collection

🚀 Collect and manage tokens effortlessly with this simple, efficient framework for building and maintaining your own token collections.

Language: Go - Size: 1.7 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

xuyang-liu16/Awesome-Token-level-Model-Compression

📚 Collection of token-level model compression resources.

Size: 2.04 MB - Last synced at: 2 days ago - Pushed at: about 2 months ago - Stars: 176 - Forks: 7

rplacucci/compression-returns

Exploration of the relationship between intrinsic dimensionality and compressibility across tasks and architectures. Supports pruning, quantization, LoRA, and distillation using Hugging Face, with fine-tuning and evaluation on NLP and vision benchmarks.

Language: Python - Size: 31.3 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0
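
The LoRA part of such a pipeline is commonly wired up with Hugging Face's peft library. A minimal sketch follows; the base model name, target module names, and rank are assumptions for illustration, not this repository's configuration:

    # Minimal LoRA fine-tuning setup with Hugging Face peft (illustrative).
    from transformers import AutoModelForSequenceClassification
    from peft import LoraConfig, get_peft_model

    base = AutoModelForSequenceClassification.from_pretrained(
        "distilbert-base-uncased", num_labels=2)

    lora_cfg = LoraConfig(
        task_type="SEQ_CLS",
        r=8,                        # rank of the low-rank update matrices
        lora_alpha=16,              # scaling factor applied to the update
        target_modules=["q_lin", "v_lin"],  # DistilBERT attention projections
        lora_dropout=0.05,
    )
    model = get_peft_model(base, lora_cfg)
    model.print_trainable_parameters()  # only the LoRA adapters are trainable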

guan-yuan/Awesome-AutoML-and-Lightweight-Models

A list of high-quality and recent AutoML works and lightweight models, including (1) neural architecture search, (2) lightweight structures, (3) model compression, quantization, and acceleration, (4) hyperparameter optimization, and (5) automated feature engineering.

Size: 150 KB - Last synced at: 4 days ago - Pushed at: over 4 years ago - Stars: 853 - Forks: 158

dkozlov/awesome-knowledge-distillation

Awesome Knowledge Distillation

Size: 216 KB - Last synced at: 5 days ago - Pushed at: 10 days ago - Stars: 3,757 - Forks: 513

Efficient-ML/Awesome-Model-Quantization

A list of papers, docs, and code about model quantization. This repo aims to provide information for model quantization research and is continuously improved. PRs adding works (papers, repositories) the repo has missed are welcome.

Size: 61.5 MB - Last synced at: 4 days ago - Pushed at: 8 months ago - Stars: 2,246 - Forks: 227

huawei-noah/Efficient-AI-Backbones

Efficient AI Backbones including GhostNet, TNT and MLP, developed by Huawei Noah's Ark Lab.

Language: Python - Size: 98.4 MB - Last synced at: 5 days ago - Pushed at: 8 months ago - Stars: 4,321 - Forks: 734

Jagatmohan46/tiny-recursive-model

🚀 Implement the Tiny Recursive Model (TRM) for improved performance in recursive tasks, building on the HRM framework by Sapient AI.

Language: Python - Size: 29.3 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

microsoft/nni 📦

An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression, and hyperparameter tuning.

Language: Python - Size: 127 MB - Last synced at: 6 days ago - Pushed at: over 1 year ago - Stars: 14,287 - Forks: 1,834

chriscowncrow/TinyRecursiveModels

🔍 Explore recursive reasoning with TinyRecursiveModels, a compact 7M parameter neural network achieving high scores on tough tasks without massive resources.

Language: Python - Size: 1.19 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

666DZY666/micronet

micronet, a model compression and deployment library. Compression: (1) quantization: quantization-aware training (QAT) at high bit widths (>2b) (DoReFa; Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference) and low bit widths (≤2b), including ternary and binary networks (TWN/BNN/XNOR-Net), plus post-training quantization (PTQ) at 8-bit (TensorRT); (2) pruning: normal, regular, and group convolutional channel pruning; (3) group convolution structures; (4) batch-normalization fusion for quantization. Deployment: TensorRT with FP32/FP16/INT8 (PTQ calibration), op adaptation (upsample), and dynamic shapes.

Language: Python - Size: 6.68 MB - Last synced at: 6 days ago - Pushed at: 6 months ago - Stars: 2,262 - Forks: 478
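
As background on the quantization-aware-training idea referenced above, the sketch below shows a generic fake-quantization function with a straight-through estimator; it is an illustration of the technique, not micronet's API, and the bit width and tensor sizes are arbitrary:

    import torch

    class FakeQuantize(torch.autograd.Function):
        """Symmetric uniform fake quantization with a straight-through estimator."""

        @staticmethod
        def forward(ctx, weight, num_bits=8):
            qmax = 2 ** (num_bits - 1) - 1
            scale = weight.abs().max().clamp_min(1e-8) / qmax
            # Round to the integer grid, then map back to float ("fake" quantization).
            return torch.clamp(torch.round(weight / scale), -qmax, qmax) * scale

        @staticmethod
        def backward(ctx, grad_output):
            # Straight-through estimator: treat rounding as the identity.
            return grad_output, None

    w = torch.randn(64, 64, requires_grad=True)
    w_q = FakeQuantize.apply(w, 8)   # used in place of w during QAT forward passes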

aminSa-21/automl

🤖 Simplify machine learning with automl—a local, browser-based tool for regression and classification using advanced algorithms like Decision Trees and Gradient Boosting.

Language: JavaScript - Size: 3.62 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

SqueezeAILab/SqueezeLLM

[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization

Language: Python - Size: 1.5 MB - Last synced at: 3 days ago - Pushed at: about 1 year ago - Stars: 703 - Forks: 48

Picovoice/picollm

On-device LLM Inference Powered by X-Bit Quantization

Language: Python - Size: 98.1 MB - Last synced at: 3 days ago - Pushed at: 3 months ago - Stars: 271 - Forks: 14

datawhalechina/awesome-compression

A beginner-friendly tutorial on model compression; the PDF can be downloaded from https://github.com/datawhalechina/awesome-compression/releases

Size: 311 MB - Last synced at: 6 days ago - Pushed at: 5 months ago - Stars: 334 - Forks: 36

CASE-Lab-UMD/Unified-MoE-Compression

The official implementation of the paper "Towards Efficient Mixture of Experts: A Holistic Study of Compression Techniques (TMLR)".

Language: Python - Size: 47.1 MB - Last synced at: 11 days ago - Pushed at: 8 months ago - Stars: 79 - Forks: 5

VainF/Torch-Pruning

[CVPR 2023] DepGraph: Towards Any Structural Pruning; LLMs, Vision Foundation Models, etc.

Language: Python - Size: 10 MB - Last synced at: 11 days ago - Pushed at: about 2 months ago - Stars: 3,155 - Forks: 364

HanXinzi-AI/awesome-computer-vision-resources

A collection of computer vision projects and tools.

Size: 49.8 MB - Last synced at: 12 days ago - Pushed at: over 1 year ago - Stars: 317 - Forks: 38

SqueezeAILab/KVQuant

[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

Language: Python - Size: 19.8 MB - Last synced at: 4 days ago - Pushed at: about 1 year ago - Stars: 389 - Forks: 36

tensorflow/model-optimization

A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization and pruning.

Language: Python - Size: 2.23 MB - Last synced at: 5 days ago - Pushed at: 7 days ago - Stars: 1,557 - Forks: 334
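
For reference, applying the toolkit's magnitude pruning to an existing Keras model typically looks like the following minimal sketch; the toy architecture and sparsity schedule values are illustrative assumptions:

    import tensorflow as tf
    import tensorflow_model_optimization as tfmot

    # A small Keras model to prune (illustrative architecture).
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10),
    ])

    # Wrap layers with magnitude pruning; ramp sparsity from 0% to 50%
    # over 1000 steps (schedule values are illustrative).
    pruned = tfmot.sparsity.keras.prune_low_magnitude(
        model,
        pruning_schedule=tfmot.sparsity.keras.PolynomialDecay(
            initial_sparsity=0.0, final_sparsity=0.5,
            begin_step=0, end_step=1000),
    )
    pruned.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )

    # The UpdatePruningStep callback must be passed to fit() so masks are updated:
    # pruned.fit(x_train, y_train, epochs=2,
    #            callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])

    # Remove the pruning wrappers before exporting the model.
    final = tfmot.sparsity.keras.strip_pruning(pruned)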

alibaba/TinyNeuralNetwork

TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework.

Language: Python - Size: 25.2 MB - Last synced at: 10 days ago - Pushed at: 2 months ago - Stars: 851 - Forks: 128

he-y/Awesome-Pruning

A curated list of neural network pruning resources.

Size: 605 KB - Last synced at: 16 days ago - Pushed at: over 1 year ago - Stars: 2,476 - Forks: 332

wangxb96/Awesome-EdgeAI

Resources of our survey paper "Optimizing Edge AI: A Comprehensive Survey on Data, Model, and System Strategies"

Size: 3.64 MB - Last synced at: 11 days ago - Pushed at: about 2 months ago - Stars: 96 - Forks: 11

THU-MIG/torch-model-compression

An automated toolkit for analyzing and modifying the structure of PyTorch models, including a model compression algorithm library built on automatic model-structure analysis.

Language: Python - Size: 132 KB - Last synced at: 21 days ago - Pushed at: over 2 years ago - Stars: 254 - Forks: 41

cedrickchee/awesome-ml-model-compression

Awesome machine learning model compression research papers, quantization, tools, and learning material.

Size: 213 KB - Last synced at: 4 days ago - Pushed at: about 1 year ago - Stars: 538 - Forks: 61

microsoft/Moonlit

This is a collection of our research on efficient AI, covering hardware-aware NAS and model compression.

Language: Python - Size: 12 MB - Last synced at: 13 days ago - Pushed at: about 1 year ago - Stars: 84 - Forks: 6

Efficient-ML/Awesome-Efficient-AIGC

A list of papers, docs, and code about efficient AIGC. This repo aims to provide information for efficient AIGC research, covering both language and vision, and is continuously improved. PRs adding works (papers, repositories) the repo has missed are welcome.

Size: 63.5 KB - Last synced at: 16 days ago - Pushed at: 9 months ago - Stars: 195 - Forks: 11

microsoft/archai

Accelerate your Neural Architecture Search (NAS) through fast, reproducible and modular research.

Language: Python - Size: 48.3 MB - Last synced at: 6 days ago - Pushed at: about 1 year ago - Stars: 481 - Forks: 92

1duo/awesome-ai-infrastructures

Infrastructures™ for Machine Learning Training/Inference in Production.

Size: 11.8 MB - Last synced at: 17 days ago - Pushed at: over 6 years ago - Stars: 427 - Forks: 75

microsoft/NeuronBlocks

NLP DNN Toolkit - Building Your NLP DNN Models Like Playing Lego

Language: Python - Size: 14.9 MB - Last synced at: 13 days ago - Pushed at: over 2 years ago - Stars: 1,453 - Forks: 193

minseok0809/awesome-ai-paper

A curated list of awesome papers on NLP, computer vision, model compression, XAI, reinforcement learning, security, and more.

Language: Jupyter Notebook - Size: 61 MB - Last synced at: 3 days ago - Pushed at: 2 months ago - Stars: 6 - Forks: 0

pratyushasharma/laser

The Truth Is In There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction

Language: Python - Size: 2.25 MB - Last synced at: 7 days ago - Pushed at: over 1 year ago - Stars: 388 - Forks: 35

ChanChiChoi/awesome-model-compression

papers about model compression

Size: 504 KB - Last synced at: 6 days ago - Pushed at: over 2 years ago - Stars: 165 - Forks: 37

hnuzhy/CV_DL_Gather

Research papers, corresponding code (where available), reading notes, and other related materials on hot 🔥 fields in computer vision based on deep learning.

Size: 37.7 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 77 - Forks: 6

d0tTino/DeepThought-ReThought

A refactored version of the DeepThought Discord bot, focusing on improved architecture, performance, and AI agent capabilities.

Language: Python - Size: 20.7 MB - Last synced at: 14 days ago - Pushed at: 15 days ago - Stars: 1 - Forks: 1

StephenEkaputra/Sort-KD

Parameter-Free Logit Distillation via Sorting Mechanism

Language: Python - Size: 1.66 MB - Last synced at: 24 days ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

IPL-sharif/KD_Survey

A Comprehensive Survey on Knowledge Distillation

Size: 1.36 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 45 - Forks: 3

tianyic/only_train_once_personal_footprint

OTOv1-v3, NeurIPS, ICLR, TMLR, DNN Training, Compression, Structured Pruning, Erasing Operators, CNN, Diffusion, LLM

Language: Python - Size: 2.94 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 310 - Forks: 49

czg1225/SlimSAM

[NeurIPS 2024] SlimSAM: 0.1% Data Makes Segment Anything Slim

Language: Python - Size: 36 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 345 - Forks: 18

ymoslem/Model-Compression

Code for the papers: "Efficient Speech Translation through Model Compression and Knowledge Distillation" and "Iterative Layer Pruning for Efficient Translation Inference"

Language: Jupyter Notebook - Size: 158 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

EnricoSimionato/Alternative-Model-Architectures

Research-oriented project focusing on implementing and evaluating novel compression techniques for large language models (LLMs).

Language: Python - Size: 19.7 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

diesimo-ai/TinyQ

A lightweight quantization module for PyTorch models.

Language: Python - Size: 368 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

zeyneddinoz/HybridCompFL

Simulation code for the research work "HybridCompFL: Model-Heterogeneous Federated Learning via Data-free Hybrid Model Compression".

Language: Python - Size: 517 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

MingSun-Tse/Why-the-State-of-Pruning-so-Confusing

[Preprint] Why is the State of Neural Network Pruning so Confusing? On the Fairness, Comparison Setup, and Trainability in Network Pruning

Size: 6.32 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 40 - Forks: 1

pvti/Awesome-Tensor-Decomposition

😎 A curated list of tensor decomposition resources for model compression.

Size: 646 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 81 - Forks: 11

AIoT-MLSys-Lab/SVD-LLM

[ICLR 2025🔥] SVD-LLM & [NAACL 2025🔥] SVD-LLM V2

Language: Python - Size: 1.04 MB - Last synced at: about 2 months ago - Pushed at: 2 months ago - Stars: 241 - Forks: 27

vinhkhuc/JFastText

Java interface for fastText

Language: Java - Size: 57.6 KB - Last synced at: 6 days ago - Pushed at: over 2 years ago - Stars: 244 - Forks: 99

horseee/DeepCache

[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free

Language: Python - Size: 102 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 927 - Forks: 47

mlzxy/qsparse

Train neural networks with joint quantization and pruning on both weights and activations, using any PyTorch modules.

Language: Python - Size: 293 KB - Last synced at: about 2 months ago - Pushed at: about 3 years ago - Stars: 43 - Forks: 2
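
For comparison with the pruning half of that workflow, PyTorch itself ships a basic unstructured pruning utility. The snippet below uses torch.nn.utils.prune (not qsparse's API) to zero out the 30% smallest-magnitude weights of one layer; the layer size and sparsity level are arbitrary:

    import torch
    import torch.nn as nn
    import torch.nn.utils.prune as prune

    layer = nn.Linear(256, 128)

    # Zero out the 30% smallest-magnitude weights (unstructured pruning).
    prune.l1_unstructured(layer, name="weight", amount=0.3)
    print((layer.weight == 0).float().mean())   # ~0.30 of the weights are now zero

    # Make the pruning permanent (removes the mask/reparametrization).
    prune.remove(layer, "weight")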

Bit-by-Bit-Collective/BitNet-7B-KDE

Colab-friendly BitNet distillation engine: collect KD traces from a teacher, train a ternary Mini-BitNet, and dry-run 7B memory. Multi-provider + Drive/S3

Language: Python - Size: 268 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

NVlabs/condensa

Programmable Neural Network Compression

Language: Python - Size: 16.2 MB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 149 - Forks: 26

Blue-No1/quantization-experiments

Experiments on quantization for open-weight LLMs — balancing memory footprint, speed, and accuracy.

Size: 9.77 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

SforAiDl/KD_Lib

A PyTorch knowledge distillation library for benchmarking and extending works in the domains of knowledge distillation, pruning, and quantization.

Language: Python - Size: 22.2 MB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 642 - Forks: 62

kssteven418/LTP

[KDD'22] Learned Token Pruning for Transformers

Language: Python - Size: 40.1 MB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 99 - Forks: 18

ismail31416/LumiNet

The official (TMLR) implementation of LumiNet: Perception-Driven Knowledge Distillation via Statistical Logit Calibration

Language: Jupyter Notebook - Size: 14.3 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 18 - Forks: 3

xuyang-liu16/GlobalCom2

🚀 Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models

Language: Python - Size: 6.24 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 31 - Forks: 1

gershonc/octopus-ml

A collection of handy ML and data visualization and validation tools. Go ahead and train, evaluate and validate your ML models and data with minimal effort.

Language: Jupyter Notebook - Size: 21.4 MB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 22 - Forks: 5

Inpyo-Hong/Model-Compression-Paper-List

Model Compression Paper List (Focusing on Quantization, Particularly Zero-Shot Quantization)

Size: 66.4 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

linkedin/QuantEase

QuantEase, a layer-wise quantization framework, frames the problem as discrete-structured non-convex optimization. Our work leverages Coordinate Descent techniques, offering high-quality solutions without the need for matrix inversion or decomposition.

Language: Python - Size: 209 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 18 - Forks: 2

erectbranch/Awesome-Activation-Sparsification

A curated list of neural network activation sparsification resources.

Size: 2.93 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

sfarrukhm/making_models_efficient

Developing efficient deep learning models for real-world use. Covers knowledge distillation, quantization, pruning, and more. Focused on reducing size and latency while preserving accuracy. Includes training pipelines, visualizations, and performance reports.

Language: Jupyter Notebook - Size: 128 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

bupt-ai-club/awesomeProject

Sharing high-quality AI projects.

Language: Python - Size: 129 MB - Last synced at: 12 days ago - Pushed at: over 1 year ago - Stars: 21 - Forks: 5

r-papso/torch-optimizer

PyTorch model optimization via neural network pruning.

Language: Python - Size: 55.3 MB - Last synced at: 26 days ago - Pushed at: over 3 years ago - Stars: 3 - Forks: 1

wlfeng0509/Awesome-Diffusion-Quantization

A list of papers, docs, and code about diffusion quantization. This repo collects various quantization methods for diffusion models. PRs adding works (papers, repositories) the repo has missed are welcome.

Size: 7.81 KB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 7 - Forks: 0

whucs21Mzy/Model-Phase-Transitions

Phase Transitions in Large Language Model Compression: A Perspective

Size: 5.83 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 52 - Forks: 1

zhang-fengdi/ControlGS

Official reference implementation of "Consistent Quantity-Quality Control across Scenes for Deployment-Aware Gaussian Splatting"

Language: C++ - Size: 15.1 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 11 - Forks: 0

wlfeng0509/Awesome-Diffusion-Distillation

A list of papers, docs, and code about diffusion distillation. This repo collects various distillation methods for diffusion models. PRs adding works (papers, repositories) the repo has missed are welcome.

Size: 1000 Bytes - Last synced at: 3 months ago - Pushed at: almost 2 years ago - Stars: 36 - Forks: 1

thu-nics/MoA

[CoLM'25] The official implementation of the paper "MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression".

Language: Python - Size: 544 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 140 - Forks: 8

khodoba/Code-in-Paper-Guide

Learn how to insert code into your LaTeX papers using the hyperref package. Enhance your documents with clear links and references. 🌟💻

Size: 19.5 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

merantix-momentum/acip

🗜️Codebase of the ACIP algorithm 🗜️

Language: Python - Size: 234 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 10 - Forks: 1

eezkni/SVRF

[TIP-2025] Official Pytorch implementation of "Shell-guided Compression of Voxel Radiance Fields"

Language: Python - Size: 1.18 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 4 - Forks: 0

suraj5424/Traffic_sign_Classification_EdgeAI_Pruning_Quantization

🚦 Classifies German traffic signs using a compact CNN 🧠. Combines pruning and quantization ⚙️ for Edge AI 🤖. Delivers high accuracy 📊, low latency ⚡, and efficient performance 💾.

Language: Jupyter Notebook - Size: 11.7 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

dwiaskor99/contrastive-distillation

CAST is a method for semi-supervised instance segmentation that efficiently trains a compact model using both labeled and unlabeled data. This repository contains the implementation of our three-stage pipeline, showcasing contrastive adaptation and distillation techniques. 🐙🌟

Size: 3.2 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

CASE-Lab-UMD/LLM-Drop

The official implementation of the paper "What Matters in Transformers? Not All Attention is Needed".

Language: Python - Size: 90.3 MB - Last synced at: 2 months ago - Pushed at: 7 months ago - Stars: 174 - Forks: 22

bhllx/On-Efficient-Variants-of-Segment-Anything-Model

On Efficient Variants of Segment Anything Model

Size: 35.2 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

lpalbou/model-quantizer

Effortlessly quantize, benchmark, and publish Hugging Face models with cross-platform support for CPU/GPU. Reduce model size by 75% while maintaining performance.

Language: Python - Size: 165 KB - Last synced at: 6 days ago - Pushed at: 8 months ago - Stars: 2 - Forks: 0

sarosh-quraishi/tensorslim

Neural network compression using randomized SVD

Language: Python - Size: 89.8 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0
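
The underlying idea, roughly: factor a weight matrix into two thin factors with a truncated (randomized) SVD, trading a small approximation error for far fewer parameters. A minimal PyTorch sketch using torch.svd_lowrank follows; the rank and layer sizes are illustrative, and this is not necessarily tensorslim's own API:

    import torch
    import torch.nn as nn

    def low_rank_compress(linear: nn.Linear, rank: int) -> nn.Sequential:
        """Replace a Linear layer with two thinner ones via randomized truncated SVD."""
        W = linear.weight.data                     # (out_features, in_features)
        U, S, V = torch.svd_lowrank(W, q=rank)     # W ~= U @ diag(S) @ V.T
        first = nn.Linear(linear.in_features, rank, bias=False)
        second = nn.Linear(rank, linear.out_features, bias=linear.bias is not None)
        first.weight.data = (V * S).t()            # project input into the rank-dim subspace
        second.weight.data = U                     # map back to the output dimension
        if linear.bias is not None:
            second.bias.data = linear.bias.data.clone()
        return nn.Sequential(first, second)

    layer = nn.Linear(1024, 1024)
    compressed = low_rank_compress(layer, rank=64)   # ~8x fewer weights in this layer
    x = torch.randn(2, 1024)
    print((layer(x) - compressed(x)).abs().mean())   # approximation error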

VainF/Diff-Pruning

[NeurIPS 2023] Structural Pruning for Diffusion Models

Language: Python - Size: 25.2 MB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 198 - Forks: 14

huawei-noah/Pretrained-Language-Model

Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.

Language: Python - Size: 29 MB - Last synced at: 4 months ago - Pushed at: almost 2 years ago - Stars: 3,103 - Forks: 637

ksm26/Quantization-in-Depth

Dive into advanced quantization techniques. Learn to implement and customize linear quantization functions, measure quantization error, and compress model weights using PyTorch for efficient and accessible AI models.

Language: Jupyter Notebook - Size: 5.79 MB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 4 - Forks: 5
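
The kind of linear (affine) quantization the course covers can be sketched in a few lines of PyTorch: map a float tensor to 8-bit integers with a scale and zero-point, dequantize, and measure the reconstruction error. This is a generic illustration, not the course's exact code:

    import torch

    def linear_quantize(x: torch.Tensor, num_bits: int = 8):
        """Asymmetric linear (affine) quantization of a float tensor to integers."""
        qmin, qmax = 0, 2 ** num_bits - 1
        scale = (x.max() - x.min()) / (qmax - qmin)
        zero_point = qmin - torch.round(x.min() / scale)
        q = torch.clamp(torch.round(x / scale + zero_point), qmin, qmax)
        return q.to(torch.uint8), scale, zero_point

    def linear_dequantize(q, scale, zero_point):
        return scale * (q.float() - zero_point)

    w = torch.randn(512, 512)
    q, scale, zp = linear_quantize(w)
    w_hat = linear_dequantize(q, scale, zp)
    print("mean squared quantization error:", torch.mean((w - w_hat) ** 2).item())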

Sharpiless/Yolov5-distillation-train-inference

YOLOv5 distillation training | YOLOv5 knowledge distillation training, with support for training on your own data.

Language: Python - Size: 2.36 MB - Last synced at: 4 months ago - Pushed at: about 3 years ago - Stars: 221 - Forks: 32

deadlykitten4/ResSVD

ResSVD: Residual Compensated SVD for Large Language Model Compression

Size: 9.77 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

ardaerendogru/dinov2_distillation

This project implements knowledge distillation from DINOv2 (Vision Transformer) to convolutional networks, enabling efficient visual representation learning with reduced computational requirements.

Language: Python - Size: 92.8 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 5 - Forks: 0

onnx/neural-compressor

Model compression for ONNX

Language: Python - Size: 2.35 MB - Last synced at: 3 months ago - Pushed at: 12 months ago - Stars: 96 - Forks: 9

vanhai1231/autoquant-infer

A tool for shrinking model size through quantization, combined with an AI agent that automatically selects the optimal quantization level to speed up inference and reduce its cost.

Language: Python - Size: 54.7 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

BaiTheBest/SparseLLM

Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024)

Language: Python - Size: 145 KB - Last synced at: 5 months ago - Pushed at: 7 months ago - Stars: 61 - Forks: 9

Tencent/PocketFlow

An Automatic Model Compression (AutoMC) framework for developing smaller and faster AI applications.

Language: Python - Size: 1.13 MB - Last synced at: 5 months ago - Pushed at: over 2 years ago - Stars: 2,884 - Forks: 490

haitongli/knowledge-distillation-pytorch

A PyTorch implementation for exploring deep and shallow knowledge distillation (KD) experiments with flexibility

Language: Python - Size: 22.1 MB - Last synced at: 5 months ago - Pushed at: over 2 years ago - Stars: 1,938 - Forks: 352

huawei-noah/Efficient-Computing

Efficient computing methods developed by Huawei Noah's Ark Lab

Language: Jupyter Notebook - Size: 100 MB - Last synced at: 5 months ago - Pushed at: 12 months ago - Stars: 1,273 - Forks: 218

VITA-Group/ATMC

[NeurIPS'2019] Shupeng Gui, Haotao Wang, Haichuan Yang, Chen Yu, Zhangyang Wang, Ji Liu, “Model Compression with Adversarial Robustness: A Unified Optimization Framework”

Language: Python - Size: 50.6 MB - Last synced at: 3 months ago - Pushed at: almost 4 years ago - Stars: 50 - Forks: 10

HKUDS/LightGNN

[WSDM'25] "LightGNN: Simple Graph Neural Network for Recommendation"

Language: Python - Size: 20.9 MB - Last synced at: 4 months ago - Pushed at: 10 months ago - Stars: 48 - Forks: 4

Xiuyu-Li/q-diffusion

[ICCV 2023] Q-Diffusion: Quantizing Diffusion Models.

Language: Python - Size: 5.97 MB - Last synced at: 5 months ago - Pushed at: over 1 year ago - Stars: 347 - Forks: 24

jim-schwoebel/allie

🤖 An automated machine learning framework for audio, text, image, video, or .CSV files (50+ featurizers and 15+ model trainers). Python 3.6 required.

Language: Python - Size: 275 MB - Last synced at: 5 months ago - Pushed at: 7 months ago - Stars: 141 - Forks: 35

he-y/filter-pruning-geometric-median

Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration (CVPR 2019 Oral)

Language: Python - Size: 2.17 MB - Last synced at: 5 months ago - Pushed at: about 2 years ago - Stars: 614 - Forks: 114
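
The core criterion, in a simplified sketch (an illustration of geometric-median filter selection, not the paper's official code): filters whose total distance to all other filters in a layer is smallest lie closest to the geometric median, are considered most replaceable, and are pruned. The layer sizes and pruning count below are arbitrary:

    import torch
    import torch.nn as nn

    def filters_near_geometric_median(conv: nn.Conv2d, num_prune: int):
        """Return indices of the most 'replaceable' filters in a conv layer."""
        W = conv.weight.data.flatten(1)            # (out_channels, in_channels*k*k)
        dist = torch.cdist(W, W, p=2)              # pairwise Euclidean distances
        score = dist.sum(dim=1)                    # total distance to all other filters
        return torch.topk(score, num_prune, largest=False).indices

    conv = nn.Conv2d(64, 128, kernel_size=3)
    prune_idx = filters_near_geometric_median(conv, num_prune=32)
    keep_idx = [i for i in range(conv.out_channels) if i not in set(prune_idx.tolist())]

    # Build a smaller layer from the kept filters (structured pruning).
    smaller = nn.Conv2d(64, len(keep_idx), kernel_size=3)
    smaller.weight.data = conv.weight.data[keep_idx].clone()
    smaller.bias.data = conv.bias.data[keep_idx].clone()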

msadeqsirjani/adaptive_edge_ai

Optimizing deep learning models for edge devices through intelligent compression and knowledge distillation. Achieve up to 90% model size reduction while maintaining performance, enabling efficient AI deployment on resource-constrained devices.

Language: Python - Size: 395 KB - Last synced at: 3 days ago - Pushed at: 11 months ago - Stars: 1 - Forks: 0

kssteven418/I-BERT

[ICML'21 Oral] I-BERT: Integer-only BERT Quantization

Language: Python - Size: 6.38 MB - Last synced at: 5 months ago - Pushed at: almost 3 years ago - Stars: 246 - Forks: 36

Related Keywords
model-compression (291), deep-learning (87), pytorch (68), pruning (65), quantization (56), knowledge-distillation (51), machine-learning (38), python (23), computer-vision (22), distillation (17), deep-neural-networks (16), large-language-models (16), tensorflow (15), model-pruning (15), model-acceleration (14), natural-language-processing (14), llm (13), efficient-deep-learning (13), nlp (12), neural-architecture-search (12), bert (12), automl (12), neural-network (12), network-pruning (11), convolutional-neural-networks (10), efficient-inference (10), transformers (10), data-science (10), compression (10), model-optimization (9), channel-pruning (9), awesome-list (8), optimization (8), keras (8), neural-networks (8), transformer (8), nas (8), object-detection (8), hyperparameter-optimization (7), diffusion-models (7), sparsity (7), model-quantization (7), neural-network-pruning (7), language-model (7), knowledge-transfer (6), awesome (6), sparsification (6), quantization-aware-training (6), ai (6), federated-learning (6), efficient-model (6), kd (6), cnn (6), feature-engineering (5), scikit-learn (5), artificial-intelligence (5), efficient-neural-networks (5), image-classification (5), post-training-quantization (5), edge-computing (5), structured-pruning (5), edge-ai (5), llama (5), neural-network-compression (5), filter-pruning (5), unstructured-pruning (4), weight-pruning (4), open-source (4), inference (4), text-classification (4), teacher-student (4), model-distillation (4), vision-transformer (4), papers (4), data-visualization (4), binary-neural-networks (4), super-resolution (4), classification (4), natural-language-understanding (4), efficientnet (4), onnx (4), model-deployment (4), transfer-learning (4), mlops (4), generative-ai (4), coco (3), automated-machine-learning (3), vision-transformers (3), recurrent-neural-networks (3), domain-adaptation (3), tensorrt (3), eda (3), mixed-precision (3), model-evaluation (3), feature-selection (3), stable-diffusion (3), quantized-neural-networks (3), automated-feature-engineering (3), huggingface (3), ensemble-learning (3)