An open API service providing repository metadata for many open source software ecosystems.

Topic: "post-training-quantization"

intel/neural-compressor

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

Language: Python - Size: 469 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 2,380 - Forks: 267

666DZY666/micronet

micronet, a model compression and deployment library. Compression: (1) quantization: quantization-aware training (QAT), high-bit (>2b) (DoReFa / Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference), low-bit (≤2b) / ternary and binary (TWN/BNN/XNOR-Net); post-training quantization (PTQ), 8-bit (TensorRT); (2) pruning: normal, regular, and group convolutional channel pruning; (3) group convolution structure; (4) batch-normalization fusion for quantization. Deployment: TensorRT, fp32/fp16/int8 (PTQ calibration), op-adapt (upsample), dynamic_shape.

Language: Python - Size: 6.58 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 2,239 - Forks: 476

alibaba/TinyNeuralNetwork

TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework.

Language: Python - Size: 25.2 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 809 - Forks: 122

SqueezeAILab/SqueezeLLM

[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization

Language: Python - Size: 1.5 MB - Last synced at: 14 days ago - Pushed at: 9 months ago - Stars: 685 - Forks: 45

ModelTC/llmc

[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".

Language: Python - Size: 28.9 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 456 - Forks: 53

Xiuyu-Li/q-diffusion

[ICCV 2023] Q-Diffusion: Quantizing Diffusion Models.

Language: Python - Size: 5.97 MB - Last synced at: 21 days ago - Pushed at: about 1 year ago - Stars: 347 - Forks: 24

megvii-research/FQ-ViT

[IJCAI 2022] FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer

Language: Python - Size: 729 KB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 329 - Forks: 48

megvii-research/Sparsebit

A model compression and acceleration toolbox based on PyTorch.

Language: Python - Size: 7.45 MB - Last synced at: 5 months ago - Pushed at: over 1 year ago - Stars: 327 - Forks: 40

sayakpaul/Adventures-in-TensorFlow-Lite

This repository contains notebooks that show the usage of TensorFlow Lite for quantizing deep neural networks.

Language: Jupyter Notebook - Size: 49.1 MB - Last synced at: 20 days ago - Pushed at: over 2 years ago - Stars: 172 - Forks: 35

Hsu1023/DuQuant

[NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs.

Language: Python - Size: 2.1 MB - Last synced at: 5 months ago - Pushed at: 7 months ago - Stars: 111 - Forks: 7

hkproj/quantization-notes

Notes on quantization in neural networks

Language: Jupyter Notebook - Size: 940 KB - Last synced at: 28 days ago - Pushed at: over 1 year ago - Stars: 78 - Forks: 16

ModelTC/TFMQ-DM

[CVPR 2024 Highlight] This is the official PyTorch implementation of "TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models".

Language: Jupyter Notebook - Size: 118 MB - Last synced at: 23 days ago - Pushed at: 9 months ago - Stars: 61 - Forks: 4

ModelTC/QLLM

[ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models"

Language: Python - Size: 1.68 MB - Last synced at: 8 days ago - Pushed at: about 1 year ago - Stars: 35 - Forks: 4

Sanjana7395/static_quantization

Post-training static quantization using ResNet18 architecture

Language: Jupyter Notebook - Size: 85.9 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 28 - Forks: 7

KwangHoonAn/Quantizations

Language: Python - Size: 12.3 MB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 12 - Forks: 3

zysxmu/FDDA

PyTorch implementation of our ECCV 2022 paper "Fine-grained Data Distribution Alignment for Post-Training Quantization"

Language: Python - Size: 2.01 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 11 - Forks: 0

shieldforever/NeuronQuant

[ASP-DAC 2025] "NeuronQuant: Accurate and Efficient Post-Training Quantization for Spiking Neural Networks" Official Implementation

Language: Python - Size: 438 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 10 - Forks: 2

GongCheng1919/bias-compensation

[CAAI AIR'24] Minimize Quantization Output Error with Bias Compensation

Language: Python - Size: 918 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 7 - Forks: 1

motokimura/pytorch_quantization_fx

An example to quantize MobileNetV2 trained on CIFAR-10 dataset with PyTorch FX graph mode quantization

Language: Python - Size: 86.7 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 6 - Forks: 3

iszry/DI2N-PTQ4DM

Improved the performance of 8-bit PTQ4DM, especially on FID.

Language: Python - Size: 538 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 6 - Forks: 0

ssi-research/eptq

Implementation of EPTQ - an Enhanced Post-Training Quantization algorithm for DNN compression

Language: Python - Size: 99.6 KB - Last synced at: 12 months ago - Pushed at: over 1 year ago - Stars: 4 - Forks: 0

likholat/openvino_quantization

This sample shows how to convert a TensorFlow model to an OpenVINO IR model and how to quantize the OpenVINO model.

Language: Python - Size: 13.7 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 4 - Forks: 0

Rumeysakeskin/ASR-Quantization

Post-training quantization of an NVIDIA NeMo ASR model

Language: Jupyter Notebook - Size: 32.2 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 0

Gaurav-Van/Fine-Tuning-LLMs

Introductory Guide where we will talk about Different Techniques of Fine Tuning LLMs

Language: Jupyter Notebook - Size: 2.95 MB - Last synced at: 18 days ago - Pushed at: 7 months ago - Stars: 2 - Forks: 2

generalMG/Medical-Dataset-Deep-Learning-Quantization-Data-Analysis

The repository discusses a research work published on MDPI Sensors and provides details about the project.

Language: Python - Size: 112 KB - Last synced at: 9 months ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 0

TanyaChutani/Quantization_Tensorflow

Quantization for Object Detection in Tensorflow 2.x

Language: Python - Size: 6.84 KB - Last synced at: about 2 months ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 0

Inpyo-Hong/Model-Compression-Paper-List

Model Compression Paper List (Focusing on Quantization, Particularly Zero-Shot Quantization)

Size: 42 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 1 - Forks: 0

yugwangyeol/PTQ-QAT-Image-Classification

[Project] Edge computing PTQ/QAT Comparison Experiment

Language: Python - Size: 6.84 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

yester31/TensorRT_Examples

Useful sample code for TensorRT models using ONNX

Language: Python - Size: 240 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 1

OmidGhadami95/EfficientNetV2_Quantization_CK

EfficientNetV2 (EfficientNetV2-B2) with int8 and fp32 quantization (QAT and PTQ) on the CK+ dataset: fine-tuning, augmentation, handling an imbalanced dataset, etc.

Language: Jupyter Notebook - Size: 344 KB - Last synced at: 2 months ago - Pushed at: 12 months ago - Stars: 1 - Forks: 0

satya15july/quantization

Model Quantization with PyTorch, TensorFlow & Larq

Language: C++ - Size: 48.2 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 1

yester31/TensorRT_ONNX

Generating a TensorRT model using ONNX

Language: C++ - Size: 91.6 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

yutingshih/vit-quant

Quantization for vision transformers

Language: Python - Size: 17.6 KB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 0 - Forks: 0

raj2022/quantization_prunings

Post-training quantization performed on a model trained with the CLIC dataset.

Language: Jupyter Notebook - Size: 192 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 0 - Forks: 1

berlin0308/Raspberrypi-MoViNet-TFLite

Language: Python - Size: 74.1 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

yhwangs/TQ-DiT

TQ-DiT: Efficient Time-Aware Quantization for Diffusion Transformers

Language: Python - Size: 80.1 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

razamehar/political_leaning_news_detection_backend (fork of iampujan/political_leaning_news_detection_backend)

Political Leaning Detection in News Articles

Language: Jupyter Notebook - Size: 171 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

motokimura/timm_quantization_fx

An example to quantize pretrained models from pytorch-image-models with PyTorch FX graph mode quantization

Language: Python - Size: 570 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

amikom-gace-research-group/gace-ptq-tensorrt

Archive of research experiments on post-training quantization with TensorRT. Accepted at IEEE EDGE 2024.

Language: Python - Size: 14.6 KB - Last synced at: 11 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

smpanaro/norm-tweaking

Post post-training-quantization (PTQ) method for improving LLMs. Unofficial implementation of https://arxiv.org/abs/2309.02784

Language: Python - Size: 32.2 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

yashmaniya0/Quantization-of-Image-Classification-Models

Comprehensive study on the quantization of various CNN models, employing techniques such as Post-Training Quantization and Quantization Aware Training (QAT).

Language: Jupyter Notebook - Size: 80.1 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

yester31/Quantization_EX

Quantization example for PTQ & QAT

Language: Python - Size: 94.7 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

Chubbyman2/quantization

ViT quantization implemented from scratch using quantize_fx

Language: Python - Size: 18.7 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

andrea-zanette/HippoScan

A framework to train a ResUNet architecture, quantize, compile and execute it on an FPGA.

Language: Jupyter Notebook - Size: 90.8 KB - Last synced at: 7 months ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

AndreiZoltan/ptq_resnet20

Low-bit (2/4/8/16) Post Training Quantization for ResNet20

Language: Python - Size: 53 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

Related Topics
quantization (29), quantization-aware-training (16), pytorch (13), tensorrt (6), pruning (6), model-compression (5), ptq (5), llm (4), large-language-models (4), computer-vision (3), vision-transformer (3), qat (3), model-optimization (3), onnx (3), tensorflow (3), deep-learning (3), tensorflow-lite (2), pytorch-quantization (2), llama2 (2), imagenet (2), pytorch-fx-graph-mode-quantization (2), mobilenetv2 (2), tensorrt-inference (2), keras (2), awq (2), int8-quantization (2), llama (2), stable-diffusion (2), neuromorphic-computing (2), int8 (2), smoothquant (2), diffusion-models (2), sparsity (2), llms (2), tensorflow2 (2), ddim (2), python (2), real-time-emotion-classification (1), real-time-emotion-detection (1), sparse (1), imbalanced-dataset (1), scale-down (1), googlecolab (1), facial-emotion-recognition (1), neural-networks (1), onnxruntime (1), emotion-recognition (1), alveo (1), fpga (1), efficientnetv2-b2 (1), efficientnetv2 (1), efficientnet (1), ckplus (1), pynq (1), raspberry-pi-4 (1), object-detection (1), resunet (1), low-rank-adaptation (1), segmentation (1), bert (1), research (1), vitis-ai (1), ultra96v2 (1), model-converter (1), efficient-inference (1), localllm (1), natural-language-processing (1), small-models (1), text-generation (1), transformer (1), batch-normalization-fuse (1), bnn (1), convolutional-networks (1), dorefa (1), group-convolution (1), integer-arithmetic-only (1), network-in-network (1), network-slimming (1), tensorrt-int8-python (1), twn (1), xnor-net (1), transformers (1), brain-inspired-computing (1), neuromorphic (1), spiking-neural-networks (1), diffusion-transformer (1), cvpr (1), cvpr2024 (1), highlight (1), ldm (1), 1-bit-quantization (1), bitnet (1), fine-tuning (1), finetuning-llms (1), gemma (1), lora (1), qlora (1), quantization-algorithms (1), quantization-from-scratch (1), zero-shot-quan (1)