Topic: "quantization"
hiyouga/LLaMA-Factory
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Language: Python - Size: 44 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 46,849 - Forks: 5,727

ymcui/Chinese-LLaMA-Alpaca
Chinese LLaMA & Alpaca large language models, with local CPU/GPU training and deployment (Chinese LLaMA & Alpaca LLMs)
Language: Python - Size: 23 MB - Last synced at: 8 days ago - Pushed at: 12 months ago - Stars: 18,795 - Forks: 1,890

SYSTRAN/faster-whisper
Faster Whisper transcription with CTranslate2
Language: Python - Size: 36.6 MB - Last synced at: 3 days ago - Pushed at: about 1 month ago - Stars: 15,467 - Forks: 1,299

UFund-Me/Qbot
[🔥 updating ...] AI-powered automated quantitative trading bot (fully local deployment); an AI-powered quantitative investment research platform. 📃 Online docs: https://ufund-me.github.io/Qbot ✨ News: qbot-mini: https://github.com/Charmve/iQuant
Language: Jupyter Notebook - Size: 387 MB - Last synced at: 12 days ago - Pushed at: 5 months ago - Stars: 10,984 - Forks: 1,579

bitsandbytes-foundation/bitsandbytes
Accessible large language models via k-bit quantization for PyTorch.
Language: Python - Size: 2.73 MB - Last synced at: about 18 hours ago - Pushed at: 2 days ago - Stars: 6,933 - Forks: 686
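To make the "k-bit quantization" in the description above concrete, here is a minimal absmax int8 round trip in plain Python — a sketch of the underlying idea only, not bitsandbytes' actual API or CUDA kernels:

```python
# Minimal sketch of absmax int8 quantization, the core idea behind k-bit
# schemes like those in bitsandbytes (illustrative only, not the
# library's actual API).
def quantize_absmax_int8(xs):
    """Map floats to int8 codes using a per-tensor absmax scale."""
    scale = max(abs(x) for x in xs) / 127
    return [round(x / scale) for x in xs], scale

def dequantize(qs, scale):
    return [q * scale for q in qs]

weights = [0.5, -1.0, 0.25, 0.9]
qs, scale = quantize_absmax_int8(weights)
restored = dequantize(qs, scale)
assert all(-128 <= q <= 127 for q in qs)
# The round trip is lossy, but the error is bounded by half a step.
assert max(abs(a - b) for a, b in zip(restored, weights)) <= scale / 2 + 1e-12
```

Real libraries refine this with per-block scales and outlier handling, but the quantize/dequantize round trip is the same shape.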

kornelski/pngquant
Lossy PNG compressor — pngquant command based on libimagequant library
Language: C - Size: 1.71 MB - Last synced at: 11 months ago - Pushed at: 12 months ago - Stars: 5,045 - Forks: 477
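Palette quantization of the kind pngquant performs reduces, at its core, to a nearest-color lookup against a small palette. The sketch below assumes a hand-picked palette; pngquant/libimagequant additionally choose the palette adaptively and apply dithering:

```python
# Illustrative palette quantization: map each RGB pixel to its nearest
# palette color by squared Euclidean distance. Not pngquant's algorithm,
# which also builds the palette and dithers.
def nearest_palette_color(pixel, palette):
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(palette, key=lambda c: dist2(pixel, c))

palette = [(0, 0, 0), (255, 255, 255), (255, 0, 0), (0, 0, 255)]
image = [(250, 10, 5), (12, 12, 240), (200, 200, 200)]
quantized = [nearest_palette_color(p, palette) for p in image]
assert quantized == [(255, 0, 0), (0, 0, 255), (255, 255, 255)]
```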

AutoGPTQ/AutoGPTQ 📦
An easy-to-use LLMs quantization package with user-friendly apis, based on GPTQ algorithm.
Language: Python - Size: 8.01 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 4,804 - Forks: 513

IntelLabs/distiller 📦
Neural Network Distiller by Intel AI Lab: a Python package for neural network compression research. https://intellabs.github.io/distiller
Language: Jupyter Notebook - Size: 40.5 MB - Last synced at: 11 months ago - Pushed at: almost 2 years ago - Stars: 4,310 - Forks: 797

OpenNMT/CTranslate2
Fast inference engine for Transformer models
Language: C++ - Size: 14 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 3,380 - Forks: 300

neuralmagic/deepsparse
Sparsity-aware deep learning inference runtime for CPUs
Language: Python - Size: 137 MB - Last synced at: 7 days ago - Pushed at: 9 months ago - Stars: 3,130 - Forks: 183

huawei-noah/Pretrained-Language-Model
Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.
Language: Python - Size: 29 MB - Last synced at: 7 days ago - Pushed at: about 1 year ago - Stars: 3,080 - Forks: 635

IntelLabs/nlp-architect 📦
A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural networks
Language: Python - Size: 531 MB - Last synced at: 11 months ago - Pushed at: over 2 years ago - Stars: 2,931 - Forks: 443

huggingface/optimum
🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization tools
Language: Python - Size: 5.32 MB - Last synced at: about 22 hours ago - Pushed at: 1 day ago - Stars: 2,855 - Forks: 521

aaron-xichen/pytorch-playground
Base pretrained models and datasets in pytorch (MNIST, SVHN, CIFAR10, CIFAR100, STL10, AlexNet, VGG16, VGG19, ResNet, Inception, SqueezeNet)
Language: Python - Size: 45.9 KB - Last synced at: 6 days ago - Pushed at: over 2 years ago - Stars: 2,658 - Forks: 614

stochasticai/xTuring
Build, customize and control your own LLMs. From data pre-processing to fine-tuning, xTuring provides an easy way to personalize open-source LLMs. Join our discord community: https://discord.gg/TgHXuSJEk6
Language: Python - Size: 18.4 MB - Last synced at: 1 day ago - Pushed at: 7 months ago - Stars: 2,642 - Forks: 205

intel/neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
Language: Python - Size: 468 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 2,375 - Forks: 267

dvmazur/mixtral-offloading
Run Mixtral-8x7B models in Colab or consumer desktops
Language: Python - Size: 261 KB - Last synced at: 6 days ago - Pushed at: about 1 year ago - Stars: 2,303 - Forks: 233

quic/aimet
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
Language: Python - Size: 18.8 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 2,274 - Forks: 400

666DZY666/micronet
micronet, a model compression and deployment library. Compression: 1) quantization: quantization-aware training (QAT) — high-bit (>2b) (DoReFa; "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference") and low-bit (≤2b)/ternary and binary (TWN/BNN/XNOR-Net) — plus post-training quantization (PTQ), 8-bit (TensorRT); 2) pruning: normal, regular, and group convolutional channel pruning; 3) group convolution structure; 4) batch-normalization fusion for quantization. Deployment: TensorRT, fp32/fp16/int8 (PTQ calibration), op adaptation (upsample), dynamic shape.
Language: Python - Size: 6.58 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 2,239 - Forks: 476
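The quantization-aware training (QAT) mentioned in the description above rests on "fake quantization": the forward pass snaps values to the low-bit grid while training otherwise proceeds in float. A minimal scalar sketch, assuming a fixed [-1, 1] range (real QAT learns or observes the range per tensor):

```python
# Sketch of fake quantization as used in QAT (illustrative, not
# micronet's API). Values are clamped to [x_min, x_max] and rounded to a
# uniform grid with 2**num_bits levels' worth of steps.
def fake_quantize(x, num_bits=8, x_min=-1.0, x_max=1.0):
    levels = 2 ** num_bits - 1
    scale = (x_max - x_min) / levels
    x_clamped = min(max(x, x_min), x_max)
    q = round((x_clamped - x_min) / scale)
    return q * scale + x_min

# 8-bit fake quantization keeps values close to the original...
assert abs(fake_quantize(0.3) - 0.3) < 1 / 255
# ...while 2-bit quantization snaps to one of only four levels.
assert abs(fake_quantize(0.3, num_bits=2) - 1 / 3) < 1e-9
```

During backprop, frameworks treat the rounding as identity (the straight-through estimator) so gradients still flow.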

Efficient-ML/Awesome-Model-Quantization
A list of papers, docs, and code about model quantization. This repo aims to collect resources for model quantization research and is continuously improved. PRs adding works (papers, repositories) the repo has missed are welcome.
Size: 61.5 MB - Last synced at: 8 days ago - Pushed at: about 2 months ago - Stars: 2,053 - Forks: 220

pytorch/ao
PyTorch native quantization and sparsity for training and inference
Language: Python - Size: 29.3 MB - Last synced at: about 11 hours ago - Pushed at: about 12 hours ago - Stars: 1,973 - Forks: 245

intel/intel-extension-for-pytorch
A Python package that extends official PyTorch to deliver additional performance on Intel platforms
Language: Python - Size: 102 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1,826 - Forks: 268

OpenPPL/ppq
PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.
Language: Python - Size: 5.57 MB - Last synced at: 6 days ago - Pushed at: about 1 year ago - Stars: 1,677 - Forks: 249

PaddlePaddle/PaddleSlim
PaddleSlim is an open-source library for deep model compression and architecture search.
Language: Python - Size: 16.3 MB - Last synced at: 10 days ago - Pushed at: 5 months ago - Stars: 1,587 - Forks: 350

open-mmlab/mmrazor
OpenMMLab Model Compression Toolbox and Benchmark.
Language: Python - Size: 11.1 MB - Last synced at: 10 days ago - Pushed at: 10 months ago - Stars: 1,576 - Forks: 235

tensorflow/model-optimization
A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization and pruning.
Language: Python - Size: 2.22 MB - Last synced at: 11 days ago - Pushed at: 2 months ago - Stars: 1,530 - Forks: 325
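Alongside quantization, toolkits like this one implement pruning; the simplest variant is magnitude pruning, which zeroes the smallest-magnitude weights to hit a target sparsity. A minimal sketch (illustrative, not the toolkit's API):

```python
# Magnitude pruning sketch: drop the smallest |w| until the requested
# fraction of weights is zero. Ties at the threshold may over-prune
# slightly; real toolkits handle this per-layer with schedules.
def magnitude_prune(weights, sparsity):
    k = int(len(weights) * sparsity)  # number of weights to drop
    threshold = sorted(abs(w) for w in weights)[k - 1] if k else -1.0
    return [0.0 if abs(w) <= threshold else w for w in weights]

w = [0.05, -0.8, 0.02, 1.2, -0.3, 0.07]
pruned = magnitude_prune(w, sparsity=0.5)
assert pruned == [0.0, -0.8, 0.0, 1.2, -0.3, 0.0]
```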

RWKV/rwkv.cpp
INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model
Language: C++ - Size: 42.1 MB - Last synced at: 9 days ago - Pushed at: 28 days ago - Stars: 1,501 - Forks: 103

mit-han-lab/nunchaku
[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
Language: Cuda - Size: 85.6 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1,425 - Forks: 84

thu-ml/SageAttention
Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
Language: Cuda - Size: 46.1 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 1,294 - Forks: 89

Xilinx/brevitas
Brevitas: neural network quantization in PyTorch
Language: Python - Size: 20.1 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 1,292 - Forks: 210

RahulSChand/gpu_poor
Calculate token/s & GPU memory requirement for any LLM. Supports llama.cpp/ggml/bnb/QLoRA quantization
Language: JavaScript - Size: 1.56 MB - Last synced at: 7 days ago - Pushed at: 5 months ago - Stars: 1,284 - Forks: 68

huawei-noah/Efficient-Computing
Efficient computing methods developed by Huawei Noah's Ark Lab
Language: Jupyter Notebook - Size: 100 MB - Last synced at: 9 days ago - Pushed at: 6 months ago - Stars: 1,257 - Forks: 217

vllm-project/llm-compressor
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
Language: Python - Size: 22.5 MB - Last synced at: 5 days ago - Pushed at: 6 days ago - Stars: 1,220 - Forks: 114

open-edge-platform/training_extensions
Train, Evaluate, Optimize, Deploy Computer Vision Models via OpenVINO™
Language: Python - Size: 409 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 1,163 - Forks: 450

openvinotoolkit/nncf
Neural Network Compression Framework for enhanced OpenVINO™ inference
Language: Python - Size: 60.9 MB - Last synced at: 8 days ago - Pushed at: 9 days ago - Stars: 997 - Forks: 251

huggingface/optimum-quanto
A pytorch quantization backend for optimum
Language: Python - Size: 2.58 MB - Last synced at: 10 days ago - Pushed at: 13 days ago - Stars: 917 - Forks: 72

mit-han-lab/tinyengine
[NeurIPS 2020] MCUNet: Tiny Deep Learning on IoT Devices; [NeurIPS 2021] MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning; [NeurIPS 2022] MCUNetV3: On-Device Training Under 256KB Memory
Language: C - Size: 235 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 843 - Forks: 137

guan-yuan/awesome-AutoML-and-Lightweight-Models
A list of high-quality (newest) AutoML works and lightweight models including 1.) Neural Architecture Search, 2.) Lightweight Structures, 3.) Model Compression, Quantization and Acceleration, 4.) Hyperparameter Optimization, 5.) Automated Feature Engineering.
Size: 150 KB - Last synced at: 11 months ago - Pushed at: almost 4 years ago - Stars: 827 - Forks: 160

ImageOptim/libimagequant
Palette quantization library that powers pngquant and other PNG optimizers
Language: Rust - Size: 1.34 MB - Last synced at: 6 days ago - Pushed at: 2 months ago - Stars: 822 - Forks: 134

Xilinx/finn
Dataflow compiler for QNN inference on FPGAs
Language: Python - Size: 84.5 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 804 - Forks: 254

mobiusml/hqq
Official implementation of Half-Quadratic Quantization (HQQ)
Language: Python - Size: 521 KB - Last synced at: 1 day ago - Pushed at: 7 days ago - Stars: 786 - Forks: 80

PINTO0309/onnx2tf
Self-Created Tools to convert ONNX files (NCHW) to TensorFlow/TFLite/Keras format (NHWC). The purpose of this tool is to solve the massive Transpose extrapolation problem in onnx-tensorflow (onnx-tf). I don't need a Star, but give me a pull request.
Language: Python - Size: 3.75 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 782 - Forks: 77

csarron/awesome-emdl
Embedded and mobile deep learning research resources
Size: 88.9 KB - Last synced at: 8 days ago - Pushed at: about 2 years ago - Stars: 746 - Forks: 167

mit-han-lab/TinyChatEngine
TinyChatEngine: On-Device LLM Inference Library
Language: C++ - Size: 83.3 MB - Last synced at: 5 months ago - Pushed at: 10 months ago - Stars: 745 - Forks: 72

SqueezeAILab/SqueezeLLM
[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
Language: Python - Size: 1.5 MB - Last synced at: 7 days ago - Pushed at: 8 months ago - Stars: 685 - Forks: 45

DeepVAC/deepvac
PyTorch Project Specification.
Language: Python - Size: 791 KB - Last synced at: 12 days ago - Pushed at: over 3 years ago - Stars: 679 - Forks: 105

IST-DASLab/marlin
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
Language: Python - Size: 708 KB - Last synced at: 4 months ago - Pushed at: 8 months ago - Stars: 661 - Forks: 52
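The "FP16xINT4" above implies weights stored two-per-byte and unpacked on the fly. A minimal sketch of that packing (illustrative; Marlin's actual tile layout and sign handling are considerably more elaborate):

```python
# Pack two signed int4 values into one byte and recover them.
def pack_int4_pair(lo, hi):
    """Pack two values in [-8, 7] into one byte (lo in the low nibble)."""
    assert -8 <= lo <= 7 and -8 <= hi <= 7
    return (lo & 0xF) | ((hi & 0xF) << 4)

def unpack_int4_pair(byte):
    def sign_extend(nibble):
        # Nibbles 8..15 encode the negative values -8..-1.
        return nibble - 16 if nibble >= 8 else nibble
    return sign_extend(byte & 0xF), sign_extend(byte >> 4)

packed = pack_int4_pair(-3, 7)
assert packed == 125
assert unpack_int4_pair(packed) == (-3, 7)
```

An inference kernel would unpack nibbles like this in registers, dequantize with a group scale, and multiply against FP16 activations.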

OpenGVLab/OmniQuant
[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
Language: Python - Size: 8.16 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 629 - Forks: 49

SforAiDl/KD_Lib
A Pytorch Knowledge Distillation library for benchmarking and extending works in the domains of Knowledge Distillation, Pruning, and Quantization.
Language: Python - Size: 22.2 MB - Last synced at: 8 days ago - Pushed at: about 2 years ago - Stars: 622 - Forks: 59

Ki6an/fastT5
⚡ Boost inference speed of T5 models by 5x and reduce model size by 3x.
Language: Python - Size: 277 KB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 578 - Forks: 73

Maknee/minigpt4.cpp
Port of MiniGPT4 in C++ (4bit, 5bit, 6bit, 8bit, 16bit CPU inference with GGML)
Language: C++ - Size: 2.12 MB - Last synced at: 6 days ago - Pushed at: over 1 year ago - Stars: 565 - Forks: 27

google/qkeras
QKeras: a quantization deep learning library for Tensorflow Keras
Language: Python - Size: 1.53 MB - Last synced at: about 18 hours ago - Pushed at: 16 days ago - Stars: 562 - Forks: 105

thulab/DeepHash
An Open-Source Package for Deep Learning to Hash (DeepHash)
Language: Python - Size: 7.71 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 541 - Forks: 126

cedrickchee/awesome-ml-model-compression
Awesome machine learning model compression research papers, quantization, tools, and learning material.
Size: 213 KB - Last synced at: 1 day ago - Pushed at: 7 months ago - Stars: 510 - Forks: 61

MPolaris/onnx2tflite
Tool for onnx->keras or onnx->tflite. Hope this tool can help you.
Language: Python - Size: 175 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 503 - Forks: 39

Zhen-Dong/Awesome-Quantization-Papers
List of papers related to neural network quantization in recent AI conferences and journals.
Size: 309 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 478 - Forks: 39

DerryHub/BEVFormer_tensorrt
BEVFormer inference on TensorRT, including INT8 Quantization and Custom TensorRT Plugins (float/half/half2/int8).
Language: Python - Size: 403 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 466 - Forks: 76

ModelCloud/GPTQModel
Production-ready LLM compression/quantization toolkit with hardware-accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.
Language: Python - Size: 11.8 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 465 - Forks: 68

huggingface/optimum-intel
🤗 Optimum Intel: Accelerate inference with Intel optimization tools
Language: Jupyter Notebook - Size: 17 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 459 - Forks: 128

ModelTC/llmc
[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".
Language: Python - Size: 28.9 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 456 - Forks: 53

thu-ml/SpargeAttn
SpargeAttention: A training-free sparse attention that can accelerate any model inference.
Language: Cuda - Size: 55.4 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 435 - Forks: 27

intel/auto-round
Advanced Quantization Algorithm for LLMs/VLMs.
Language: Python - Size: 10.4 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 431 - Forks: 33

hailo-ai/hailo_model_zoo
The Hailo Model Zoo includes pre-trained models and a full building and evaluation environment
Language: Python - Size: 5.78 MB - Last synced at: 8 days ago - Pushed at: 10 days ago - Stars: 425 - Forks: 54

1duo/awesome-ai-infrastructures
Infrastructures™ for Machine Learning Training/Inference in Production.
Size: 11.8 MB - Last synced at: 8 days ago - Pushed at: almost 6 years ago - Stars: 411 - Forks: 73

sony/model_optimization
Model Compression Toolkit (MCT) is an open-source project for optimizing neural network models for efficient deployment on constrained hardware. It provides researchers, developers, and engineers with advanced quantization and compression tools for deploying state-of-the-art neural networks.
Language: Python - Size: 22 MB - Last synced at: about 4 hours ago - Pushed at: about 5 hours ago - Stars: 386 - Forks: 66

mit-han-lab/haq
[CVPR 2019, Oral] HAQ: Hardware-Aware Automated Quantization with Mixed Precision
Language: Python - Size: 64.5 KB - Last synced at: about 1 month ago - Pushed at: about 4 years ago - Stars: 380 - Forks: 85

tpoisonooo/llama.onnx
LLaMa/RWKV onnx models, quantization and testcase
Language: Python - Size: 1.3 MB - Last synced at: 1 day ago - Pushed at: almost 2 years ago - Stars: 361 - Forks: 31

Zhen-Dong/HAWQ
Quantization library for PyTorch. Support low-precision and mixed-precision quantization, with hardware implementation through TVM.
Language: Python - Size: 691 KB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 361 - Forks: 80

neuralmagic/sparsezoo
Neural network model repository for highly sparse and sparse-quantized models with matching sparsification recipes
Language: Python - Size: 1.79 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 359 - Forks: 23

Xiuyu-Li/q-diffusion
[ICCV 2023] Q-Diffusion: Quantizing Diffusion Models.
Language: Python - Size: 5.97 MB - Last synced at: 14 days ago - Pushed at: about 1 year ago - Stars: 347 - Forks: 24

inisis/brocolli
Everything in Torch Fx
Language: Python - Size: 5.9 MB - Last synced at: 7 days ago - Pushed at: 11 months ago - Stars: 341 - Forks: 61

SqueezeAILab/KVQuant
[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
Language: Python - Size: 19.8 MB - Last synced at: 1 day ago - Pushed at: 8 months ago - Stars: 339 - Forks: 30

megvii-research/FQ-ViT
[IJCAI 2022] FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer
Language: Python - Size: 729 KB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 329 - Forks: 48

megvii-research/Sparsebit
A model compression and acceleration toolbox based on pytorch.
Language: Python - Size: 7.45 MB - Last synced at: 5 months ago - Pushed at: over 1 year ago - Stars: 327 - Forks: 40

neuralmagic/sparsify
ML model optimization product to accelerate inference.
Language: Python - Size: 7.18 MB - Last synced at: 8 days ago - Pushed at: about 1 year ago - Stars: 326 - Forks: 30

caoyue10/DeepHash-Papers
Must-read papers on deep learning to hash (DeepHash)
Size: 16.6 KB - Last synced at: over 1 year ago - Pushed at: almost 6 years ago - Stars: 319 - Forks: 78

jy-yuan/KIVI
[ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
Language: Python - Size: 16.7 MB - Last synced at: 1 day ago - Pushed at: 3 months ago - Stars: 288 - Forks: 29
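KIVI's 2-bit scheme belongs to the asymmetric (zero-point) family: the observed [min, max] range is mapped onto an unsigned grid, so skewed distributions waste no levels. A minimal sketch of that family, not KIVI's per-channel/per-token layout:

```python
# Asymmetric min-max quantization sketch (illustrative only).
def quantize_asymmetric(xs, num_bits=2):
    lo, hi = min(xs), max(xs)
    levels = 2 ** num_bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    qs = [round((x - lo) / scale) for x in xs]
    return qs, scale, lo

def dequantize(qs, scale, lo):
    return [q * scale + lo for q in qs]

xs = [0.0, 0.9, 2.1, 6.0]          # skewed, all-positive values
qs, scale, lo = quantize_asymmetric(xs, num_bits=2)  # 4 levels: 0..3
assert qs == [0, 0, 1, 3]
assert dequantize(qs, scale, lo) == [0.0, 0.0, 2.0, 6.0]
```

A symmetric signed grid would spend half its levels on negative values this data never takes; the zero-point shift avoids that.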

Beomi/BitNet-Transformers
0️⃣1️⃣🤗 BitNet-Transformers: Huggingface Transformers Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in pytorch with Llama(2) Architecture
Language: Python - Size: 588 KB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 288 - Forks: 34

datawhalechina/awesome-compression
A beginner-friendly introductory tutorial on model compression
Size: 302 MB - Last synced at: 9 days ago - Pushed at: 5 months ago - Stars: 265 - Forks: 35

sinanuozdemir/quick-start-guide-to-llms
The Official Repo for "Quick Start Guide to Large Language Models"
Language: Jupyter Notebook - Size: 90.6 MB - Last synced at: 8 days ago - Pushed at: 13 days ago - Stars: 264 - Forks: 145

Bisonai/awesome-edge-machine-learning
A curated list of awesome edge machine learning resources, including research papers, inference engines, challenges, books, meetups and others.
Language: Python - Size: 135 KB - Last synced at: 9 days ago - Pushed at: about 2 years ago - Stars: 260 - Forks: 51

amirgholami/ZeroQ
[CVPR'20] ZeroQ: A Novel Zero Shot Quantization Framework
Language: Python - Size: 5.47 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 258 - Forks: 52

THU-MIG/torch-model-compression
An automated toolset for analyzing and modifying the structure of PyTorch models, including a model compression algorithm library with automatic model structure analysis
Language: Python - Size: 132 KB - Last synced at: 7 days ago - Pushed at: about 2 years ago - Stars: 250 - Forks: 41

blue-oil/blueoil 📦
Bring Deep Learning to small devices
Language: Python - Size: 291 MB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 249 - Forks: 85

microsoft/LQ-Nets 📦
LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks
Language: Python - Size: 28.3 KB - Last synced at: about 6 hours ago - Pushed at: over 2 years ago - Stars: 242 - Forks: 69

kssteven418/I-BERT
[ICML'21 Oral] I-BERT: Integer-only BERT Quantization
Language: Python - Size: 6.38 MB - Last synced at: 14 days ago - Pushed at: about 2 years ago - Stars: 241 - Forks: 34

datawhalechina/llm-deploy
Theory and practice of large model / LLM inference and deployment
Size: 100 MB - Last synced at: 2 days ago - Pushed at: about 1 month ago - Stars: 240 - Forks: 35

jakc4103/DFQ
PyTorch implementation of Data Free Quantization Through Weight Equalization and Bias Correction.
Language: Python - Size: 140 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 239 - Forks: 42

Picovoice/picollm
On-device LLM Inference Powered by X-Bit Quantization
Language: Python - Size: 94.2 MB - Last synced at: 1 day ago - Pushed at: 10 days ago - Stars: 233 - Forks: 13

j-marple-dev/model_compression
PyTorch Model Compression
Language: Python - Size: 31 MB - Last synced at: 5 months ago - Pushed at: about 2 years ago - Stars: 230 - Forks: 25

aredden/flux-fp8-api
Flux diffusion model implementation using quantized fp8 matmul; the remaining layers use faster half-precision accumulation, which is ~2x faster on consumer devices.
Language: Python - Size: 157 KB - Last synced at: 3 months ago - Pushed at: 6 months ago - Stars: 227 - Forks: 28

zcemycl/TF2DeepFloorplan
TF2 Deep FloorPlan Recognition using a Multi-task Network with Room-boundary-Guided Attention. Enable tensorboard, quantization, flask, tflite, docker, github actions and google colab.
Language: Python - Size: 7.93 MB - Last synced at: 11 days ago - Pushed at: almost 2 years ago - Stars: 222 - Forks: 72

submission2019/cnn-quantization
Quantization of Convolutional Neural networks.
Language: Python - Size: 2.71 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 216 - Forks: 58

ikergarcia1996/Easy-Translate
Easy-Translate is a script for translating large text files with a SINGLE COMMAND. Easy-Translate is designed to be as easy as possible for beginners and as seamless and customizable as possible for advanced users.
Language: Python - Size: 656 KB - Last synced at: 13 days ago - Pushed at: 5 months ago - Stars: 212 - Forks: 326

Aaronhuang-778/BiLLM
[ICML 2024] BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
Language: Python - Size: 1.73 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 207 - Forks: 14

natasha/navec
Compact high quality word embeddings for Russian language
Language: Python - Size: 1.86 MB - Last synced at: 11 days ago - Pushed at: over 1 year ago - Stars: 198 - Forks: 18

FasterDecoding/BitDelta
Language: Jupyter Notebook - Size: 7.21 MB - Last synced at: 3 days ago - Pushed at: 5 months ago - Stars: 197 - Forks: 15

dbohdan/hicolor
🎨 Convert images to 15/16-bit RGB color with dithering
Language: C - Size: 639 KB - Last synced at: 6 months ago - Pushed at: 7 months ago - Stars: 195 - Forks: 5
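The 16-bit color format above (RGB565) keeps 5 bits of red, 6 of green, and 5 of blue. A sketch of the conversion — hicolor additionally dithers, which this truncating version does not:

```python
# RGB888 <-> RGB565 sketch (illustrative; hicolor adds dithering).
def rgb888_to_rgb565(r, g, b):
    # Keep the top 5/6/5 bits of each channel.
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)

def rgb565_to_rgb888(c):
    r = (c >> 11) & 0x1F
    g = (c >> 5) & 0x3F
    b = c & 0x1F
    # Expand back to 8 bits by replicating the high bits, so that
    # full-scale values map back to exactly 255.
    return (r << 3) | (r >> 2), (g << 2) | (g >> 4), (b << 3) | (b >> 2)

assert rgb888_to_rgb565(255, 255, 255) == 0xFFFF
assert rgb565_to_rgb888(rgb888_to_rgb565(200, 100, 50)) == (206, 101, 49)
```

The round-trip error (here 200 → 206, 100 → 101, 50 → 49) is exactly the banding that dithering trades for noise.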

VITA-Group/Q-GaLore
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.
Language: Python - Size: 343 KB - Last synced at: 8 days ago - Pushed at: 9 months ago - Stars: 195 - Forks: 16

wenwei202/terngrad
Ternary Gradients to Reduce Communication in Distributed Deep Learning (TensorFlow)
Language: Python - Size: 5.59 MB - Last synced at: 12 months ago - Pushed at: over 6 years ago - Stars: 181 - Forks: 48
