An open API service providing repository metadata for many open source software ecosystems.

Topic: "quantization"

hiyouga/LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Language: Python - Size: 44 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 46,849 - Forks: 5,727

ymcui/Chinese-LLaMA-Alpaca

Chinese LLaMA & Alpaca large language models, with local CPU/GPU training and deployment (Chinese LLaMA & Alpaca LLMs)

Language: Python - Size: 23 MB - Last synced at: 8 days ago - Pushed at: 12 months ago - Stars: 18,795 - Forks: 1,890

SYSTRAN/faster-whisper

Faster Whisper transcription with CTranslate2

Language: Python - Size: 36.6 MB - Last synced at: 3 days ago - Pushed at: about 1 month ago - Stars: 15,467 - Forks: 1,299

UFund-Me/Qbot

[🔥 updating...] AI-powered automated quantitative trading bot (fully local deployment); an AI-powered Quantitative Investment Research Platform. 📃 Online docs: https://ufund-me.github.io/Qbot ✨ qbot-mini: https://github.com/Charmve/iQuant

Language: Jupyter Notebook - Size: 387 MB - Last synced at: 12 days ago - Pushed at: 5 months ago - Stars: 10,984 - Forks: 1,579

bitsandbytes-foundation/bitsandbytes

Accessible large language models via k-bit quantization for PyTorch.

Language: Python - Size: 2.73 MB - Last synced at: about 18 hours ago - Pushed at: 2 days ago - Stars: 6,933 - Forks: 686
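The "k-bit quantization" these libraries perform boils down to mapping floating-point weights onto a small integer range with a scale factor. A minimal pure-Python sketch of a symmetric 8-bit round-trip, illustrating the general idea only (this is not bitsandbytes' actual API):

```python
# Symmetric int8 quantization round-trip: floats -> [-127, 127] -> floats.
# Illustrative sketch of the core idea behind k-bit weight quantization.

def quantize_int8(values):
    """Map floats onto the int8 range [-127, 127] with a per-tensor scale."""
    scale = max(abs(v) for v in values) / 127.0
    return [round(v / scale) for v in values], scale

def dequantize(quantized, scale):
    return [q * scale for q in quantized]

weights = [0.42, -1.27, 0.05, 0.9]
q, scale = quantize_int8(weights)       # q == [42, -127, 5, 90]
restored = dequantize(q, scale)
# Reconstruction error is bounded by half a quantization step (scale / 2).
```

Real libraries add per-block scales, outlier handling, and packed storage on top of this basic map, but the scale-and-round core is the same.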

kornelski/pngquant

Lossy PNG compressor — pngquant command based on libimagequant library

Language: C - Size: 1.71 MB - Last synced at: 11 months ago - Pushed at: 12 months ago - Stars: 5,045 - Forks: 477

AutoGPTQ/AutoGPTQ 📦

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

Language: Python - Size: 8.01 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 4,804 - Forks: 513

IntelLabs/distiller 📦

Neural Network Distiller by Intel AI Lab: a Python package for neural network compression research. https://intellabs.github.io/distiller

Language: Jupyter Notebook - Size: 40.5 MB - Last synced at: 11 months ago - Pushed at: almost 2 years ago - Stars: 4,310 - Forks: 797

OpenNMT/CTranslate2

Fast inference engine for Transformer models

Language: C++ - Size: 14 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 3,380 - Forks: 300

neuralmagic/deepsparse

Sparsity-aware deep learning inference runtime for CPUs

Language: Python - Size: 137 MB - Last synced at: 7 days ago - Pushed at: 9 months ago - Stars: 3,130 - Forks: 183

huawei-noah/Pretrained-Language-Model

Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.

Language: Python - Size: 29 MB - Last synced at: 7 days ago - Pushed at: about 1 year ago - Stars: 3,080 - Forks: 635

IntelLabs/nlp-architect 📦

A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural networks

Language: Python - Size: 531 MB - Last synced at: 11 months ago - Pushed at: over 2 years ago - Stars: 2,931 - Forks: 443

huggingface/optimum

🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy-to-use hardware optimization tools

Language: Python - Size: 5.32 MB - Last synced at: about 22 hours ago - Pushed at: 1 day ago - Stars: 2,855 - Forks: 521

aaron-xichen/pytorch-playground

Base pretrained models and datasets in pytorch (MNIST, SVHN, CIFAR10, CIFAR100, STL10, AlexNet, VGG16, VGG19, ResNet, Inception, SqueezeNet)

Language: Python - Size: 45.9 KB - Last synced at: 6 days ago - Pushed at: over 2 years ago - Stars: 2,658 - Forks: 614

stochasticai/xTuring

Build, customize and control your own LLMs. From data pre-processing to fine-tuning, xTuring provides an easy way to personalize open-source LLMs. Join our discord community: https://discord.gg/TgHXuSJEk6

Language: Python - Size: 18.4 MB - Last synced at: 1 day ago - Pushed at: 7 months ago - Stars: 2,642 - Forks: 205

intel/neural-compressor

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

Language: Python - Size: 468 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 2,375 - Forks: 267

dvmazur/mixtral-offloading

Run Mixtral-8x7B models in Colab or consumer desktops

Language: Python - Size: 261 KB - Last synced at: 6 days ago - Pushed at: about 1 year ago - Stars: 2,303 - Forks: 233

quic/aimet

AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.

Language: Python - Size: 18.8 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 2,274 - Forks: 400

666DZY666/micronet

micronet, a model compression and deployment library. Compression: (1) quantization: quantization-aware training (QAT), both high-bit (>2b; DoReFa, "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference") and low-bit (≤2b; ternary and binary, TWN/BNN/XNOR-Net), plus post-training quantization (PTQ) at 8-bit (TensorRT); (2) pruning: normal, regular, and group convolutional channel pruning; (3) group convolution structure; (4) batch-normalization fusion for quantization. Deployment: TensorRT, FP32/FP16/INT8 (PTQ calibration), op adaptation (upsample), dynamic shape.

Language: Python - Size: 6.58 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 2,239 - Forks: 476

Efficient-ML/Awesome-Model-Quantization

A list of papers, docs, and code about model quantization. This repo aims to collect resources for model quantization research and is continuously improving; pull requests adding works (papers, repositories) the repo has missed are welcome.

Size: 61.5 MB - Last synced at: 8 days ago - Pushed at: about 2 months ago - Stars: 2,053 - Forks: 220

pytorch/ao

PyTorch native quantization and sparsity for training and inference

Language: Python - Size: 29.3 MB - Last synced at: about 11 hours ago - Pushed at: about 12 hours ago - Stars: 1,973 - Forks: 245

intel/intel-extension-for-pytorch

A Python package extending the official PyTorch to easily obtain performance gains on Intel platforms

Language: Python - Size: 102 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1,826 - Forks: 268

OpenPPL/ppq

PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.

Language: Python - Size: 5.57 MB - Last synced at: 6 days ago - Pushed at: about 1 year ago - Stars: 1,677 - Forks: 249

PaddlePaddle/PaddleSlim

PaddleSlim is an open-source library for deep model compression and architecture search.

Language: Python - Size: 16.3 MB - Last synced at: 10 days ago - Pushed at: 5 months ago - Stars: 1,587 - Forks: 350

open-mmlab/mmrazor

OpenMMLab Model Compression Toolbox and Benchmark.

Language: Python - Size: 11.1 MB - Last synced at: 10 days ago - Pushed at: 10 months ago - Stars: 1,576 - Forks: 235

tensorflow/model-optimization

A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization and pruning.

Language: Python - Size: 2.22 MB - Last synced at: 11 days ago - Pushed at: 2 months ago - Stars: 1,530 - Forks: 325

RWKV/rwkv.cpp

INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model

Language: C++ - Size: 42.1 MB - Last synced at: 9 days ago - Pushed at: 28 days ago - Stars: 1,501 - Forks: 103

mit-han-lab/nunchaku

[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

Language: Cuda - Size: 85.6 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1,425 - Forks: 84

thu-ml/SageAttention

Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end accuracy across various models.

Language: Cuda - Size: 46.1 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 1,294 - Forks: 89

Xilinx/brevitas

Brevitas: neural network quantization in PyTorch

Language: Python - Size: 20.1 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 1,292 - Forks: 210

RahulSChand/gpu_poor

Calculate token/s & GPU memory requirement for any LLM. Supports llama.cpp/ggml/bnb/QLoRA quantization

Language: JavaScript - Size: 1.56 MB - Last synced at: 7 days ago - Pushed at: 5 months ago - Stars: 1,284 - Forks: 68

huawei-noah/Efficient-Computing

Efficient computing methods developed by Huawei Noah's Ark Lab

Language: Jupyter Notebook - Size: 100 MB - Last synced at: 9 days ago - Pushed at: 6 months ago - Stars: 1,257 - Forks: 217

vllm-project/llm-compressor

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

Language: Python - Size: 22.5 MB - Last synced at: 5 days ago - Pushed at: 6 days ago - Stars: 1,220 - Forks: 114

open-edge-platform/training_extensions

Train, Evaluate, Optimize, Deploy Computer Vision Models via OpenVINO™

Language: Python - Size: 409 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 1,163 - Forks: 450

openvinotoolkit/nncf

Neural Network Compression Framework for enhanced OpenVINO™ inference

Language: Python - Size: 60.9 MB - Last synced at: 8 days ago - Pushed at: 9 days ago - Stars: 997 - Forks: 251

huggingface/optimum-quanto

A pytorch quantization backend for optimum

Language: Python - Size: 2.58 MB - Last synced at: 10 days ago - Pushed at: 13 days ago - Stars: 917 - Forks: 72

mit-han-lab/tinyengine

[NeurIPS 2020] MCUNet: Tiny Deep Learning on IoT Devices; [NeurIPS 2021] MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning; [NeurIPS 2022] MCUNetV3: On-Device Training Under 256KB Memory

Language: C - Size: 235 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 843 - Forks: 137

guan-yuan/awesome-AutoML-and-Lightweight-Models

A list of high-quality (newest) AutoML works and lightweight models including 1.) Neural Architecture Search, 2.) Lightweight Structures, 3.) Model Compression, Quantization and Acceleration, 4.) Hyperparameter Optimization, 5.) Automated Feature Engineering.

Size: 150 KB - Last synced at: 11 months ago - Pushed at: almost 4 years ago - Stars: 827 - Forks: 160

ImageOptim/libimagequant

Palette quantization library that powers pngquant and other PNG optimizers

Language: Rust - Size: 1.34 MB - Last synced at: 6 days ago - Pushed at: 2 months ago - Stars: 822 - Forks: 134
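Palette quantization, as done by libimagequant and pngquant, reduces an image to a small color palette and remaps each pixel to its nearest entry. A minimal sketch of the remapping step only, assuming a fixed palette (the real library also generates the palette and applies dithering):

```python
# Map each pixel to its nearest palette entry by squared RGB distance.
# Sketch of the remapping half of palette quantization; palette *selection*
# (median-cut variants, etc.) is the other, harder half.

def nearest_palette_color(pixel, palette):
    return min(palette, key=lambda p: sum((a - b) ** 2 for a, b in zip(pixel, p)))

palette = [(0, 0, 0), (255, 255, 255), (255, 0, 0), (0, 0, 255)]
image = [(200, 30, 30), (10, 10, 10), (40, 60, 230)]
quantized = [nearest_palette_color(px, palette) for px in image]
# quantized == [(255, 0, 0), (0, 0, 0), (0, 0, 255)]
```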

Xilinx/finn

Dataflow compiler for QNN inference on FPGAs

Language: Python - Size: 84.5 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 804 - Forks: 254

mobiusml/hqq

Official implementation of Half-Quadratic Quantization (HQQ)

Language: Python - Size: 521 KB - Last synced at: 1 day ago - Pushed at: 7 days ago - Stars: 786 - Forks: 80

PINTO0309/onnx2tf

Self-Created Tools to convert ONNX files (NCHW) to TensorFlow/TFLite/Keras format (NHWC). The purpose of this tool is to solve the massive Transpose extrapolation problem in onnx-tensorflow (onnx-tf). I don't need a Star, but give me a pull request.

Language: Python - Size: 3.75 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 782 - Forks: 77

csarron/awesome-emdl

Embedded and mobile deep learning research resources

Size: 88.9 KB - Last synced at: 8 days ago - Pushed at: about 2 years ago - Stars: 746 - Forks: 167

mit-han-lab/TinyChatEngine

TinyChatEngine: On-Device LLM Inference Library

Language: C++ - Size: 83.3 MB - Last synced at: 5 months ago - Pushed at: 10 months ago - Stars: 745 - Forks: 72

SqueezeAILab/SqueezeLLM

[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization

Language: Python - Size: 1.5 MB - Last synced at: 7 days ago - Pushed at: 8 months ago - Stars: 685 - Forks: 45

DeepVAC/deepvac

PyTorch Project Specification.

Language: Python - Size: 791 KB - Last synced at: 12 days ago - Pushed at: over 3 years ago - Stars: 679 - Forks: 105

IST-DASLab/marlin

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

Language: Python - Size: 708 KB - Last synced at: 4 months ago - Pushed at: 8 months ago - Stars: 661 - Forks: 52

OpenGVLab/OmniQuant

[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.

Language: Python - Size: 8.16 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 629 - Forks: 49

SforAiDl/KD_Lib

A Pytorch Knowledge Distillation library for benchmarking and extending works in the domains of Knowledge Distillation, Pruning, and Quantization.

Language: Python - Size: 22.2 MB - Last synced at: 8 days ago - Pushed at: about 2 years ago - Stars: 622 - Forks: 59

Ki6an/fastT5

⚡ boost inference speed of T5 models by 5x & reduce the model size by 3x.

Language: Python - Size: 277 KB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 578 - Forks: 73

Maknee/minigpt4.cpp

Port of MiniGPT4 in C++ (4bit, 5bit, 6bit, 8bit, 16bit CPU inference with GGML)

Language: C++ - Size: 2.12 MB - Last synced at: 6 days ago - Pushed at: over 1 year ago - Stars: 565 - Forks: 27

google/qkeras

QKeras: a quantization deep learning library for TensorFlow Keras

Language: Python - Size: 1.53 MB - Last synced at: about 18 hours ago - Pushed at: 16 days ago - Stars: 562 - Forks: 105

thulab/DeepHash

An Open-Source Package for Deep Learning to Hash (DeepHash)

Language: Python - Size: 7.71 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 541 - Forks: 126

cedrickchee/awesome-ml-model-compression

Awesome machine learning model compression research papers, quantization, tools, and learning material.

Size: 213 KB - Last synced at: 1 day ago - Pushed at: 7 months ago - Stars: 510 - Forks: 61

MPolaris/onnx2tflite

A tool for ONNX→Keras or ONNX→TFLite conversion. Hope this tool can help you.

Language: Python - Size: 175 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 503 - Forks: 39

Zhen-Dong/Awesome-Quantization-Papers

List of papers related to neural network quantization in recent AI conferences and journals.

Size: 309 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 478 - Forks: 39

DerryHub/BEVFormer_tensorrt

BEVFormer inference on TensorRT, including INT8 Quantization and Custom TensorRT Plugins (float/half/half2/int8).

Language: Python - Size: 403 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 466 - Forks: 76

ModelCloud/GPTQModel

Production-ready LLM compression/quantization toolkit with hardware-accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.

Language: Python - Size: 11.8 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 465 - Forks: 68

huggingface/optimum-intel

🤗 Optimum Intel: Accelerate inference with Intel optimization tools

Language: Jupyter Notebook - Size: 17 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 459 - Forks: 128

ModelTC/llmc

[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".

Language: Python - Size: 28.9 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 456 - Forks: 53

thu-ml/SpargeAttn

SpargeAttention: a training-free sparse attention method that can accelerate inference for any model.

Language: Cuda - Size: 55.4 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 435 - Forks: 27

intel/auto-round

Advanced Quantization Algorithm for LLMs/VLMs.

Language: Python - Size: 10.4 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 431 - Forks: 33

hailo-ai/hailo_model_zoo

The Hailo Model Zoo includes pre-trained models and a full building and evaluation environment

Language: Python - Size: 5.78 MB - Last synced at: 8 days ago - Pushed at: 10 days ago - Stars: 425 - Forks: 54

1duo/awesome-ai-infrastructures

Infrastructures™ for Machine Learning Training/Inference in Production.

Size: 11.8 MB - Last synced at: 8 days ago - Pushed at: almost 6 years ago - Stars: 411 - Forks: 73

sony/model_optimization

Model Compression Toolkit (MCT) is an open source project for neural network model optimization under efficient, constrained hardware. This project provides researchers, developers, and engineers advanced quantization and compression tools for deploying state-of-the-art neural networks.

Language: Python - Size: 22 MB - Last synced at: about 4 hours ago - Pushed at: about 5 hours ago - Stars: 386 - Forks: 66

mit-han-lab/haq

[CVPR 2019, Oral] HAQ: Hardware-Aware Automated Quantization with Mixed Precision

Language: Python - Size: 64.5 KB - Last synced at: about 1 month ago - Pushed at: about 4 years ago - Stars: 380 - Forks: 85

tpoisonooo/llama.onnx

LLaMA/RWKV ONNX models, quantization, and test cases

Language: Python - Size: 1.3 MB - Last synced at: 1 day ago - Pushed at: almost 2 years ago - Stars: 361 - Forks: 31

Zhen-Dong/HAWQ

Quantization library for PyTorch. Support low-precision and mixed-precision quantization, with hardware implementation through TVM.

Language: Python - Size: 691 KB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 361 - Forks: 80

neuralmagic/sparsezoo

Neural network model repository for highly sparse and sparse-quantized models with matching sparsification recipes

Language: Python - Size: 1.79 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 359 - Forks: 23

Xiuyu-Li/q-diffusion

[ICCV 2023] Q-Diffusion: Quantizing Diffusion Models.

Language: Python - Size: 5.97 MB - Last synced at: 14 days ago - Pushed at: about 1 year ago - Stars: 347 - Forks: 24

inisis/brocolli

Everything in Torch Fx

Language: Python - Size: 5.9 MB - Last synced at: 7 days ago - Pushed at: 11 months ago - Stars: 341 - Forks: 61

SqueezeAILab/KVQuant

[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

Language: Python - Size: 19.8 MB - Last synced at: 1 day ago - Pushed at: 8 months ago - Stars: 339 - Forks: 30

megvii-research/FQ-ViT

[IJCAI 2022] FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer

Language: Python - Size: 729 KB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 329 - Forks: 48

megvii-research/Sparsebit

A model compression and acceleration toolbox based on PyTorch.

Language: Python - Size: 7.45 MB - Last synced at: 5 months ago - Pushed at: over 1 year ago - Stars: 327 - Forks: 40

neuralmagic/sparsify

ML model optimization product to accelerate inference.

Language: Python - Size: 7.18 MB - Last synced at: 8 days ago - Pushed at: about 1 year ago - Stars: 326 - Forks: 30

caoyue10/DeepHash-Papers

Must-read papers on deep learning to hash (DeepHash)

Size: 16.6 KB - Last synced at: over 1 year ago - Pushed at: almost 6 years ago - Stars: 319 - Forks: 78

jy-yuan/KIVI

[ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache

Language: Python - Size: 16.7 MB - Last synced at: 1 day ago - Pushed at: 3 months ago - Stars: 288 - Forks: 29
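At very low bit widths like KIVI's 2-bit KV-cache setting, asymmetric quantization is used: the range [min, max] is mapped onto 2^bits integer levels via a scale and a zero offset, so the few available levels are not wasted on values the tensor never takes. A minimal sketch of the general asymmetric scheme, not KIVI's per-channel/per-token implementation:

```python
# Asymmetric quantization: map [min, max] onto 2**bits levels using a scale
# and a zero offset. Illustrative sketch of the general scheme only.

def quantize_asym(values, bits=2):
    lo, hi = min(values), max(values)
    levels = 2 ** bits - 1               # 3 steps above zero at 2 bits
    scale = (hi - lo) / levels or 1.0    # guard against a constant tensor
    return [round((v - lo) / scale) for v in values], scale, lo

def dequantize_asym(quantized, scale, lo):
    return [q * scale + lo for q in quantized]

vals = [0.1, 0.4, 0.7, 1.0]
q, scale, lo = quantize_asym(vals)       # q == [0, 1, 2, 3]
restored = dequantize_asym(q, scale, lo)
```

A symmetric scheme with the same bit budget would have to cover [-1.0, 1.0] here and waste half its levels on negative values that never occur.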

Beomi/BitNet-Transformers

0️⃣1️⃣🤗 BitNet-Transformers: Hugging Face Transformers implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch with Llama(2) architecture

Language: Python - Size: 588 KB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 288 - Forks: 34

datawhalechina/awesome-compression

A beginner's tutorial on model compression

Size: 302 MB - Last synced at: 9 days ago - Pushed at: 5 months ago - Stars: 265 - Forks: 35

sinanuozdemir/quick-start-guide-to-llms

The Official Repo for "Quick Start Guide to Large Language Models"

Language: Jupyter Notebook - Size: 90.6 MB - Last synced at: 8 days ago - Pushed at: 13 days ago - Stars: 264 - Forks: 145

Bisonai/awesome-edge-machine-learning

A curated list of awesome edge machine learning resources, including research papers, inference engines, challenges, books, meetups and others.

Language: Python - Size: 135 KB - Last synced at: 9 days ago - Pushed at: about 2 years ago - Stars: 260 - Forks: 51

amirgholami/ZeroQ

[CVPR'20] ZeroQ: A Novel Zero Shot Quantization Framework

Language: Python - Size: 5.47 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 258 - Forks: 52

THU-MIG/torch-model-compression

A toolset for automated model-structure analysis and modification of PyTorch models, including a model compression algorithm library that automatically analyzes model structure

Language: Python - Size: 132 KB - Last synced at: 7 days ago - Pushed at: about 2 years ago - Stars: 250 - Forks: 41

blue-oil/blueoil 📦

Bring Deep Learning to small devices

Language: Python - Size: 291 MB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 249 - Forks: 85

microsoft/LQ-Nets 📦

LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks

Language: Python - Size: 28.3 KB - Last synced at: about 6 hours ago - Pushed at: over 2 years ago - Stars: 242 - Forks: 69

kssteven418/I-BERT

[ICML'21 Oral] I-BERT: Integer-only BERT Quantization

Language: Python - Size: 6.38 MB - Last synced at: 14 days ago - Pushed at: about 2 years ago - Stars: 241 - Forks: 34

datawhalechina/llm-deploy

Theory and practice of large language model (LLM) inference and deployment

Size: 100 MB - Last synced at: 2 days ago - Pushed at: about 1 month ago - Stars: 240 - Forks: 35

jakc4103/DFQ

PyTorch implementation of Data Free Quantization Through Weight Equalization and Bias Correction.

Language: Python - Size: 140 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 239 - Forks: 42

Picovoice/picollm

On-device LLM Inference Powered by X-Bit Quantization

Language: Python - Size: 94.2 MB - Last synced at: 1 day ago - Pushed at: 10 days ago - Stars: 233 - Forks: 13

j-marple-dev/model_compression

PyTorch Model Compression

Language: Python - Size: 31 MB - Last synced at: 5 months ago - Pushed at: about 2 years ago - Stars: 230 - Forks: 25

aredden/flux-fp8-api

Flux diffusion model implementation using quantized FP8 matmul; the remaining layers use faster half-precision accumulation, which is ~2x faster on consumer devices.

Language: Python - Size: 157 KB - Last synced at: 3 months ago - Pushed at: 6 months ago - Stars: 227 - Forks: 28

zcemycl/TF2DeepFloorplan

TF2 Deep FloorPlan Recognition using a Multi-task Network with Room-boundary-Guided Attention. Supports TensorBoard, quantization, Flask, TFLite, Docker, GitHub Actions, and Google Colab.

Language: Python - Size: 7.93 MB - Last synced at: 11 days ago - Pushed at: almost 2 years ago - Stars: 222 - Forks: 72

submission2019/cnn-quantization

Quantization of Convolutional Neural networks.

Language: Python - Size: 2.71 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 216 - Forks: 58

ikergarcia1996/Easy-Translate

Easy-Translate is a script for translating large text files with a SINGLE COMMAND. Easy-Translate is designed to be as easy as possible for beginners and as seamless and customizable as possible for advanced users.

Language: Python - Size: 656 KB - Last synced at: 13 days ago - Pushed at: 5 months ago - Stars: 212 - Forks: 326

Aaronhuang-778/BiLLM

[ICML 2024] BiLLM: Pushing the Limit of Post-Training Quantization for LLMs

Language: Python - Size: 1.73 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 207 - Forks: 14

natasha/navec

Compact high quality word embeddings for Russian language

Language: Python - Size: 1.86 MB - Last synced at: 11 days ago - Pushed at: over 1 year ago - Stars: 198 - Forks: 18

FasterDecoding/BitDelta

Language: Jupyter Notebook - Size: 7.21 MB - Last synced at: 3 days ago - Pushed at: 5 months ago - Stars: 197 - Forks: 15

dbohdan/hicolor

🎨 Convert images to 15/16-bit RGB color with dithering

Language: C - Size: 639 KB - Last synced at: 6 months ago - Pushed at: 7 months ago - Stars: 195 - Forks: 5
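Converting to 15/16-bit color is itself a quantization: each 8-bit channel is truncated to 5 or 6 bits and packed into one 16-bit word. A sketch of the RGB565 pack/unpack step (hicolor's actual pipeline also dithers to hide the banding this introduces):

```python
# 24-bit RGB <-> 16-bit RGB565: 5 bits red, 6 green, 5 blue.
# Illustrative sketch of the color-depth quantization step, without dithering.

def rgb888_to_rgb565(r, g, b):
    """Pack 8-bit channels into 16 bits by dropping low-order bits."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)

def rgb565_to_rgb888(v):
    r, g, b = (v >> 11) & 0x1F, (v >> 5) & 0x3F, v & 0x1F
    # Expand back to 8 bits by bit replication so white maps to (255, 255, 255).
    return (r << 3 | r >> 2, g << 2 | g >> 4, b << 3 | b >> 2)

assert rgb888_to_rgb565(255, 255, 255) == 0xFFFF
assert rgb565_to_rgb888(rgb888_to_rgb565(200, 100, 50)) == (206, 101, 49)
```

Green gets the sixth bit because the eye is most sensitive to it, which is why the round-trip error above is smallest in the green channel.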

VITA-Group/Q-GaLore

Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.

Language: Python - Size: 343 KB - Last synced at: 8 days ago - Pushed at: 9 months ago - Stars: 195 - Forks: 16

wenwei202/terngrad

Ternary Gradients to Reduce Communication in Distributed Deep Learning (TensorFlow)

Language: Python - Size: 5.59 MB - Last synced at: 12 months ago - Pushed at: over 6 years ago - Stars: 181 - Forks: 48