GitHub topics: quantization

open-mmlab/mmrazor

OpenMMLab Model Compression Toolbox and Benchmark.

Language: Python - Size: 11.1 MB - Last synced at: 26 days ago - Pushed at: about 1 year ago - Stars: 1,599 - Forks: 236

raywan-110/AdaQP

Adaptive Message Quantization and Parallelization for Distributed Full-graph GNN Training

Language: Python - Size: 97.7 KB - Last synced at: 2 days ago - Pushed at: over 1 year ago - Stars: 24 - Forks: 3

vbdi/casp

[CVPR 2025] CASP: Compression of Large Multimodal Models Based on Attention Sparsity

Language: Python - Size: 764 KB - Last synced at: 26 days ago - Pushed at: 26 days ago - Stars: 4 - Forks: 1

neuralmagic/sparsezoo

Neural network model repository for highly sparse and sparse-quantized models with matching sparsification recipes

Language: Python - Size: 1.33 MB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 386 - Forks: 28

lucadellalib/audiocodecs

A collection of audio codecs with a standardized API

Language: Python - Size: 851 KB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 20 - Forks: 3

aaron-xichen/pytorch-playground

Base pretrained models and datasets in pytorch (MNIST, SVHN, CIFAR10, CIFAR100, STL10, AlexNet, VGG16, VGG19, ResNet, Inception, SqueezeNet)

Language: Python - Size: 45.9 KB - Last synced at: 25 days ago - Pushed at: over 2 years ago - Stars: 2,669 - Forks: 618

d1pankarmedhi/nn-linear-quantization

linear quantization with W8A16 for neural networks with PyTorch

Language: Python - Size: 28.3 KB - Last synced at: 28 days ago - Pushed at: 28 days ago - Stars: 0 - Forks: 0
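
For readers unfamiliar with the W8A16 label used in the entry above: weights are stored as 8-bit integers while activations stay in 16-bit floating point. The following is a minimal sketch of that idea, not the repository's code. It assumes symmetric per-output-row scaling, and the function names `quantize_row_int8` and `w8a16_linear` are hypothetical. Plain Python floats stand in for 16-bit activations.

```python
def quantize_row_int8(row):
    """Symmetric quantization of one weight row (assumed scheme).

    Returns the int8-range integers and the scale that maps them back.
    """
    max_abs = max(abs(w) for w in row)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-128, min(127, round(w / scale))) for w in row]
    return q, scale


def w8a16_linear(x, q_weights, scales, bias=None):
    """Linear layer with int8 weights and floating-point activations.

    Computes y[j] = scales[j] * sum_i(x[i] * q_weights[j][i]) + bias[j].
    """
    out = []
    for j, (q_row, s) in enumerate(zip(q_weights, scales)):
        acc = sum(xi * qi for xi, qi in zip(x, q_row))
        y = acc * s  # dequantize once per output, after accumulation
        if bias is not None:
            y += bias[j]
        out.append(y)
    return out


# Illustrative data only.
weights = [[0.5, -1.0, 0.25], [2.0, 0.1, -0.3]]
packed = [quantize_row_int8(r) for r in weights]
q_weights = [q for q, _ in packed]
scales = [s for _, s in packed]

x = [1.0, 2.0, 3.0]
y_quant = w8a16_linear(x, q_weights, scales)
y_ref = [sum(xi * wi for xi, wi in zip(x, row)) for row in weights]
```

Comparing `y_quant` with `y_ref` shows the rounding error introduced by the 8-bit weights; per-row scales keep that error proportional to each row's largest weight.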

Victorletzelter/annealed_mcl

Annealed Multiple Choice Learning: Overcoming limitations of Winner-takes-all with annealing (NeurIPS 2024)

Language: Python - Size: 39.3 MB - Last synced at: 28 days ago - Pushed at: 28 days ago - Stars: 6 - Forks: 0

ModelTC/QLLM

[ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models"

Language: Python - Size: 1.68 MB - Last synced at: 10 days ago - Pushed at: over 1 year ago - Stars: 38 - Forks: 4

grinn-global/bionic-robot-hand-demo

A mixed‐precision YOLO-Pose hand gesture recognition system based on the Synaptics Astra SL1680 NPU

Language: Python - Size: 804 KB - Last synced at: 3 days ago - Pushed at: 28 days ago - Stars: 0 - Forks: 0

koszeggy/KGySoft.Drawing.Tools

Debugger visualizers and image editor apps built on KGy SOFT Drawing Libraries

Language: C# - Size: 2.72 MB - Last synced at: 28 days ago - Pushed at: 28 days ago - Stars: 22 - Forks: 4

hailo-ai/hailo_model_zoo

The Hailo Model Zoo includes pre-trained models and a full building and evaluation environment

Language: Python - Size: 5.83 MB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 451 - Forks: 58

cedrickchee/awesome-ml-model-compression

Awesome machine learning model compression research papers, quantization, tools, and learning material.

Size: 213 KB - Last synced at: 29 days ago - Pushed at: 9 months ago - Stars: 523 - Forks: 60

AI Engineering is a comprehensive bootcamp designed for programmers to master AI through practical projects and foundational theory. Each week, participants engage in hands-on learning, covering essential topics like Python, data manipulation, and machine learning math. 🐙💻

Language: Python - Size: 24.4 KB - Last synced at: 30 days ago - Pushed at: 30 days ago - Stars: 0 - Forks: 0

ikergarcia1996/Easy-Translate

Easy-Translate is a script for translating large text files with a SINGLE COMMAND. Easy-Translate is designed to be as easy as possible for beginners and as seamless and customizable as possible for advanced users.

Language: Python - Size: 656 KB - Last synced at: 23 days ago - Pushed at: 7 months ago - Stars: 217 - Forks: 338

saifhaq/alma

Language: Python - Size: 21.7 MB - Last synced at: 28 days ago - Pushed at: 28 days ago - Stars: 20 - Forks: 1

joisino/speedbook

Support site for the book "Speeding Up Deep Neural Networks" (『深層ニューラルネットワークの高速化』).

Language: Jupyter Notebook - Size: 480 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 57 - Forks: 2

sinanuozdemir/quick-start-guide-to-llms

The Official Repo for "Quick Start Guide to Large Language Models"

Language: Jupyter Notebook - Size: 91.8 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 282 - Forks: 165

guan-yuan/Awesome-AutoML-and-Lightweight-Models

A list of high-quality (newest) AutoML works and lightweight models including 1.) Neural Architecture Search, 2.) Lightweight Structures, 3.) Model Compression, Quantization and Acceleration, 4.) Hyperparameter Optimization, 5.) Automated Feature Engineering.

Size: 150 KB - Last synced at: 2 days ago - Pushed at: about 4 years ago - Stars: 854 - Forks: 160

IntelLabs/nlp-architect 📦

A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural networks

Language: Python - Size: 531 MB - Last synced at: 29 days ago - Pushed at: over 2 years ago - Stars: 2,940 - Forks: 448

neuralmagic/deepsparse

Sparsity-aware deep learning inference runtime for CPUs

Language: Python - Size: 137 MB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 3,147 - Forks: 186

slitiWassim/Drone-Guard

A Self-Supervised Deep Learning Framework for Spatiotemporal Anomaly Detection in UAV Surveillance Videos

Language: JavaScript - Size: 56.6 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

SforAiDl/KD_Lib

A PyTorch Knowledge Distillation library for benchmarking and extending works in the domains of Knowledge Distillation, Pruning, and Quantization.

Language: Python - Size: 22.2 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 630 - Forks: 60

multi-modal-ai/production-hub

Hands-on hub for learning how to optimize AI models and serve them in production in the most efficient way.

Language: Jupyter Notebook - Size: 43.2 MB - Last synced at: 12 days ago - Pushed at: 9 months ago - Stars: 8 - Forks: 1

natasha/navec

Compact high quality word embeddings for Russian language

Language: Python - Size: 1.86 MB - Last synced at: 29 days ago - Pushed at: almost 2 years ago - Stars: 200 - Forks: 18

VThuong99/LeNet5qt.c

Language: C - Size: 3.7 MB - Last synced at: 10 days ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

mit-han-lab/tinyengine

[NeurIPS 2020] MCUNet: Tiny Deep Learning on IoT Devices; [NeurIPS 2021] MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning; [NeurIPS 2022] MCUNetV3: On-Device Training Under 256KB Memory

Language: C - Size: 235 MB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 869 - Forks: 141

bitsandbytes-foundation/bitsandbytes-intel 📦

An extension to enable performance acceleration for bitsandbytes on Intel platforms.

Language: Python - Size: 35.2 KB - Last synced at: 1 day ago - Pushed at: 2 months ago - Stars: 3 - Forks: 1

huawei-noah/Efficient-Computing

Efficient computing methods developed by Huawei Noah's Ark Lab

Language: Jupyter Notebook - Size: 100 MB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 1,273 - Forks: 218

OpenNMT/CTranslate2

Fast inference engine for Transformer models

Language: C++ - Size: 14.5 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 3,810 - Forks: 358

winstxnhdw/ct2hf

A friendly CLI tool for converting and uploading transformers for CTranslate2.

Language: Python - Size: 13.6 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

kornelski/pngquant

Lossy PNG compressor — pngquant command based on libimagequant library

Language: C - Size: 1.71 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 5,381 - Forks: 492

arasgungore/PCM-and-DM-modulators

A Python/MATLAB project which implements pulse-code modulation (PCM) and delta modulation (DM).

Language: Jupyter Notebook - Size: 742 KB - Last synced at: 12 days ago - Pushed at: almost 3 years ago - Stars: 13 - Forks: 0

zlatko-minev/pyEPR

Powerful, automated analysis and design of quantum microwave chips & devices [Energy-Participation Ratio and more]

Language: Python - Size: 2.78 MB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 179 - Forks: 253

UFund-Me/Qbot

[🔥updating ...] AI automated quantitative trading bot (fully local deployment). AI-powered Quantitative Investment Research Platform. 📃 online docs: https://ufund-me.github.io/Qbot ✨ qbot-mini: https://github.com/Charmve/iQuant

Language: Jupyter Notebook - Size: 387 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 11,442 - Forks: 1,647

DeepVAC/deepvac

PyTorch Project Specification.

Language: Python - Size: 791 KB - Last synced at: about 1 month ago - Pushed at: almost 4 years ago - Stars: 680 - Forks: 104

GreenBull31/tinyllama-coreml-ios18-quantization

Quantize TinyLlama-1.1B-Chat from PyTorch to CoreML (float16, int8, int4) for efficient on-device inference on iOS 18+.

Language: Python - Size: 6.84 KB - Last synced at: 16 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

AvatariaProducciones/KVSplit

Run larger LLMs with longer contexts on Apple Silicon by using differentiated precision for KV cache quantization. KVSplit enables 8-bit keys & 4-bit values, reducing memory by 59% with <1% quality loss. Includes benchmarking, visualization, and one-command setup. Optimized for M1/M2/M3 Macs with Metal support.

Language: Python - Size: 722 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0
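
The 59% figure in the description above can be sanity-checked with simple arithmetic. The sketch below is a back-of-the-envelope estimate, not taken from the repository. It assumes an FP16 baseline for both keys and values, and a block-quantized layout in which every 32 values share one 16-bit scale. The block size, scale width, and helper names are assumptions for illustration.

```python
FP16_BITS = 16.0


def bits_per_value(payload_bits, block_size=32, scale_bits=16):
    """Effective bits per stored value when each block carries one scale.

    block_size and scale_bits are assumed, not confirmed by the listing.
    """
    return payload_bits + scale_bits / block_size


def kv_reduction(key_bits, value_bits):
    """Fraction of KV-cache memory saved relative to FP16 keys and values."""
    baseline = 2 * FP16_BITS
    return 1.0 - (key_bits + value_bits) / baseline


# Ignoring scale overhead: 8-bit keys + 4-bit values vs. 16 + 16 bits.
ideal = kv_reduction(8, 4)

# Including one 16-bit scale per 32 values: 8.5 + 4.5 bits.
with_scales = kv_reduction(bits_per_value(8), bits_per_value(4))

print(f"ideal: {ideal:.1%}, with per-block scales: {with_scales:.1%}")
```

Under these assumptions the saving is 62.5% without scale overhead and about 59.4% with it, which is consistent with the figure quoted in the description.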

microsoft/LQ-Nets 📦

LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks

Language: Python - Size: 28.3 KB - Last synced at: 3 days ago - Pushed at: almost 3 years ago - Stars: 242 - Forks: 69

dvmazur/mixtral-offloading

Run Mixtral-8x7B models in Colab or consumer desktops

Language: Python - Size: 261 KB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 2,311 - Forks: 232

Beomi/BitNet-Transformers

0️⃣1️⃣🤗 BitNet-Transformers: Huggingface Transformers Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in pytorch with Llama(2) Architecture

Language: Python - Size: 588 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 302 - Forks: 32

Xiuyu-Li/q-diffusion

[ICCV 2023] Q-Diffusion: Quantizing Diffusion Models.

Language: Python - Size: 5.97 MB - Last synced at: 28 days ago - Pushed at: over 1 year ago - Stars: 347 - Forks: 24

neuralmagic/sparsify

ML model optimization product to accelerate inference.

Language: Python - Size: 6.99 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 324 - Forks: 30

submission2019/cnn-quantization

Quantization of Convolutional Neural networks.

Language: Python - Size: 2.71 MB - Last synced at: 25 days ago - Pushed at: 11 months ago - Stars: 243 - Forks: 60

Artessay/ArtQuantization

ArtQuantization is developed for quantizing Large Language Models, focusing on optimizing the memory usage and performance. This repository provides experimental results of quantizing models such as Qwen2.5 using different algorithms like AWQ and GPTQ, and demonstrates the memory requirements under various graphics card configurations.

Language: Python - Size: 26.4 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

gregabbott/swotch

Make limited palette PNGs and SVG swatches from images. ~14KB

Language: HTML - Size: 580 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

dipampaul17/KVSplit

Run larger LLMs with longer contexts on Apple Silicon by using differentiated precision for KV cache quantization. KVSplit enables 8-bit keys & 4-bit values, reducing memory by 59% with <1% quality loss. Includes benchmarking, visualization, and one-command setup. Optimized for M1/M2/M3 Macs with Metal support.

Language: Python - Size: 717 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 8 - Forks: 0

1duo/awesome-ai-infrastructures

Infrastructures™ for Machine Learning Training/Inference in Production.

Size: 11.8 MB - Last synced at: about 1 month ago - Pushed at: about 6 years ago - Stars: 416 - Forks: 74

PedroFellipeAntunes/dithering-java

Java program to apply dithering (reduce color count) to an image.

Language: Java - Size: 2.74 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

ImageOptim/libimagequant

Palette quantization library that powers pngquant and other PNG optimizers

Language: Rust - Size: 1.34 MB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 828 - Forks: 133

Nirusanan/Tabular_Data_Analysis-LLM

This project utilizes fine-tuned LLMs to generate Pandas code for performing financial data analytics tasks.

Language: Jupyter Notebook - Size: 276 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

kurianbenoy/Indic-Subtitler

Open source subtitling platform 💻 for transcribing and translating videos/audios in Indic languages.

Language: Jupyter Notebook - Size: 36.4 MB - Last synced at: 28 days ago - Pushed at: 2 months ago - Stars: 89 - Forks: 13

lucidrains/discrete-key-value-bottleneck-pytorch

Implementation of Discrete Key / Value Bottleneck, in Pytorch

Language: Python - Size: 196 KB - Last synced at: 8 days ago - Pushed at: almost 2 years ago - Stars: 88 - Forks: 3

Smallsan/OctQuant

Octree color quantization algorithm.

Language: Go - Size: 3.06 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

MatinHosseinianFard/Fault-Tolerant-Systems-Design-Project

A replication of "Enhancing Battery Thermal Management With Virtual Temperature Sensor Using Hybrid CNN-LSTM"

Language: Jupyter Notebook - Size: 56.7 MB - Last synced at: 9 days ago - Pushed at: 4 months ago - Stars: 4 - Forks: 0

singhdivyank/Data-Science-CheatSheets

Curated list of resources for Data Scientists, AI developers, and interview preparation

Language: Jupyter Notebook - Size: 322 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 6 - Forks: 0

ericmckevitt/MobileNetV2-Quantization-Benchmarking

Benchmark GPU inference performance of MobileNetV2: full-precision vs quantized (INT8) models using TensorRT

Language: Python - Size: 12.4 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

NoakLiu/LLMEasyQuant

An Easy-to-Use Toolkit for LLM Quantization that can be executed on a MacBook [Efficient ML Model]

Language: Python - Size: 2.76 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 18 - Forks: 0

kemingy/rabitq

RaBitQ Rust implementation

Language: Rust - Size: 300 KB - Last synced at: 5 days ago - Pushed at: 3 months ago - Stars: 10 - Forks: 0

sinanuozdemir/oreilly-hands-on-gpt-llm

Mastering the Art of Scalable and Efficient AI Model Deployment

Language: Jupyter Notebook - Size: 33.2 MB - Last synced at: 30 days ago - Pushed at: 4 months ago - Stars: 136 - Forks: 91

A-suozhang/awesome-quantization-and-fixed-point-training

Neural Network Quantization & Low-Bit Fixed Point Training For Hardware-Friendly Algorithm Design

Size: 81.1 KB - Last synced at: 1 day ago - Pushed at: over 4 years ago - Stars: 161 - Forks: 24

tpoisonooo/llama.onnx

LLaMA/RWKV ONNX models, quantization, and test cases

Language: Python - Size: 1.3 MB - Last synced at: 10 days ago - Pushed at: almost 2 years ago - Stars: 363 - Forks: 31

kssteven418/I-BERT

[ICML'21 Oral] I-BERT: Integer-only BERT Quantization

Language: Python - Size: 6.38 MB - Last synced at: 28 days ago - Pushed at: over 2 years ago - Stars: 246 - Forks: 36

mit-han-lab/haq

[CVPR 2019, Oral] HAQ: Hardware-Aware Automated Quantization with Mixed Precision

Language: Python - Size: 64.5 KB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 384 - Forks: 85

matlab-deep-learning/Quantized-Deep-Neural-Network-on-Jetson-AGX-Xavier

How to create, train, and quantize a network, then integrate it into pre/post image processing and generate CUDA C++ code targeting Jetson AGX Xavier

Language: MATLAB - Size: 10.4 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 11 - Forks: 2

megvii-research/Sparsebit

A model compression and acceleration toolbox based on PyTorch.

Language: Python - Size: 7.45 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 332 - Forks: 40

xvyaward/owq

Code for the AAAI 2024 Oral paper "OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Models".

Language: Python - Size: 3.03 MB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 61 - Forks: 7

hkproj/quantization-notes

Notes on quantization in neural networks

Language: Jupyter Notebook - Size: 940 KB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 81 - Forks: 16

HosseinAtrsaei/Capacity-Bounds-for-Communication-Systems-with-Quantization-and-Spectral-Constraints

This repo analyzes capacity bounds of communication systems with quantization and spectral constraints. It includes theoretical derivations and numerical evaluations of mutual information under coarse quantization and bandwidth limitations, with applications in modern wireless systems.

Language: Jupyter Notebook - Size: 144 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

hahnyuan/PB-LLM

PB-LLM: Partially Binarized Large Language Models

Language: Python - Size: 20.7 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 152 - Forks: 10

Aisuko/notebooks

Implementations of different ML tasks on the Kaggle platform with GPUs.

Language: Jupyter Notebook - Size: 160 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 20 - Forks: 3

Aaronhuang-778/BiLLM

[ICML 2024] BiLLM: Pushing the Limit of Post-Training Quantization for LLMs

Language: Python - Size: 1.73 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 216 - Forks: 14

onnx/neural-compressor

Model compression for ONNX

Language: Python - Size: 2.35 MB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 92 - Forks: 9

autohdw/QuBLAS

Quantized BLAS

Language: C++ - Size: 377 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 5 - Forks: 2

IntelLabs/distiller 📦

Neural Network Distiller by Intel AI Lab: a Python package for neural network compression research. https://intellabs.github.io/distiller

Language: Jupyter Notebook - Size: 40.5 MB - Last synced at: about 2 months ago - Pushed at: about 2 years ago - Stars: 4,390 - Forks: 805

koulanurag/mmn

Moore Machine Networks (MMN): Learning Finite-State Representations of Recurrent Policy Networks

Language: Python - Size: 115 MB - Last synced at: 7 days ago - Pushed at: over 2 years ago - Stars: 50 - Forks: 13

laelhalawani/glai

glai - GGUF LLAMA AI - Package for simplified model handling and text generation with Llama models quantized to GGUF format. Provides APIs for downloading and loading models automatically, and includes a database of models at various scales and quantizations. With this high-level API, you need one line to load the model and one to generate text completions.

Language: Python - Size: 208 KB - Last synced at: 10 days ago - Pushed at: over 1 year ago - Stars: 6 - Forks: 0