An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: quantization

intel/auto-round

Advanced Quantization Algorithm for LLMs and VLMs, with support for CPU, Intel GPU, CUDA and HPU. Seamlessly integrated with Torchao, Transformers, and vLLM.

Language: Python - Size: 10.8 MB - Last synced at: about 1 hour ago - Pushed at: about 2 hours ago - Stars: 518 - Forks: 42

ambv231/tinyllama-coreml-ios18-quantization

Quantize TinyLlama-1.1B-Chat from PyTorch to CoreML (float16, int8, int4) for efficient on-device inference on iOS 18+.

Language: Python - Size: 7.81 KB - Last synced at: about 2 hours ago - Pushed at: about 3 hours ago - Stars: 0 - Forks: 0

openvinotoolkit/nncf

Neural Network Compression Framework for enhanced OpenVINO™ inference

Language: Python - Size: 63.4 MB - Last synced at: about 2 hours ago - Pushed at: about 3 hours ago - Stars: 1,044 - Forks: 259

Md-Emon-Hasan/Fine-Tuning

End-to-end fine-tuning of Hugging Face models using LoRA, QLoRA, quantization, and PEFT techniques, optimized for low-memory environments and efficient model deployment.

Language: Jupyter Notebook - Size: 5.53 MB - Last synced at: about 3 hours ago - Pushed at: about 4 hours ago - Stars: 1 - Forks: 0

adithya-s-k/AI-Engineering.academy

Mastering Applied AI, One Concept at a Time

Language: Jupyter Notebook - Size: 96.2 MB - Last synced at: about 5 hours ago - Pushed at: about 7 hours ago - Stars: 1,007 - Forks: 112

MAGICS-LAB/GERM

[ICML 2025] Fast and Low-Cost Genomic Foundation Models via Outlier Removal.

Language: Python - Size: 21 MB - Last synced at: about 9 hours ago - Pushed at: about 10 hours ago - Stars: 12 - Forks: 2

big-nacho/patolette

off the charts color quantization 🎨

Language: C - Size: 1.53 MB - Last synced at: about 11 hours ago - Pushed at: about 12 hours ago - Stars: 142 - Forks: 1
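Color quantization of the kind this project performs reduces an image to a limited palette. As a rough illustration (not patolette's actual algorithm, which is far more sophisticated), the final remapping step can be sketched as nearest-palette-color assignment by squared Euclidean distance in RGB:

```python
# Minimal sketch of palette-based color quantization: map each pixel
# to the nearest color in a fixed palette (squared Euclidean distance
# in RGB space). Illustrative only; real quantizers also choose the
# palette itself and often dither.

def nearest_palette_color(pixel, palette):
    """Return the palette color closest to `pixel`, an (r, g, b) tuple."""
    return min(
        palette,
        key=lambda c: sum((p - q) ** 2 for p, q in zip(pixel, c)),
    )

def quantize_image(pixels, palette):
    """Quantize a flat list of (r, g, b) pixels against `palette`."""
    return [nearest_palette_color(p, palette) for p in pixels]

palette = [(0, 0, 0), (255, 255, 255), (255, 0, 0)]
print(quantize_image([(10, 10, 10), (200, 30, 20)], palette))
# -> [(0, 0, 0), (255, 0, 0)]
```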

quic/aimet

AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.

Language: Python - Size: 21.8 MB - Last synced at: about 15 hours ago - Pushed at: about 16 hours ago - Stars: 2,339 - Forks: 404

pytorch/ao

PyTorch native quantization and sparsity for training and inference

Language: Python - Size: 34.4 MB - Last synced at: about 17 hours ago - Pushed at: about 18 hours ago - Stars: 2,114 - Forks: 284

bitsandbytes-foundation/bitsandbytes

Accessible large language models via k-bit quantization for PyTorch.

Language: Python - Size: 3.02 MB - Last synced at: about 18 hours ago - Pushed at: about 19 hours ago - Stars: 7,142 - Forks: 706
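The core idea behind k-bit weight quantization of the kind bitsandbytes provides can be illustrated with absmax (symmetric) quantization: scale values by their absolute maximum so they fit the signed integer range, round, and store the scale for dequantization. A minimal pure-Python sketch (not the library's API, which operates on tensors with block-wise scales):

```python
# Absmax (symmetric) k-bit quantization sketch: scale by the absolute
# maximum so values fit [-(2^(k-1)-1), 2^(k-1)-1], round to integers,
# then recover approximate floats by multiplying back by the scale.

def quantize_absmax(values, bits=8):
    qmax = 2 ** (bits - 1) - 1               # e.g. 127 for int8
    scale = max(abs(v) for v in values) / qmax
    q = [round(v / scale) for v in values]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

q, scale = quantize_absmax([0.5, -1.0, 0.25])
print(q)  # -> [64, -127, 32]
```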

d0tTino/DeepThought-ReThought

A refactored version of the DeepThought Discord bot, focusing on improved architecture, performance, and AI agent capabilities.

Language: Python - Size: 18.6 MB - Last synced at: about 22 hours ago - Pushed at: about 22 hours ago - Stars: 1 - Forks: 0

bonginn/llm-acceleration

The final project for EdgeAI course at NYCU, focusing on accelerating Llama-3.2-3B-Instruct inference on a single NVIDIA T4 GPU.

Language: Python - Size: 20.5 KB - Last synced at: about 24 hours ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

Grulmex/UFund-Me-Qbot

AI-powered Quantitative Investment Research Platform.

Language: HTML - Size: 16.9 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

afloresep/SPQR

SPQR (Streaming Product Quantization for moleculaR data): streaming product quantization (PQ) for large-scale clustering of molecular data (or other high-dimensional data), forming approximate clusters in a streaming fashion without requiring all data to be in memory at once.

Language: Python - Size: 37.4 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 1 - Forks: 0
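Product quantization, the technique this project streams, splits each vector into sub-vectors and encodes each against its own small codebook, storing only codebook indices. A minimal sketch with fixed, hand-written codebooks (in practice codebooks are learned, e.g. by k-means, which is what the streaming variant addresses):

```python
# Product quantization (PQ) sketch: encode a vector as one codebook
# index per sub-space, and decode by concatenating the chosen
# codewords. Codebooks here are fixed for illustration.

def pq_encode(vector, codebooks):
    """codebooks[i] is a list of candidate sub-vectors for sub-space i."""
    d = len(vector) // len(codebooks)
    codes = []
    for i, book in enumerate(codebooks):
        sub = vector[i * d:(i + 1) * d]
        codes.append(min(
            range(len(book)),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(sub, book[j])),
        ))
    return codes

def pq_decode(codes, codebooks):
    out = []
    for i, code in enumerate(codes):
        out.extend(codebooks[i][code])
    return out

books = [
    [(0.0, 0.0), (1.0, 1.0)],   # codebook for the first 2 dims
    [(0.0, 1.0), (1.0, 0.0)],   # codebook for the last 2 dims
]
print(pq_encode([0.9, 1.1, 0.1, 0.8], books))  # -> [1, 0]
```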

datawhalechina/awesome-compression

A beginner-friendly tutorial on model compression; PDF download: https://github.com/datawhalechina/awesome-compression/releases

Size: 311 MB - Last synced at: about 4 hours ago - Pushed at: 5 days ago - Stars: 295 - Forks: 36

ModelTC/TFMQ-DM

[CVPR 2024 Highlight & TPAMI 2025] This is the official PyTorch implementation of "TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models".

Language: Jupyter Notebook - Size: 118 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 65 - Forks: 4

phun-ky/wrapture

Wrapture lets you go from a Python-trained model to deployable JavaScript with a single command. It generates TypeScript bindings and a Web/Node-compatible wrapper, using WebGPU/WASM-ready ONNX runtimes.

Language: TypeScript - Size: 1.18 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1 - Forks: 0

thu-ml/SpargeAttn

SpargeAttention: a training-free sparse attention that can accelerate inference for any model.

Language: Cuda - Size: 55.4 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 603 - Forks: 44

Xilinx/finn

Dataflow compiler for QNN inference on FPGAs

Language: Python - Size: 85.6 MB - Last synced at: about 20 hours ago - Pushed at: about 21 hours ago - Stars: 833 - Forks: 262

agoSantiago97/gemma-2-2b-it.cs

This project implements int8 CPU inference for gemma-2-2b-it in pure C#. It ports a Rust repository using Gemini 2.5 Pro Preview, and can be built and run with the provided batch files. 🐙💻

Language: C# - Size: 15.6 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

mit-han-lab/nunchaku

[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

Language: Python - Size: 86.1 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 2,022 - Forks: 105

intel/intel-extension-for-pytorch

A Python package that extends official PyTorch to unlock additional performance on Intel platforms.

Language: Python - Size: 114 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1,875 - Forks: 282

PedroFellipeAntunes/color-palette-java

Java program to apply a color palette to an image.

Language: Java - Size: 5.47 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

AutoGPTQ/AutoGPTQ 📦

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

Language: Python - Size: 8.01 MB - Last synced at: 4 days ago - Pushed at: 2 months ago - Stars: 4,869 - Forks: 521

JohnClaw/gemma-2-2b-it.cs

gemma-2-2b-it int8 cpu inference in one file of pure C#

Language: C# - Size: 16.6 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 0

MustaphaU/Simplify-Documentation-Review-on-Atlassian-Confluence-with-LLAMA2-and-NVIDIA-TensorRT-LLM

A simple project demonstrating LLM-assisted review of documentation on Atlassian Confluence.

Language: Python - Size: 927 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

robertocenteno/wrapture

Wrapture lets you go from a Python-trained model to deployable JavaScript with a single command. It generates TypeScript bindings and a Web/Node-compatible wrapper, using WebGPU/WASM-ready ONNX runtimes.

Language: TypeScript - Size: 715 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

ModelCloud/GPTQModel

Production-ready LLM compression/quantization toolkit with hardware-accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.

Language: Python - Size: 12.1 MB - Last synced at: 4 days ago - Pushed at: 21 days ago - Stars: 613 - Forks: 90

PINTO0309/onnx2tf

Self-Created Tools to convert ONNX files (NCHW) to TensorFlow/TFLite/Keras format (NHWC). The purpose of this tool is to solve the massive Transpose extrapolation problem in onnx-tensorflow (onnx-tf). I don't need a Star, but give me a pull request.

Language: Python - Size: 3.98 MB - Last synced at: 4 days ago - Pushed at: 24 days ago - Stars: 810 - Forks: 77

Efficient-ML/Awesome-Model-Quantization

A list of papers, docs, and code about model quantization. This repo aims to provide information for model quantization research and is continuously improving. PRs adding works (papers, repositories) the repo has missed are welcome.

Size: 61.5 MB - Last synced at: 5 days ago - Pushed at: 4 months ago - Stars: 2,131 - Forks: 224

foundation-model-stack/fms-model-optimizer

FMS Model Optimizer is a framework for developing reduced precision neural network models.

Language: Python - Size: 14.5 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 20 - Forks: 11

fastmachinelearning/qonnx

QONNX: Arbitrary-Precision Quantized Neural Networks in ONNX

Language: Python - Size: 5.38 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 151 - Forks: 46

sreekanth-madisetty/Awesome-LLM-Interview-Questions

Curated LLM interview questions and answers for data science and AI jobs

Size: 6.39 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 2 - Forks: 0

huggingface/optimum-intel

🤗 Optimum Intel: Accelerate inference with Intel optimization tools

Language: Jupyter Notebook - Size: 17.5 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 469 - Forks: 132

huggingface/optimum

🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy-to-use hardware optimization tools

Language: Python - Size: 5.58 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 2,938 - Forks: 546

zcemycl/TF2DeepFloorplan

TF2 Deep FloorPlan Recognition using a Multi-task Network with Room-boundary-Guided Attention. Supports TensorBoard, quantization, Flask, TFLite, Docker, GitHub Actions, and Google Colab.

Language: Python - Size: 7.93 MB - Last synced at: 4 days ago - Pushed at: almost 2 years ago - Stars: 230 - Forks: 75

OpenPPL/ppq

PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.

Language: Python - Size: 5.57 MB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 1,695 - Forks: 256

snu-mllab/GuidedQuant

Official PyTorch implementation of "GuidedQuant: Large Language Model Quantization via Exploiting End Loss Guidance" (ICML 2025)

Language: Python - Size: 3.35 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 28 - Forks: 0

hiyouga/LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Language: Python - Size: 48.4 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 52,158 - Forks: 6,299

RWKV/rwkv.cpp

INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model

Language: C++ - Size: 42.1 MB - Last synced at: about 13 hours ago - Pushed at: 3 months ago - Stars: 1,529 - Forks: 110

upunaprosk/quantization-effects

A curated list of papers, docs, and code on the undesired effects of model quantization, including impacts on fairness, robustness, calibration, and toxicity.

Size: 5.86 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

TarunNagarajan/TinyQuant

A focused implementation of hardware-accelerated, quantized neural network inference for embedded control systems.

Size: 142 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

mit-han-lab/ComfyUI-nunchaku

ComfyUI plugin of Nunchaku

Language: Python - Size: 2.74 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 1,236 - Forks: 35

SqueezeAILab/KVQuant

[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

Language: Python - Size: 19.8 MB - Last synced at: 7 days ago - Pushed at: 10 months ago - Stars: 358 - Forks: 31

datawhalechina/llm-deploy

Theory and practice of large language model (LLM) inference and deployment

Size: 100 MB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 273 - Forks: 41

VectorDB-NTU/RaBitQ-Library

A lightweight library for the RaBitQ algorithm and its applications in vector search.

Language: C++ - Size: 1.96 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 36 - Forks: 8

intel/neural-compressor

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

Language: Python - Size: 469 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 2,426 - Forks: 274

tensorflow/model-optimization

A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization and pruning.

Language: Python - Size: 2.22 MB - Last synced at: 7 days ago - Pushed at: 4 months ago - Stars: 1,536 - Forks: 328

vllm-project/llm-compressor

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

Language: Python - Size: 28.6 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 1,473 - Forks: 145

google/qkeras

QKeras: a quantization deep learning library for TensorFlow Keras

Language: Python - Size: 1.56 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 567 - Forks: 109

capel-daangn/two-armies-chat-once

💼 Work Project - 🤖🪖 A Korean-English bilingual RAG chatbot for regulations of the US Army and ROK Army, leveraging a PEFT fine-tuned small LLM, integrated with 4-bit quantization, as the translator

Language: Python - Size: 32.2 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0

csarron/awesome-emdl

Embedded and mobile deep learning research resources

Size: 88.9 KB - Last synced at: 5 days ago - Pushed at: over 2 years ago - Stars: 753 - Forks: 169

Picovoice/picollm

On-device LLM Inference Powered by X-Bit Quantization

Language: Python - Size: 98 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 247 - Forks: 14

huggingface/optimum-quanto

A pytorch quantization backend for optimum

Language: Python - Size: 2.71 MB - Last synced at: 6 days ago - Pushed at: 28 days ago - Stars: 950 - Forks: 73

LambdaLabsML/openquant

Simple quantization, compatible with vllm/sglang.

Language: Python - Size: 103 KB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 1 - Forks: 0

sony/model_optimization

Model Compression Toolkit (MCT) is an open source project for neural network model optimization under efficient, constrained hardware. This project provides researchers, developers, and engineers advanced quantization and compression tools for deploying state-of-the-art neural networks.

Language: Python - Size: 22.7 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 399 - Forks: 68

stochasticai/xTuring

Build, customize, and control your own LLMs. From data pre-processing to fine-tuning, xTuring provides an easy way to personalize open-source LLMs. Join our discord community: https://discord.gg/TgHXuSJEk6

Language: Python - Size: 18.4 MB - Last synced at: 7 days ago - Pushed at: 9 months ago - Stars: 2,653 - Forks: 201

thu-ml/SageAttention

Quantized attention that achieves 2-5x and 3-11x speedups over FlashAttention and xformers, respectively, without losing end-to-end metrics across language, image, and video models.

Language: Cuda - Size: 46.1 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 1,690 - Forks: 120

ymcui/Chinese-LLaMA-Alpaca

Chinese LLaMA & Alpaca large language models with local CPU/GPU training and deployment (Chinese LLaMA & Alpaca LLMs)

Language: Python - Size: 23 MB - Last synced at: 9 days ago - Pushed at: about 1 year ago - Stars: 18,851 - Forks: 1,894

Xilinx/brevitas

Brevitas: neural network quantization in PyTorch

Language: Python - Size: 37.7 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 1,334 - Forks: 218

actypedef/MixedGemm

A mixed-precision GEMM with quantization and reordering kernels.

Language: Cuda - Size: 25.3 MB - Last synced at: 4 days ago - Pushed at: 11 days ago - Stars: 10 - Forks: 0

open-edge-platform/training_extensions

Train, Evaluate, Optimize, Deploy Computer Vision Models via OpenVINO™

Language: Python - Size: 416 MB - Last synced at: 10 days ago - Pushed at: 13 days ago - Stars: 1,192 - Forks: 451

666DZY666/micronet

micronet, a model compression and deployment library. Compression: (1) quantization: quantization-aware training (QAT) at high bit-widths (>2b: DoReFa, "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference") and low bit-widths (≤2b: ternary and binary, TWN/BNN/XNOR-Net), plus 8-bit post-training quantization (PTQ, TensorRT); (2) pruning: normal, regular, and group-convolution channel pruning; (3) group convolution structure; (4) batch-normalization fusion for quantization. Deployment: TensorRT fp32/fp16/int8 (PTQ calibration), op adaptation (upsample), dynamic shape.

Language: Python - Size: 6.68 MB - Last synced at: 10 days ago - Pushed at: about 1 month ago - Stars: 2,250 - Forks: 476

Inpyo-Hong/Model-Compression-Paper-List

Model Compression Paper List (Focusing on Quantization, Particularly Zero-Shot Quantization)

Size: 42 KB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 1 - Forks: 0

MAGICS-LAB/GenoArmory

GenoArmory: A Unified Evaluation Framework for Adversarial Attacks on Genomic Foundation Models

Language: Python - Size: 95.4 MB - Last synced at: 3 days ago - Pushed at: about 1 month ago - Stars: 36 - Forks: 1

slinusc/fast_llm_inference

Bench360 is a modular benchmarking suite for local LLM inference. It offers a full-stack, extensible pipeline to evaluate the latency, throughput, quality, and cost of LLM inference on consumer and enterprise GPUs. Bench360 supports flexible backends, tasks and scenarios, enabling fair and reproducible comparisons for researchers and practitioners.

Language: Jupyter Notebook - Size: 752 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 3 - Forks: 0

Node0/hypercortex

A TUI based LM Swiss army knife and analysis tool

Size: 16.6 KB - Last synced at: 3 days ago - Pushed at: 13 days ago - Stars: 0 - Forks: 0

OpenGVLab/OmniQuant

[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.

Language: Python - Size: 8.14 MB - Last synced at: 13 days ago - Pushed at: 28 days ago - Stars: 815 - Forks: 63

ModelTC/llmc

[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".

Language: Python - Size: 29.8 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 481 - Forks: 55

indie/qmec

A quantum puzzle and adventure into Native Language decolonization; features an introduction to the master quantum plane and the truthful history of indigenous peoples on Turtle Island. Not G-rated.

Language: C++ - Size: 44.4 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 2 - Forks: 1

mobiusml/hqq

Official implementation of Half-Quadratic Quantization (HQQ)

Language: Python - Size: 558 KB - Last synced at: 11 days ago - Pushed at: 17 days ago - Stars: 821 - Forks: 79

RahulSChand/gpu_poor

Calculate tokens/s and GPU memory requirements for any LLM. Supports llama.cpp/ggml/bnb/QLoRA quantization.

Language: JavaScript - Size: 1.56 MB - Last synced at: 7 days ago - Pushed at: 7 months ago - Stars: 1,310 - Forks: 75
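The dominant term in such an estimate is weight memory: parameter count times bytes per parameter, which quantization shrinks directly. A back-of-the-envelope sketch under that simplifying assumption (the actual calculator also models KV cache and activation overhead; `weight_memory_gb` is an illustrative function, not the tool's API):

```python
# Rough GPU weight-memory estimate: parameters * bits / 8 bytes,
# converted to GiB. Ignores KV cache, activations, and runtime
# overhead, which real calculators account for.

def weight_memory_gb(n_params_billion, bits_per_param):
    bytes_total = n_params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1024 ** 3

# A 7B model: fp16 weights vs 4-bit quantized weights.
print(round(weight_memory_gb(7, 16), 2))  # -> 13.04
print(round(weight_memory_gb(7, 4), 2))   # -> 3.26
```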

7abushahla/Sony-Spresense-TFLite-Guide

This GitHub repository provides a detailed guide for deploying TensorFlow Lite (TFLite) models on the Sony Spresense board.

Size: 530 KB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 1 - Forks: 0

7abushahla/Student-Engagement

Code and resources for the paper: Real-Time Student Engagement Monitoring on Edge Devices: Deep Learning Meets Efficiency and Privacy

Language: C - Size: 29.9 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 0 - Forks: 0

eki-project/finn-plus Fork of Xilinx/finn

FINN+ is an extended version of FINN, a dataflow compiler for QNN inference on FPGAs. It is maintained by a group of researchers at Paderborn University, Germany.

Language: Python - Size: 89.2 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 19 - Forks: 3

gaoj0017/RaBitQ

[SIGMOD 2024] RaBitQ: Quantizing High-Dimensional Vectors with a Theoretical Error Bound for Approximate Nearest Neighbor Search

Language: C++ - Size: 1.63 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 108 - Forks: 15

VectorDB-NTU/Extended-RaBitQ

[SIGMOD 2025] Practical and Asymptotically Optimal Quantization of High-Dimensional Vectors in Euclidean Space for Approximate Nearest Neighbor Search

Language: C++ - Size: 57.6 KB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 37 - Forks: 6

THU-MIG/torch-model-compression

An automated model-structure analysis and modification toolset for PyTorch models, including a model compression algorithm library built on automatic structure analysis.

Language: Python - Size: 132 KB - Last synced at: 3 days ago - Pushed at: about 2 years ago - Stars: 249 - Forks: 41

PaddlePaddle/PaddleSlim

PaddleSlim is an open-source library for deep model compression and architecture search.

Language: Python - Size: 16.3 MB - Last synced at: about 9 hours ago - Pushed at: 7 months ago - Stars: 1,593 - Forks: 350

vanhai1231/autoquant-infer

A tool that shrinks model size via quantization, combined with an AI agent that automatically selects the optimal level, speeding up inference and reducing its cost.

Language: Python - Size: 54.7 KB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 0 - Forks: 0

codewithdark-git/QuantLLM

QuantLLM is a Python library designed for developers, researchers, and teams who want to fine-tune and deploy large language models (LLMs) efficiently using 4-bit and 8-bit quantization techniques.

Language: Python - Size: 294 KB - Last synced at: 10 days ago - Pushed at: 16 days ago - Stars: 5 - Forks: 0

edcalderin/HuggingFace_RAGFlow

This project implements a classic Retrieval-Augmented Generation (RAG) system using HuggingFace models with quantization techniques. The system processes PDF documents, extracts their content, and enables interactive question-answering through a Streamlit web application.

Language: Python - Size: 114 KB - Last synced at: 12 days ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

FasterDecoding/BitDelta

Language: Jupyter Notebook - Size: 7.21 MB - Last synced at: 9 days ago - Pushed at: 7 months ago - Stars: 198 - Forks: 15

SqueezeAILab/SqueezeLLM

[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization

Language: Python - Size: 1.5 MB - Last synced at: 7 days ago - Pushed at: 10 months ago - Stars: 691 - Forks: 45

quic/aimet-pages

AIMET GitHub pages documentation

Language: HTML - Size: 44.6 MB - Last synced at: 4 days ago - Pushed at: 17 days ago - Stars: 8 - Forks: 4

SYSTRAN/faster-whisper

Faster Whisper transcription with CTranslate2

Language: Python - Size: 36.6 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 16,342 - Forks: 1,354

stdlib-js/ml-incr-kmeans

Incrementally partition data into `k` clusters.

Language: JavaScript - Size: 3.27 MB - Last synced at: 17 days ago - Pushed at: 18 days ago - Stars: 5 - Forks: 0
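Incremental k-means of the kind this package implements updates centroids one point at a time, so the data set never needs to be held in memory at once. A minimal Python sketch of the standard online update rule (a learning rate of 1/count per centroid; the package's JavaScript API differs):

```python
# Online k-means sketch: each new point moves only its nearest
# centroid toward it, with step size 1/count, so the centroid ends up
# at the running mean of the points assigned to it.

def make_kmeans(init_centroids):
    centroids = [list(c) for c in init_centroids]
    counts = [0] * len(centroids)

    def update(point):
        # Find the nearest centroid by squared Euclidean distance.
        i = min(range(len(centroids)),
                key=lambda j: sum((p - c) ** 2
                                  for p, c in zip(point, centroids[j])))
        counts[i] += 1
        eta = 1.0 / counts[i]
        centroids[i] = [c + eta * (p - c)
                        for c, p in zip(centroids[i], point)]
        return i

    return update, centroids

update, centroids = make_kmeans([[0.0], [10.0]])
for x in [[1.0], [9.0], [2.0], [11.0]]:
    update(x)
print(centroids)  # -> [[1.5], [10.0]]
```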

HosseinAtrsaei/MIMO-Networks-With-One-Bit-ADCs-Receiver-Design-and-Communication-Strategies

Simulation and implementation of hybrid blockwise and adaptive threshold receiver architectures for MIMO systems with one-bit ADCs, based on the paper: "MIMO Networks With One-Bit ADCs: Receiver Design and Communication Strategies", IEEE Trans. on Communications, vol. 70, no. 3, Mar. 2022.

Language: Jupyter Notebook - Size: 300 KB - Last synced at: 17 days ago - Pushed at: 18 days ago - Stars: 0 - Forks: 0

iriskaplan/LatticeQuant

Implementation of M-leveled Quantizer and Voronoi Code Quantizer.

Language: Python - Size: 1.85 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 1 - Forks: 0

0xMartin/BMPEditor

A simple BMP image viewer, converter, and editor, primarily focused on a from-scratch implementation of code for working with BMP images.

Language: C++ - Size: 23.1 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 9 - Forks: 1

guoriyue/PKU-Data-Mining-2022-TA

Language: Jupyter Notebook - Size: 3.9 MB - Last synced at: 14 days ago - Pushed at: over 2 years ago - Stars: 4 - Forks: 1

mit-han-lab/TinyChatEngine

TinyChatEngine: On-Device LLM Inference Library

Language: C++ - Size: 83.3 MB - Last synced at: 20 days ago - Pushed at: 12 months ago - Stars: 852 - Forks: 85

LowinLi/stable-diffusion-streamlit

Quantized Stable Diffusion cutting memory use by 75%; tested in Streamlit, deployed in a container.

Language: Python - Size: 53.6 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 53 - Forks: 7

inisis/brocolli

Everything in Torch Fx

Language: Python - Size: 5.9 MB - Last synced at: 19 days ago - Pushed at: about 1 year ago - Stars: 343 - Forks: 61

jy-yuan/KIVI

[ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache

Language: Python - Size: 16.7 MB - Last synced at: 7 days ago - Pushed at: 5 months ago - Stars: 303 - Forks: 31
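Asymmetric quantization, the general scheme behind 2-bit KV-cache methods like this one, maps the observed [min, max] range onto the unsigned integer range with a scale and zero-point, so a skewed value distribution does not waste half the range on an unused sign. A minimal sketch of that scheme (not KIVI's per-channel/per-token grouping):

```python
# Asymmetric low-bit quantization sketch: map [min, max] to
# [0, 2^bits - 1] with a scale and zero-point, in contrast to
# symmetric (absmax) quantization. Assumes values are not all equal.

def quantize_asymmetric(values, bits=2):
    qmax = 2 ** bits - 1                     # 3 for 2-bit
    lo, hi = min(values), max(values)
    scale = (hi - lo) / qmax
    zero_point = lo
    q = [round((v - zero_point) / scale) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return [v * scale + zero_point for v in q]

q, s, z = quantize_asymmetric([0.0, 0.9, 2.1, 3.0])
print(q)  # -> [0, 1, 2, 3]
```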

open-mmlab/mmrazor

OpenMMLab Model Compression Toolbox and Benchmark.

Language: Python - Size: 11.1 MB - Last synced at: 22 days ago - Pushed at: about 1 year ago - Stars: 1,599 - Forks: 236

huawei-noah/Pretrained-Language-Model

Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.

Language: Python - Size: 29 MB - Last synced at: 22 days ago - Pushed at: over 1 year ago - Stars: 3,099 - Forks: 637

raywan-110/AdaQP

Adaptive Message Quantization and Parallelization for Distributed Full-graph GNN Training

Language: Python - Size: 97.7 KB - Last synced at: 19 days ago - Pushed at: over 1 year ago - Stars: 24 - Forks: 3

vbdi/casp

[CVPR 2025] CASP: Compression of Large Multimodal Models Based on Attention Sparsity

Language: Python - Size: 764 KB - Last synced at: 23 days ago - Pushed at: 23 days ago - Stars: 4 - Forks: 1

neuralmagic/sparsezoo

Neural network model repository for highly sparse and sparse-quantized models with matching sparsification recipes

Language: Python - Size: 1.33 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 386 - Forks: 28