GitHub topics: quantization
intel/auto-round
Advanced Quantization Algorithm for LLMs and VLMs, with support for CPU, Intel GPU, CUDA and HPU. Seamlessly integrated with Torchao, Transformers, and vLLM.
Language: Python - Size: 10.8 MB - Last synced at: about 1 hour ago - Pushed at: about 2 hours ago - Stars: 518 - Forks: 42

ambv231/tinyllama-coreml-ios18-quantization
Quantize TinyLlama-1.1B-Chat from PyTorch to CoreML (float16, int8, int4) for efficient on-device inference on iOS 18+.
Language: Python - Size: 7.81 KB - Last synced at: about 2 hours ago - Pushed at: about 3 hours ago - Stars: 0 - Forks: 0

openvinotoolkit/nncf
Neural Network Compression Framework for enhanced OpenVINO™ inference
Language: Python - Size: 63.4 MB - Last synced at: about 2 hours ago - Pushed at: about 3 hours ago - Stars: 1,044 - Forks: 259

Md-Emon-Hasan/Fine-Tuning
End-to-end fine-tuning of Hugging Face models using LoRA, QLoRA, quantization, and PEFT techniques. Optimized for low-memory environments with efficient model deployment.
Language: Jupyter Notebook - Size: 5.53 MB - Last synced at: about 3 hours ago - Pushed at: about 4 hours ago - Stars: 1 - Forks: 0

adithya-s-k/AI-Engineering.academy
Mastering Applied AI, One Concept at a Time
Language: Jupyter Notebook - Size: 96.2 MB - Last synced at: about 5 hours ago - Pushed at: about 7 hours ago - Stars: 1,007 - Forks: 112

MAGICS-LAB/GERM
[ICML 2025] Fast and Low-Cost Genomic Foundation Models via Outlier Removal.
Language: Python - Size: 21 MB - Last synced at: about 9 hours ago - Pushed at: about 10 hours ago - Stars: 12 - Forks: 2

big-nacho/patolette
off the charts color quantization 🎨
Language: C - Size: 1.53 MB - Last synced at: about 11 hours ago - Pushed at: about 12 hours ago - Stars: 142 - Forks: 1
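Color quantization reduces an image to a small palette. As a minimal illustration of the mapping step only (a real quantizer like patolette also chooses the palette itself), each pixel can be replaced by its nearest palette entry under squared Euclidean distance in RGB; the function name here is illustrative, not patolette's API:

```python
# Sketch of the palette-mapping step in color quantization: replace each
# pixel with the nearest palette entry (squared Euclidean distance in RGB).

def nearest_palette_color(pixel, palette):
    """Return the palette entry closest to `pixel` in RGB space."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(palette, key=lambda c: dist2(pixel, c))

palette = [(0, 0, 0), (255, 255, 255), (255, 0, 0)]
print(nearest_palette_color((200, 30, 40), palette))  # (255, 0, 0)
```

Production quantizers use perceptual color spaces and spatial dithering on top of this basic assignment.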

quic/aimet
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
Language: Python - Size: 21.8 MB - Last synced at: about 15 hours ago - Pushed at: about 16 hours ago - Stars: 2,339 - Forks: 404

pytorch/ao
PyTorch native quantization and sparsity for training and inference
Language: Python - Size: 34.4 MB - Last synced at: about 17 hours ago - Pushed at: about 18 hours ago - Stars: 2,114 - Forks: 284

bitsandbytes-foundation/bitsandbytes
Accessible large language models via k-bit quantization for PyTorch.
Language: Python - Size: 3.02 MB - Last synced at: about 18 hours ago - Pushed at: about 19 hours ago - Stars: 7,142 - Forks: 706
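The core idea behind k-bit weight quantization of this kind can be sketched in a few lines: scale a tensor by its absolute maximum so the largest value maps to the top of the integer range, round, and store the scale for dequantization. This is a pure-Python sketch of symmetric absmax int8 quantization, not the bitsandbytes API:

```python
# Sketch of absmax (symmetric) int8 quantization: one scale per tensor,
# derived from the largest absolute value.

def absmax_quantize(values, bits=8):
    """Map floats to signed integers using a single absmax scale."""
    qmax = 2 ** (bits - 1) - 1              # 127 for int8
    scale = max(abs(v) for v in values) / qmax
    quantized = [round(v / scale) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate floats from the integers and the stored scale."""
    return [q * scale for q in quantized]

weights = [0.1, -1.27, 0.5, 0.02]
q, s = absmax_quantize(weights)
print(q)                     # [10, -127, 50, 2]
print(dequantize(q, s))      # round-trip error bounded by scale / 2
```

Outliers stretch the scale and waste precision on the rest of the tensor, which is why practical schemes quantize in blocks or handle outliers separately.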

d0tTino/DeepThought-ReThought
A refactored version of the DeepThought Discord bot, focusing on improved architecture, performance, and AI agent capabilities.
Language: Python - Size: 18.6 MB - Last synced at: about 22 hours ago - Pushed at: about 22 hours ago - Stars: 1 - Forks: 0

bonginn/llm-acceleration
The final project for EdgeAI course at NYCU, focusing on accelerating Llama-3.2-3B-Instruct inference on a single NVIDIA T4 GPU.
Language: Python - Size: 20.5 KB - Last synced at: about 24 hours ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

Grulmex/UFund-Me-Qbot
AI-powered Quantitative Investment Research Platform.
Language: HTML - Size: 16.9 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

afloresep/SPQR
SPQR (Streaming Product QuantIzation for moleculaR data): streaming Product Quantization (PQ) for large-scale clustering of molecular (or other high-dimensional) data, forming approximate clusters in a streaming fashion without requiring all data to be in memory at once.
Language: Python - Size: 37.4 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 1 - Forks: 0

datawhalechina/awesome-compression
A beginner-friendly tutorial on model compression; PDF download: https://github.com/datawhalechina/awesome-compression/releases
Size: 311 MB - Last synced at: about 4 hours ago - Pushed at: 5 days ago - Stars: 295 - Forks: 36

ModelTC/TFMQ-DM
[CVPR 2024 Highlight & TPAMI 2025] This is the official PyTorch implementation of "TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models".
Language: Jupyter Notebook - Size: 118 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 65 - Forks: 4

phun-ky/wrapture
Wrapture lets you go from a Python-trained model to deployable JavaScript with a single command. It generates TypeScript bindings and a Web/Node-compatible wrapper, using WebGPU/WASM-ready ONNX runtimes.
Language: TypeScript - Size: 1.18 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1 - Forks: 0

thu-ml/SpargeAttn
SpargeAttention: A training-free sparse attention that can accelerate any model inference.
Language: Cuda - Size: 55.4 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 603 - Forks: 44

Xilinx/finn
Dataflow compiler for QNN inference on FPGAs
Language: Python - Size: 85.6 MB - Last synced at: about 20 hours ago - Pushed at: about 21 hours ago - Stars: 833 - Forks: 262

agoSantiago97/gemma-2-2b-it.cs
This project implements gemma-2-2b-it int8 CPU inference in pure C#. It ports a Rust repository using Gemini 2.5 Pro Preview, and you can easily build and run it with the provided batch files. 🐙💻
Language: C# - Size: 15.6 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

mit-han-lab/nunchaku
[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
Language: Python - Size: 86.1 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 2,022 - Forks: 105

intel/intel-extension-for-pytorch
A Python package extending the official PyTorch to easily obtain performance gains on Intel platforms.
Language: Python - Size: 114 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1,875 - Forks: 282

PedroFellipeAntunes/color-palette-java
Java program to apply a color palette to an image.
Language: Java - Size: 5.47 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

AutoGPTQ/AutoGPTQ 📦
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
Language: Python - Size: 8.01 MB - Last synced at: 4 days ago - Pushed at: 2 months ago - Stars: 4,869 - Forks: 521

JohnClaw/gemma-2-2b-it.cs
gemma-2-2b-it int8 cpu inference in one file of pure C#
Language: C# - Size: 16.6 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 0

MustaphaU/Simplify-Documentation-Review-on-Atlassian-Confluence-with-LLAMA2-and-NVIDIA-TensorRT-LLM
A simple project demonstrating LLM-assisted review of documentation on Atlassian Confluence.
Language: Python - Size: 927 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

robertocenteno/wrapture
Wrapture lets you go from a Python-trained model to deployable JavaScript with a single command. It generates TypeScript bindings and a Web/Node-compatible wrapper, using WebGPU/WASM-ready ONNX runtimes.
Language: TypeScript - Size: 715 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

ModelCloud/GPTQModel
Production-ready LLM compression/quantization toolkit with hardware-accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.
Language: Python - Size: 12.1 MB - Last synced at: 4 days ago - Pushed at: 21 days ago - Stars: 613 - Forks: 90

PINTO0309/onnx2tf
Self-Created Tools to convert ONNX files (NCHW) to TensorFlow/TFLite/Keras format (NHWC). The purpose of this tool is to solve the massive Transpose extrapolation problem in onnx-tensorflow (onnx-tf). I don't need a Star, but give me a pull request.
Language: Python - Size: 3.98 MB - Last synced at: 4 days ago - Pushed at: 24 days ago - Stars: 810 - Forks: 77

Efficient-ML/Awesome-Model-Quantization
A list of papers, docs, and code about model quantization. This repo aims to collect resources for model quantization research and is continuously improved; PRs adding works (papers, repositories) it has missed are welcome.
Size: 61.5 MB - Last synced at: 5 days ago - Pushed at: 4 months ago - Stars: 2,131 - Forks: 224

foundation-model-stack/fms-model-optimizer
FMS Model Optimizer is a framework for developing reduced precision neural network models.
Language: Python - Size: 14.5 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 20 - Forks: 11

fastmachinelearning/qonnx
QONNX: Arbitrary-Precision Quantized Neural Networks in ONNX
Language: Python - Size: 5.38 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 151 - Forks: 46

sreekanth-madisetty/Awesome-LLM-Interview-Questions
Curated LLM interview questions and answers for data science and AI jobs
Size: 6.39 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 2 - Forks: 0

huggingface/optimum-intel
🤗 Optimum Intel: Accelerate inference with Intel optimization tools
Language: Jupyter Notebook - Size: 17.5 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 469 - Forks: 132

huggingface/optimum
🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization tools
Language: Python - Size: 5.58 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 2,938 - Forks: 546

zcemycl/TF2DeepFloorplan
TF2 Deep FloorPlan Recognition using a Multi-task Network with Room-boundary-Guided Attention. Enable tensorboard, quantization, flask, tflite, docker, github actions and google colab.
Language: Python - Size: 7.93 MB - Last synced at: 4 days ago - Pushed at: almost 2 years ago - Stars: 230 - Forks: 75

OpenPPL/ppq
PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.
Language: Python - Size: 5.57 MB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 1,695 - Forks: 256

snu-mllab/GuidedQuant
Official PyTorch implementation of "GuidedQuant: Large Language Model Quantization via Exploiting End Loss Guidance" (ICML 2025)
Language: Python - Size: 3.35 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 28 - Forks: 0

hiyouga/LLaMA-Factory
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Language: Python - Size: 48.4 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 52,158 - Forks: 6,299

RWKV/rwkv.cpp
INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model
Language: C++ - Size: 42.1 MB - Last synced at: about 13 hours ago - Pushed at: 3 months ago - Stars: 1,529 - Forks: 110

upunaprosk/quantization-effects
A curated list of papers, docs, and code on the undesired effects of model quantization, including impacts on fairness, robustness, calibration, and toxicity.
Size: 5.86 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

TarunNagarajan/TinyQuant
A focused implementation of hardware-accelerated, quantized neural network inference for embedded control systems.
Size: 142 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

mit-han-lab/ComfyUI-nunchaku
ComfyUI plugin of Nunchaku
Language: Python - Size: 2.74 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 1,236 - Forks: 35

SqueezeAILab/KVQuant
[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
Language: Python - Size: 19.8 MB - Last synced at: 7 days ago - Pushed at: 10 months ago - Stars: 358 - Forks: 31

datawhalechina/llm-deploy
Theory and practice of large language model (LLM) inference and deployment.
Size: 100 MB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 273 - Forks: 41

VectorDB-NTU/RaBitQ-Library
A lightweight library for the RaBitQ algorithm and its applications in vector search.
Language: C++ - Size: 1.96 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 36 - Forks: 8

intel/neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
Language: Python - Size: 469 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 2,426 - Forks: 274

tensorflow/model-optimization
A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization and pruning.
Language: Python - Size: 2.22 MB - Last synced at: 7 days ago - Pushed at: 4 months ago - Stars: 1,536 - Forks: 328
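Post-training quantization of activations, as in TFLite-style toolchains, typically uses an asymmetric (affine) scheme: a scale and a zero-point map the observed float range onto the uint8 range, so that 0.0 maps exactly onto an integer. A minimal sketch, with illustrative names rather than the toolkit's API:

```python
# Sketch of asymmetric (affine) uint8 quantization: scale and zero-point
# are derived from an observed [lo, hi] activation range.

def affine_params(lo, hi, bits=8):
    """Derive scale and zero-point from an observed [lo, hi] range."""
    qmax = 2 ** bits - 1                    # 255 for uint8
    scale = (hi - lo) / qmax
    zero_point = round(-lo / scale)
    return scale, zero_point

def quantize(x, scale, zero_point, bits=8):
    q = round(x / scale) + zero_point
    return max(0, min(2 ** bits - 1, q))    # clamp to the uint8 range

scale, zp = affine_params(-1.0, 3.0)
# Float zero maps exactly to the zero-point, which keeps zero-padding exact:
assert quantize(0.0, scale, zp) == zp
```

The exact-zero property matters in practice because zero-padded convolutions would otherwise accumulate a systematic bias.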

vllm-project/llm-compressor
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
Language: Python - Size: 28.6 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 1,473 - Forks: 145

google/qkeras
QKeras: a quantization deep learning library for Tensorflow Keras
Language: Python - Size: 1.56 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 567 - Forks: 109

capel-daangn/two-armies-chat-once
💼 Work Project - 🤖🪖 A Korean-English bilingual RAG Chatbot for Regulations of US Army and ROK Army, leveraging a PEFT fine-tuned small LLM with 4-bit quantized integration as the translator
Language: Python - Size: 32.2 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0

csarron/awesome-emdl
Embedded and mobile deep learning research resources
Size: 88.9 KB - Last synced at: 5 days ago - Pushed at: over 2 years ago - Stars: 753 - Forks: 169

Picovoice/picollm
On-device LLM Inference Powered by X-Bit Quantization
Language: Python - Size: 98 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 247 - Forks: 14

huggingface/optimum-quanto
A pytorch quantization backend for optimum
Language: Python - Size: 2.71 MB - Last synced at: 6 days ago - Pushed at: 28 days ago - Stars: 950 - Forks: 73

LambdaLabsML/openquant
Simple quantization, compatible with vllm/sglang.
Language: Python - Size: 103 KB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 1 - Forks: 0

sony/model_optimization
Model Compression Toolkit (MCT) is an open source project for neural network model optimization under efficient, constrained hardware. This project provides researchers, developers, and engineers advanced quantization and compression tools for deploying state-of-the-art neural networks.
Language: Python - Size: 22.7 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 399 - Forks: 68

stochasticai/xTuring
Build, customize, and control your own LLMs. From data pre-processing to fine-tuning, xTuring provides an easy way to personalize open-source LLMs. Join our Discord community: https://discord.gg/TgHXuSJEk6
Language: Python - Size: 18.4 MB - Last synced at: 7 days ago - Pushed at: 9 months ago - Stars: 2,653 - Forks: 201

thu-ml/SageAttention
Quantized attention that achieves speedups of 2-5x over FlashAttention and 3-11x over xformers, without losing end-to-end metrics across language, image, and video models.
Language: Cuda - Size: 46.1 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 1,690 - Forks: 120

ymcui/Chinese-LLaMA-Alpaca
Chinese LLaMA & Alpaca large language models, with local CPU/GPU training and deployment.
Language: Python - Size: 23 MB - Last synced at: 9 days ago - Pushed at: about 1 year ago - Stars: 18,851 - Forks: 1,894

Xilinx/brevitas
Brevitas: neural network quantization in PyTorch
Language: Python - Size: 37.7 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 1,334 - Forks: 218

actypedef/MixedGemm
A mixed-precision GEMM with quantization and reorder kernels.
Language: Cuda - Size: 25.3 MB - Last synced at: 4 days ago - Pushed at: 11 days ago - Stars: 10 - Forks: 0

open-edge-platform/training_extensions
Train, Evaluate, Optimize, Deploy Computer Vision Models via OpenVINO™
Language: Python - Size: 416 MB - Last synced at: 10 days ago - Pushed at: 13 days ago - Stars: 1,192 - Forks: 451

666DZY666/micronet
micronet, a model compression and deployment library. Compression: 1. quantization: quantization-aware training (QAT), both high-bit (>2b) (DoReFa; Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference) and low-bit (≤2b)/ternary and binary (TWN/BNN/XNOR-Net), plus post-training quantization (PTQ), 8-bit (TensorRT); 2. pruning: normal, regular, and group convolutional channel pruning; 3. group convolution structure; 4. batch-normalization fusion for quantization. Deployment: TensorRT, fp32/fp16/int8 (PTQ calibration), op adaptation (upsample), dynamic shape.
Language: Python - Size: 6.68 MB - Last synced at: 10 days ago - Pushed at: about 1 month ago - Stars: 2,250 - Forks: 476

Inpyo-Hong/Model-Compression-Paper-List
Model Compression Paper List (Focusing on Quantization, Particularly Zero-Shot Quantization)
Size: 42 KB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 1 - Forks: 0

MAGICS-LAB/GenoArmory
GenoArmory: A Unified Evaluation Framework for Adversarial Attacks on Genomic Foundation Models
Language: Python - Size: 95.4 MB - Last synced at: 3 days ago - Pushed at: about 1 month ago - Stars: 36 - Forks: 1

slinusc/fast_llm_inference
Bench360 is a modular benchmarking suite for local LLM inference. It offers a full-stack, extensible pipeline to evaluate the latency, throughput, quality, and cost of LLM inference on consumer and enterprise GPUs. Bench360 supports flexible backends, tasks and scenarios, enabling fair and reproducible comparisons for researchers and practitioners.
Language: Jupyter Notebook - Size: 752 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 3 - Forks: 0

Node0/hypercortex
A TUI based LM Swiss army knife and analysis tool
Size: 16.6 KB - Last synced at: 3 days ago - Pushed at: 13 days ago - Stars: 0 - Forks: 0

OpenGVLab/OmniQuant
[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
Language: Python - Size: 8.14 MB - Last synced at: 13 days ago - Pushed at: 28 days ago - Stars: 815 - Forks: 63

ModelTC/llmc
[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".
Language: Python - Size: 29.8 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 481 - Forks: 55

indie/qmec
A quantum puzzle and adventure into Native Language decolonization; features an introduction to the master quantum plane and the truthful history of indigenous peoples on Turtle Island. Not G-rated.
Language: C++ - Size: 44.4 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 2 - Forks: 1

mobiusml/hqq
Official implementation of Half-Quadratic Quantization (HQQ)
Language: Python - Size: 558 KB - Last synced at: 11 days ago - Pushed at: 17 days ago - Stars: 821 - Forks: 79

RahulSChand/gpu_poor
Calculate token/s & GPU memory requirement for any LLM. Supports llama.cpp/ggml/bnb/QLoRA quantization
Language: JavaScript - Size: 1.56 MB - Last synced at: 7 days ago - Pushed at: 7 months ago - Stars: 1,310 - Forks: 75
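The memory side of what a calculator like this estimates reduces to simple arithmetic: weight storage is parameter count times bytes per parameter, with activations, KV cache, and framework overhead on top. A back-of-the-envelope sketch covering weights only (the function name is illustrative):

```python
# Weight memory = parameters x bits / 8, converted to GiB.
# Activations, KV cache, and runtime overhead are not included.

def weight_memory_gib(n_params, bits_per_param):
    """Approximate weight storage in GiB for a given precision."""
    return n_params * bits_per_param / 8 / 2**30

# A 7B-parameter model at different precisions (weights only):
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: {weight_memory_gib(7e9, bits):.2f} GiB")
# 16-bit: 13.04 GiB, 8-bit: 6.52 GiB, 4-bit: 3.26 GiB
```

This is why 4-bit quantization is the usual route to fitting a 7B model on a consumer GPU with 6-8 GB of VRAM.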

7abushahla/Sony-Spresense-TFLite-Guide
This GitHub repository provides a detailed guide for deploying TensorFlow Lite (TFLite) models on the Sony Spresense board.
Size: 530 KB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 1 - Forks: 0

7abushahla/Student-Engagement
Code and resources for the paper: Real-Time Student Engagement Monitoring on Edge Devices: Deep Learning Meets Efficiency and Privacy
Language: C - Size: 29.9 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 0 - Forks: 0

eki-project/finn-plus Fork of Xilinx/finn
FINN+ is an extended version of FINN, a dataflow compiler for QNN inference on FPGAs. It is maintained by a group of researchers at Paderborn University, Germany.
Language: Python - Size: 89.2 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 19 - Forks: 3

gaoj0017/RaBitQ
[SIGMOD 2024] RaBitQ: Quantizing High-Dimensional Vectors with a Theoretical Error Bound for Approximate Nearest Neighbor Search
Language: C++ - Size: 1.63 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 108 - Forks: 15

VectorDB-NTU/Extended-RaBitQ
[SIGMOD 2025] Practical and Asymptotically Optimal Quantization of High-Dimensional Vectors in Euclidean Space for Approximate Nearest Neighbor Search
Language: C++ - Size: 57.6 KB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 37 - Forks: 6

THU-MIG/torch-model-compression
An automated model-structure analysis and modification toolset for PyTorch models, including a model compression algorithm library with automatic structure analysis.
Language: Python - Size: 132 KB - Last synced at: 3 days ago - Pushed at: about 2 years ago - Stars: 249 - Forks: 41

PaddlePaddle/PaddleSlim
PaddleSlim is an open-source library for deep model compression and architecture search.
Language: Python - Size: 16.3 MB - Last synced at: about 9 hours ago - Pushed at: 7 months ago - Stars: 1,593 - Forks: 350

vanhai1231/autoquant-infer
A tool that shrinks model size via quantization, combined with an AI agent that automatically selects the optimal level, speeding up inference and cutting costs.
Language: Python - Size: 54.7 KB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 0 - Forks: 0

codewithdark-git/QuantLLM
QuantLLM is a Python library designed for developers, researchers, and teams who want to fine-tune and deploy large language models (LLMs) efficiently using 4-bit and 8-bit quantization techniques.
Language: Python - Size: 294 KB - Last synced at: 10 days ago - Pushed at: 16 days ago - Stars: 5 - Forks: 0

edcalderin/HuggingFace_RAGFlow
This project implements a classic Retrieval-Augmented Generation (RAG) system using HuggingFace models with quantization techniques. The system processes PDF documents, extracts their content, and enables interactive question-answering through a Streamlit web application.
Language: Python - Size: 114 KB - Last synced at: 12 days ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

FasterDecoding/BitDelta
Language: Jupyter Notebook - Size: 7.21 MB - Last synced at: 9 days ago - Pushed at: 7 months ago - Stars: 198 - Forks: 15

SqueezeAILab/SqueezeLLM
[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
Language: Python - Size: 1.5 MB - Last synced at: 7 days ago - Pushed at: 10 months ago - Stars: 691 - Forks: 45

quic/aimet-pages
AIMET GitHub pages documentation
Language: HTML - Size: 44.6 MB - Last synced at: 4 days ago - Pushed at: 17 days ago - Stars: 8 - Forks: 4

SYSTRAN/faster-whisper
Faster Whisper transcription with CTranslate2
Language: Python - Size: 36.6 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 16,342 - Forks: 1,354

stdlib-js/ml-incr-kmeans
Incrementally partition data into `k` clusters.
Language: JavaScript - Size: 3.27 MB - Last synced at: 17 days ago - Pushed at: 18 days ago - Stars: 5 - Forks: 0
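Incremental (streaming) k-means of this kind rests on one update: the assigned centroid moves toward each new point by 1/n of the gap, which keeps it equal to the running mean of its points without storing them. A minimal Python sketch of that update (the listed library is JavaScript; names here are illustrative):

```python
# Incremental centroid update: after folding in the n-th point, the
# centroid equals the exact mean of all n points seen so far.

def update_centroid(centroid, count, point):
    """Fold one new point into a centroid's running mean."""
    count += 1
    centroid = [c + (p - c) / count for c, p in zip(centroid, point)]
    return centroid, count

c, n = [0.0, 0.0], 0
for p in ([2.0, 0.0], [4.0, 2.0], [0.0, 4.0]):
    c, n = update_centroid(c, n, p)
print(c, n)  # [2.0, 2.0] 3 -- the mean of the three points
```

Full streaming k-means adds the assignment step (route each point to its nearest centroid before updating), but the constant-memory property comes from this running-mean trick.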

HosseinAtrsaei/MIMO-Networks-With-One-Bit-ADCs-Receiver-Design-and-Communication-Strategies
Simulation and implementation of hybrid blockwise and adaptive threshold receiver architectures for MIMO systems with one-bit ADCs, based on the paper: "MIMO Networks With One-Bit ADCs: Receiver Design and Communication Strategies", IEEE Trans. on Communications, vol. 70, no. 3, Mar. 2022.
Language: Jupyter Notebook - Size: 300 KB - Last synced at: 17 days ago - Pushed at: 18 days ago - Stars: 0 - Forks: 0

iriskaplan/LatticeQuant
Implementation of M-leveled Quantizer and Voronoi Code Quantizer.
Language: Python - Size: 1.85 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 1 - Forks: 0

0xMartin/BMPEditor
A simple BMP image viewer, converter, and editor. The app is primarily focused on a from-scratch implementation of code for working with BMP images.
Language: C++ - Size: 23.1 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 9 - Forks: 1

guoriyue/PKU-Data-Mining-2022-TA
Language: Jupyter Notebook - Size: 3.9 MB - Last synced at: 14 days ago - Pushed at: over 2 years ago - Stars: 4 - Forks: 1

mit-han-lab/TinyChatEngine
TinyChatEngine: On-Device LLM Inference Library
Language: C++ - Size: 83.3 MB - Last synced at: 20 days ago - Pushed at: 12 months ago - Stars: 852 - Forks: 85

LowinLi/stable-diffusion-streamlit
Quantized stable-diffusion cutting down memory 75%, testing in streamlit, deploying in container
Language: Python - Size: 53.6 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 53 - Forks: 7

inisis/brocolli
Everything in Torch Fx
Language: Python - Size: 5.9 MB - Last synced at: 19 days ago - Pushed at: about 1 year ago - Stars: 343 - Forks: 61

jy-yuan/KIVI
[ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
Language: Python - Size: 16.7 MB - Last synced at: 7 days ago - Pushed at: 5 months ago - Stars: 303 - Forks: 31

open-mmlab/mmrazor
OpenMMLab Model Compression Toolbox and Benchmark.
Language: Python - Size: 11.1 MB - Last synced at: 22 days ago - Pushed at: about 1 year ago - Stars: 1,599 - Forks: 236

huawei-noah/Pretrained-Language-Model
Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.
Language: Python - Size: 29 MB - Last synced at: 22 days ago - Pushed at: over 1 year ago - Stars: 3,099 - Forks: 637

raywan-110/AdaQP
Adaptive Message Quantization and Parallelization for Distributed Full-graph GNN Training
Language: Python - Size: 97.7 KB - Last synced at: 19 days ago - Pushed at: over 1 year ago - Stars: 24 - Forks: 3

vbdi/casp
[CVPR 2025] CASP: Compression of Large Multimodal Models Based on Attention Sparsity
Language: Python - Size: 764 KB - Last synced at: 23 days ago - Pushed at: 23 days ago - Stars: 4 - Forks: 1

neuralmagic/sparsezoo
Neural network model repository for highly sparse and sparse-quantized models with matching sparsification recipes
Language: Python - Size: 1.33 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 386 - Forks: 28
