GitHub topics: int4
intel/auto-round
Advanced Quantization Algorithm for LLMs/VLMs.
Language: Python - Size: 10.4 MB - Last synced at: about 15 hours ago - Pushed at: about 15 hours ago - Stars: 439 - Forks: 35
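To ground what "int4 quantization" means in these projects, here is a minimal round-to-nearest sketch: a hypothetical illustration, not AutoRound's actual algorithm (which tunes the rounding itself) and not its API.

```python
import numpy as np

def quantize_int4(weights: np.ndarray):
    """Quantize a float tensor to signed int4 ([-8, 7]) with one per-tensor scale."""
    scale = float(np.abs(weights).max()) / 7.0  # map the largest magnitude to 7
    q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original floats."""
    return q.astype(np.float32) * scale

w = np.array([0.02, -0.51, 0.33, 0.70], dtype=np.float32)
q, s = quantize_int4(w)
w_hat = dequantize_int4(q, s)
```

The per-tensor scale keeps the sketch short; production schemes (including the repos above) typically use per-group scales and sometimes zero points to cut the rounding error.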

intel/neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
Language: Python - Size: 469 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 2,380 - Forks: 267

intel/neural-speed 📦
(archived)
An innovative library for efficient LLM inference via low-bit quantization
Language: C++ - Size: 16.2 MB - Last synced at: 2 days ago - Pushed at: 8 months ago - Stars: 350 - Forks: 38

tpoisonooo/how-to-optimize-gemm
Row-major matmul (GEMM) optimization tutorial
Language: C++ - Size: 12.5 MB - Last synced at: 7 months ago - Pushed at: over 1 year ago - Stars: 589 - Forks: 79
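The baseline such GEMM tutorials start from is the triple loop below, shown here in Python for brevity (the repo itself works in C/C++); the i-p-j loop order keeps all inner-loop accesses contiguous in row-major storage.

```python
def matmul(A, B):
    """C = A @ B for row-major nested lists; A is m x k, B is k x n."""
    m, k, n = len(A), len(B), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for i in range(m):          # row of A / C
        for p in range(k):      # for each element of A's row i...
            a = A[i][p]
            for j in range(n):  # ...sweep row p of B: contiguous in row-major order
                C[i][j] += a * B[p][j]
    return C
```

The tutorial's actual speedups come from tiling, register blocking, packing, and vectorization layered on top of this ordering.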

Danaozhong/rust-bitwriter
Rust library to write integer types of any bit length, from `i1` to `i64`, into a buffer.
Language: Rust - Size: 12.7 KB - Last synced at: 28 days ago - Pushed at: 9 months ago - Stars: 3 - Forks: 0
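What a bit-writer enables for int4 storage can be sketched as packing two signed 4-bit values per byte. This is a generic two's-complement nibble scheme in Python, not rust-bitwriter's API.

```python
def pack_int4(values):
    """Pack signed int4 values ([-8, 7]) two per byte, low nibble first."""
    if len(values) % 2:
        raise ValueError("expected an even number of int4 values")
    out = bytearray()
    for lo, hi in zip(values[::2], values[1::2]):
        out.append((lo & 0xF) | ((hi & 0xF) << 4))  # two's-complement nibbles
    return bytes(out)

def unpack_int4(data):
    """Inverse of pack_int4: sign-extend each nibble back to a Python int."""
    vals = []
    for b in data:
        for nib in (b & 0xF, b >> 4):
            vals.append(nib - 16 if nib >= 8 else nib)
    return vals
```

Packing halves the memory per value relative to int8, which is why int4 inference libraries pair a quantizer with exactly this kind of bit-level writer.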
