GitHub topics: int4
intel/auto-round
Advanced Quantization Algorithm for LLMs/VLMs.
Language: Python - Size: 10.4 MB - Last synced at: about 15 hours ago - Pushed at: about 15 hours ago - Stars: 439 - Forks: 35
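To ground what "int4 quantization" means in these projects, here is a minimal round-to-nearest sketch: a hypothetical illustration, not AutoRound's actual algorithm (which tunes the rounding itself) and not its API.

```python
import numpy as np

def quantize_int4(weights: np.ndarray):
    """Quantize a float tensor to signed int4 ([-8, 7]) with one per-tensor scale."""
    scale = float(np.abs(weights).max()) / 7.0  # map the largest magnitude to 7
    q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original floats."""
    return q.astype(np.float32) * scale

w = np.array([0.02, -0.51, 0.33, 0.70], dtype=np.float32)
q, s = quantize_int4(w)
w_hat = dequantize_int4(q, s)
```

The per-tensor scale keeps the sketch short; production schemes (including the repos above) typically use per-group scales and sometimes zero points to cut the rounding error.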

intel/neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
Language: Python - Size: 469 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 2,380 - Forks: 267

intel/neural-speed 📦
(archived)
An innovative library for efficient LLM inference via low-bit quantization
Language: C++ - Size: 16.2 MB - Last synced at: 2 days ago - Pushed at: 8 months ago - Stars: 350 - Forks: 38

tpoisonooo/how-to-optimize-gemm
Row-major matmul (GEMM) optimization tutorial
Language: C++ - Size: 12.5 MB - Last synced at: 7 months ago - Pushed at: over 1 year ago - Stars: 589 - Forks: 79
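The baseline such GEMM tutorials start from is the triple loop below, shown here in Python for brevity (the repo itself works in C/C++); the i-p-j loop order keeps all inner-loop accesses contiguous in row-major storage.

```python
def matmul(A, B):
    """C = A @ B for row-major nested lists; A is m x k, B is k x n."""
    m, k, n = len(A), len(B), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for i in range(m):          # row of A / C
        for p in range(k):      # for each element of A's row i...
            a = A[i][p]
            for j in range(n):  # ...sweep row p of B: contiguous in row-major order
                C[i][j] += a * B[p][j]
    return C
```

The tutorial's actual speedups come from tiling, register blocking, packing, and vectorization layered on top of this ordering.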

Danaozhong/rust-bitwriter
Rust library to write integer types of any bit length, from `i1` to `i64`, into a buffer.
Language: Rust - Size: 12.7 KB - Last synced at: 28 days ago - Pushed at: 9 months ago - Stars: 3 - Forks: 0
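What a bit-writer enables for int4 storage can be sketched as packing two signed 4-bit values per byte. This is a generic two's-complement nibble scheme in Python, not rust-bitwriter's API.

```python
def pack_int4(values):
    """Pack signed int4 values ([-8, 7]) two per byte, low nibble first."""
    if len(values) % 2:
        raise ValueError("expected an even number of int4 values")
    out = bytearray()
    for lo, hi in zip(values[::2], values[1::2]):
        out.append((lo & 0xF) | ((hi & 0xF) << 4))  # two's-complement nibbles
    return bytes(out)

def unpack_int4(data):
    """Inverse of pack_int4: sign-extend each nibble back to a Python int."""
    vals = []
    for b in data:
        for nib in (b & 0xF, b >> 4):
            vals.append(nib - 16 if nib >= 8 else nib)
    return vals
```

Packing halves the memory per value relative to int8, which is why int4 inference libraries pair a quantizer with exactly this kind of bit-level writer.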
