Topic: "mxformat"
intel/neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
Language: Python - Size: 469 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 2,414 - Forks: 269
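The "mxformat" topic refers to the OCP Microscaling (MX) formats these libraries target: values are grouped into fixed-size blocks (32 elements in the MX spec), and each block shares a single power-of-two scale while elements are stored in a narrow type such as INT8 or FP4. The sketch below illustrates that block-scaling idea in plain Python with INT8 elements; it is a minimal illustration of the concept, not neural-compressor's API, and the function names are made up for this example.

```python
import math

BLOCK = 32  # MX block size per the OCP Microscaling spec


def quantize_mx_int8(values):
    """Toy MX-style block quantization (illustrative, not a library API).

    Each block of up to 32 values shares one power-of-two scale chosen so
    the block's largest magnitude fits in INT8; elements are stored as
    rounded integers in [-128, 127].
    """
    blocks = []
    for i in range(0, len(values), BLOCK):
        block = values[i:i + BLOCK]
        amax = max(abs(v) for v in block) or 1.0  # avoid log2(0) on all-zero blocks
        # Shared scale: smallest power of two such that amax / scale <= 127
        scale = 2.0 ** math.ceil(math.log2(amax / 127.0))
        q = [max(-128, min(127, round(v / scale))) for v in block]
        blocks.append((scale, q))
    return blocks


def dequantize_mx_int8(blocks):
    """Reconstruct floats by multiplying each element by its block's scale."""
    return [v * scale for scale, q in blocks for v in q]
```

Because the scale is a power of two, hardware can apply it with an exponent adjustment rather than a multiply, which is one motivation for the MX design; per-element error is bounded by half the block's scale.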

intel/neural-speed 📦
An innovative library for efficient LLM inference via low-bit quantization
Language: C++ - Size: 16.2 MB - Last synced at: 7 days ago - Pushed at: 9 months ago - Stars: 348 - Forks: 38
