GitHub topics: int5
intel/neural-speed 📦
An innovative library for efficient LLM inference via low-bit quantization
Language: C++ - Size: 16.2 MB - Last synced at: 4 days ago - Pushed at: 8 months ago - Stars: 350 - Forks: 38
