GitHub topics: int7
intel/neural-speed 📦
An innovative library for efficient LLM inference via low-bit quantization
Language: C++ - Size: 16.2 MB - Last synced at: 1 day ago - Pushed at: 9 months ago - Stars: 348 - Forks: 38
