GitHub topics: int3
intel/neural-speed 📦
An innovative library for efficient LLM inference via low-bit quantization
Language: C++ - Size: 16.2 MB - Last synced at: 5 days ago - Pushed at: 8 months ago - Stars: 350 - Forks: 38
