GitHub / intel / neural-speed
An innovative library for efficient LLM inference via low-bit quantization
JSON API: http://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/intel%2Fneural-speed
Stars: 348
Forks: 38
Open issues: 26
License: apache-2.0
Language: C++
Size: 16.2 MB
Dependencies parsed at: Pending
Created at: over 1 year ago
Updated at: 4 days ago
Pushed at: 9 months ago
Last synced at: about 10 hours ago
Topics: cpu, fp4, fp8, gaudi2, gpu, int1, int2, int3, int4, int5, int6, int7, int8, llamacpp, llm-fine-tuning, llm-inference, low-bit, mxformat, nf4, sparsity
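The description and topics center on low-bit quantization (int4, nf4, fp4, and similar formats) for LLM inference. As a generic illustration of the underlying idea only, the sketch below shows group-wise symmetric int4 weight quantization in C++. It is not neural-speed's API or kernel code; the names (`QuantGroup`, `quantize_group`) and the group size are hypothetical choices for the example.

```cpp
// Generic sketch of group-wise symmetric int4 quantization: each group of
// fp32 weights shares one fp32 scale, and weights are stored as values in
// [-7, 7] so that w ~= q * scale. This is illustrative only and does not
// reflect neural-speed's actual implementation.
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <vector>

struct QuantGroup {
    std::vector<int8_t> q;  // 4-bit values, stored one per byte for clarity
    float scale;            // per-group fp32 scale
};

QuantGroup quantize_group(const std::vector<float>& w) {
    float amax = 0.0f;
    for (float x : w) amax = std::max(amax, std::fabs(x));
    QuantGroup g;
    g.scale = amax > 0.0f ? amax / 7.0f : 1.0f;  // map [-amax, amax] onto [-7, 7]
    g.q.reserve(w.size());
    for (float x : w) {
        int v = static_cast<int>(std::lround(x / g.scale));
        g.q.push_back(static_cast<int8_t>(std::clamp(v, -7, 7)));
    }
    return g;
}

float dequantize(const QuantGroup& g, size_t i) {
    return static_cast<float>(g.q[i]) * g.scale;
}

int main() {
    // Toy weight group; real schemes typically use groups of 32 or 128 weights.
    std::vector<float> weights = {0.12f, -0.53f, 0.87f, -0.02f,
                                  0.33f, -0.91f, 0.44f, 0.05f};
    QuantGroup g = quantize_group(weights);
    for (size_t i = 0; i < weights.size(); ++i) {
        std::printf("w=% .3f  q=%3d  deq=% .3f\n",
                    weights[i], g.q[i], dequantize(g, i));
    }
    return 0;
}
```

Storing one scale per small group rather than per tensor bounds the quantization error locally, which is why formats like int4/nf4 can keep inference accuracy close to fp16 while shrinking weight memory roughly 4x.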