GitHub / intel / neural-speed
An innovative library for efficient LLM inference via low-bit quantization
JSON API: http://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/intel%2Fneural-speed
Stars: 350
Forks: 38
Open issues: 26
License: apache-2.0
Language: C++
Size: 16.2 MB
Dependencies parsed at: Pending
Created at: over 1 year ago
Updated at: 16 days ago
Pushed at: 8 months ago
Last synced at: 10 days ago
Topics: cpu, fp4, fp8, gaudi2, gpu, int1, int2, int3, int4, int5, int6, int7, int8, llamacpp, llm-fine-tuning, llm-inference, low-bit, mxformat, nf4, sparsity
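The JSON API URL listed above percent-encodes the `owner/name` pair as a single path segment (`intel%2Fneural-speed`). A minimal sketch of building that URL with the Python standard library; the `repo_api_url` helper is hypothetical, and only the URL shape shown in the listing is taken from the source:

```python
from urllib.parse import quote


def repo_api_url(host: str, owner: str, name: str) -> str:
    """Build an ecosyste.ms repository lookup URL (hypothetical helper).

    The owner/name pair is encoded as one path segment, so the "/"
    becomes "%2F", matching the URL shown in the listing above.
    """
    repo = quote(f"{owner}/{name}", safe="")
    return f"http://repos.ecosyste.ms/api/v1/hosts/{host}/repositories/{repo}"


url = repo_api_url("GitHub", "intel", "neural-speed")
print(url)
```

Fetching this URL (e.g. with `urllib.request.urlopen`) returns the repository metadata as JSON; the exact field names in the payload are not shown in this listing.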