GitHub topics: neural-compressor
huggingface/optimum-benchmark
🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers, with full support for Optimum's hardware optimizations and quantization schemes.
Language: Python - Size: 8.3 MB - Last synced at: 4 days ago - Pushed at: 20 days ago - Stars: 303 - Forks: 58

intel/auto-round
Advanced Quantization Algorithm for LLMs and VLMs, with support for CPU, Intel GPU, CUDA and HPU. Seamlessly integrated with Torchao, Transformers, and vLLM.
Language: Python - Size: 10.9 MB - Last synced at: 5 days ago - Pushed at: 6 days ago - Stars: 497 - Forks: 41
