GitHub topics: low-precision
intel/neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
Language: Python - Size: 469 MB - Last synced at: about 7 hours ago - Pushed at: about 13 hours ago - Stars: 2,380 - Forks: 267

KernelTuner/kernel_float
CUDA/HIP header-only library for writing vectorized and low-precision (16-bit, 8-bit) GPU kernels
Language: C++ - Size: 7.23 MB - Last synced at: 12 days ago - Pushed at: 13 days ago - Stars: 7 - Forks: 1

Tiiiger/QPyTorch
Low Precision Arithmetic Simulation in PyTorch
Language: Python - Size: 246 KB - Last synced at: 12 days ago - Pushed at: 11 months ago - Stars: 274 - Forks: 75

graphcore-research/jax-scalify
JAX Scalify: end-to-end scaled arithmetic
Language: Python - Size: 630 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 8 - Forks: 0

AmanPriyanshu/LinearCosine
LinearCosine: Adding beats multiplying for lower-precision efficient cosine similarity
Language: C++ - Size: 1.62 MB - Last synced at: 3 days ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

sefaburakokcu/quantized-yolov5
Low-precision (quantized) YOLOv5
Language: Python - Size: 9.38 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 24 - Forks: 5

gudovskiy/ShiftCNN
A script to convert floating-point CNN models into a generalized low-precision ShiftCNN representation
Language: Python - Size: 1.95 KB - Last synced at: over 1 year ago - Pushed at: almost 8 years ago - Stars: 55 - Forks: 17

gudovskiy/fmap_compression
Code for the DNN feature map compression paper
Language: C++ - Size: 28.3 KB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 10 - Forks: 3
