GitHub topics: low-precision

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

Language: Python - Size: 469 MB - Last synced at: about 7 hours ago - Pushed at: about 13 hours ago - Stars: 2,380 - Forks: 267

CUDA/HIP header-only library for writing vectorized and low-precision (16 bit, 8 bit) GPU kernels

Language: C++ - Size: 7.23 MB - Last synced at: 12 days ago - Pushed at: 13 days ago - Stars: 7 - Forks: 1

Low Precision Arithmetic Simulation in PyTorch

Language: Python - Size: 246 KB - Last synced at: 12 days ago - Pushed at: 11 months ago - Stars: 274 - Forks: 75

JAX Scalify: end-to-end scaled arithmetic

Language: Python - Size: 630 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 8 - Forks: 0

LinearCosine: Adding beats multiplying for lower-precision efficient cosine similarity

Language: C++ - Size: 1.62 MB - Last synced at: 3 days ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

Low Precision (quantized) YOLOv5

Language: Python - Size: 9.38 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 24 - Forks: 5

A script to convert floating-point CNN models into generalized low-precision ShiftCNN representation

Language: Python - Size: 1.95 KB - Last synced at: over 1 year ago - Pushed at: almost 8 years ago - Stars: 55 - Forks: 17

Code for DNN feature map compression paper

Language: C++ - Size: 28.3 KB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 10 - Forks: 3
