GitHub topics: half-precision
petamoriken/float16
Stage 3 IEEE 754 half-precision floating-point ponyfill
Language: JavaScript - Size: 9.41 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 101 - Forks: 8

stillwater-sc/universal
Large collection of number systems providing custom arithmetic for mixed-precision algorithm development and optimization for AI, Machine Learning, Computer Vision, Signal Processing, CAE, EDA, control, optimization, estimation, and approximation.
Language: C++ - Size: 118 MB - Last synced at: 2 days ago - Pushed at: 7 days ago - Stars: 440 - Forks: 62

enp1s0/cuMpSGEMM
Fast SGEMM emulation on Tensor Cores
Language: Cuda - Size: 476 KB - Last synced at: 5 days ago - Pushed at: 3 months ago - Stars: 11 - Forks: 1

x448/float16
float16 provides IEEE 754 half-precision format (binary16) with correct conversions to/from float32
Language: Go - Size: 181 KB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 75 - Forks: 8

higham/chop
Round matrix elements to lower precision in MATLAB
Language: MATLAB - Size: 52.7 KB - Last synced at: 1 day ago - Pushed at: almost 3 years ago - Stars: 37 - Forks: 11

stdlib-js/constants-float16-sqrt-eps
Square root of half-precision floating-point epsilon.
Language: JavaScript - Size: 325 KB - Last synced at: 23 days ago - Pushed at: 23 days ago - Stars: 2 - Forks: 0

stdlib-js/constants-float16-num-bytes
Size (in bytes) of a half-precision floating-point number.
Language: JavaScript - Size: 313 KB - Last synced at: 23 days ago - Pushed at: 23 days ago - Stars: 2 - Forks: 0

stdlib-js/constants-float16-cbrt-eps
Cube root of half-precision floating-point epsilon.
Language: JavaScript - Size: 319 KB - Last synced at: 23 days ago - Pushed at: 23 days ago - Stars: 1 - Forks: 0

Maratyszcza/FP16
Conversion to/from half-precision floating point formats
Language: C++ - Size: 127 KB - Last synced at: 24 days ago - Pushed at: 10 months ago - Stars: 347 - Forks: 96

shibatch/tlfloat
C++ template library for floating point operations
Language: C++ - Size: 674 KB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 26 - Forks: 2

KernelTuner/kernel_float
CUDA/HIP header-only library for writing vectorized and low-precision (16-bit, 8-bit) GPU kernels
Language: C++ - Size: 7.23 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 7 - Forks: 1

SomeRandomiOSDev/Half
Swift Half-Precision Floating Point
Language: Swift - Size: 209 KB - Last synced at: 12 days ago - Pushed at: over 1 year ago - Stars: 11 - Forks: 3

joeltg/fp16
Half-precision 16-bit floating point numbers
Language: TypeScript - Size: 476 KB - Last synced at: about 2 hours ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

artyom-beilis/float16
Half-float library for C and for Z80
Language: C - Size: 22.5 KB - Last synced at: 2 months ago - Pushed at: over 5 years ago - Stars: 34 - Forks: 7

stdlib-js/constants-float16
Half-precision floating-point mathematical constants.
Language: JavaScript - Size: 616 KB - Last synced at: about 1 hour ago - Pushed at: 5 months ago - Stars: 2 - Forks: 0

oleks/binary16
Emulating binary, half-precision IEEE-754 (2008) floats
Language: C - Size: 29.3 KB - Last synced at: about 1 month ago - Pushed at: about 8 years ago - Stars: 2 - Forks: 0

DivergentClouds/subleq-linear
An implementation of the Subleq OISC using only linear operations on half-precision (16-bit) IEEE-754 floats (and a loop).
Language: Zig - Size: 18.6 KB - Last synced at: 2 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

canbula/ieee754
Python module which finds the IEEE-754 representation of a floating point number.
Language: Python - Size: 85.9 KB - Last synced at: 12 days ago - Pushed at: about 1 year ago - Stars: 27 - Forks: 5

yowidin/fast-half-float
Fast Half precision Floating point operations for C++
Language: C++ - Size: 8.79 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

bindog/pytorch-model-parallel
A memory-balanced and communication-efficient model-parallel implementation of a FullyConnected layer with CrossEntropyLoss in PyTorch
Language: Python - Size: 85 KB - Last synced at: about 1 year ago - Pushed at: almost 5 years ago - Stars: 74 - Forks: 20

nitronoid/floatingPoint
Language: C++ - Size: 29.3 KB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 2 - Forks: 0

jamesalbert/halfprec
Half-precision assembly interface for C
Language: Assembly - Size: 9.77 KB - Last synced at: 3 months ago - Pushed at: about 8 years ago - Stars: 1 - Forks: 0

minhhn2910/cuda-half2
Convert CUDA programs from float data type to half or half2 with SIMDization
Language: C++ - Size: 144 MB - Last synced at: 2 days ago - Pushed at: almost 6 years ago - Stars: 20 - Forks: 6

hma02/cublasHgemm-P100
Code for testing native float16 matrix multiplication performance on Tesla P100 and V100 GPUs, based on cublasHgemm
Language: Cuda - Size: 18.6 KB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 35 - Forks: 11

steven-varga/h5cpp
C++17 templates between [stl::vector | armadillo | eigen3 | ublas | blitz++] and HDF5 datasets
Language: C++ - Size: 21.9 MB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 139 - Forks: 32

fengwang/float16_t (fork of acgessler/half_float)
C++20 implementation of a 16-bit floating-point type mimicking most of the IEEE 754 behavior. Single file and header-only.
Language: C++ - Size: 204 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 19 - Forks: 5

DW0RKiN/Floating-point-Library-for-Z80
Floating-Point Arithmetic Library for Z80
Language: Assembly - Size: 8.32 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 19 - Forks: 3

imciner2/ChopBLAS
Basic linear algebra routines implemented using the chop rounding function
Language: MATLAB - Size: 1.7 MB - Last synced at: 16 days ago - Pushed at: about 2 years ago - Stars: 3 - Forks: 1

neslib/Neslib.Half
Half-Precision Floating-Point for Delphi
Language: Pascal - Size: 65.4 KB - Last synced at: about 2 years ago - Pushed at: almost 7 years ago - Stars: 9 - Forks: 3

enp1s0/curand_fp16
FP16 pseudo random number generator on GPU
Language: Cuda - Size: 32.2 KB - Last synced at: about 6 hours ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

georgy7/toyfloat
A library that encodes 3 to 16 bits wide floating-point numbers.
Language: Go - Size: 1.52 MB - Last synced at: about 2 months ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

LeDuy-Vu/CS-147-Project
Implement arithmetic operations to handle half-precision numbers in MIPS instructions.
Language: Assembly - Size: 23.4 KB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

dyeo/dym
The DYM Math Library for Graphics and Game Programming
Language: C++ - Size: 440 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 2

jizhuoran/caffe-android-opencl-fp16
Optimised Caffe with OpenCL support for less powerful devices such as mobile phones
Language: C++ - Size: 50.3 MB - Last synced at: almost 2 years ago - Pushed at: about 6 years ago - Stars: 17 - Forks: 3
