GitHub topics: tensorcore

FP64-equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme

Language: Cuda - Size: 186 KB - Last synced at: 7 days ago - Pushed at: 3 months ago - Stars: 70 - Forks: 5

Fast SGEMM emulation on Tensor Cores

Language: Cuda - Size: 476 KB - Last synced at: 7 days ago - Pushed at: 4 months ago - Stars: 12 - Forks: 1

An extension library of WMMA API (Tensor Core API)

Language: Cuda - Size: 698 KB - Last synced at: 7 days ago - Pushed at: 11 months ago - Stars: 97 - Forks: 15

Microarchitecture implementation of Nvidia's Tensor Cores

Language: Verilog - Size: 17.9 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

Fast Kernel SVM on Tensor Core-enabled GPUs

Language: Cuda - Size: 140 KB - Last synced at: 5 months ago - Pushed at: almost 3 years ago - Stars: 6 - Forks: 1

Quantization library for PyTorch. Supports low-precision and mixed-precision quantization, with hardware implementation through TVM.

Language: Python - Size: 691 KB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 361 - Forks: 80

Compares the runtime of CNN computation on CPU and GPU

Language: C++ - Size: 6.76 MB - Last synced at: about 1 year ago - Pushed at: about 3 years ago - Stars: 2 - Forks: 0

Artifact for SC21: APNN-TC: Accelerating Arbitrary Precision Neural Networks on Ampere GPU Tensor Cores.

Size: 67.6 MB - Last synced at: almost 2 years ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 0
