GitHub topics: tensorcore
enp1s0/ozIMMU
FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme
Language: Cuda - Size: 186 KB - Last synced at: 7 days ago - Pushed at: 3 months ago - Stars: 70 - Forks: 5

enp1s0/cuMpSGEMM
Fast SGEMM emulation on Tensor Cores
Language: Cuda - Size: 476 KB - Last synced at: 7 days ago - Pushed at: 4 months ago - Stars: 12 - Forks: 1

wmmae/wmma_extension
An extension library of WMMA API (Tensor Core API)
Language: Cuda - Size: 698 KB - Last synced at: 7 days ago - Pushed at: 11 months ago - Stars: 97 - Forks: 15

NikhilRout/TheTensorCoreProject
Microarchitecture implementation of Nvidia's Tensor Cores
Language: Verilog - Size: 17.9 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

robbwu/tensorsvm
Fast Kernel SVM on TensorCore enabled GPU
Language: Cuda - Size: 140 KB - Last synced at: 5 months ago - Pushed at: almost 3 years ago - Stars: 6 - Forks: 1

Zhen-Dong/HAWQ
Quantization library for PyTorch. Supports low-precision and mixed-precision quantization, with hardware implementation through TVM.
Language: Python - Size: 691 KB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 361 - Forks: 80

ShaoKAi100812/CudaCore_TensorCore_Acceleration
Compares the runtime of CNN computation on CPU and GPU
Language: C++ - Size: 6.76 MB - Last synced at: about 1 year ago - Pushed at: about 3 years ago - Stars: 2 - Forks: 0

YukeWang96/APNN-TC_SC21 (fork of BoyuanFeng/APNN-TC)
Artifact for SC21: APNN-TC: Accelerating Arbitrary Precision Neural Networks on Ampere GPU Tensor Cores.
Size: 67.6 MB - Last synced at: almost 2 years ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 0

eshibusawa/Simple-Examples
simple examples of tools and libraries
Language: Python - Size: 83 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

YukeWang96/QGTC_PPoPP22
Artifact for PPoPP22 QGTC: Accelerating Quantized GNN via GPU Tensor Core.
Language: Python - Size: 28.4 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 20 - Forks: 2

hinofafa/torch_accelerator
Experiments to accelerate GPU device for PyTorch training
Language: Jupyter Notebook - Size: 27.3 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0
