An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: tensorcore

enp1s0/ozIMMU

FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme

Language: Cuda - Size: 186 KB - Last synced at: 7 days ago - Pushed at: 3 months ago - Stars: 70 - Forks: 5

enp1s0/cuMpSGEMM

Fast SGEMM emulation on Tensor Cores

Language: Cuda - Size: 476 KB - Last synced at: 7 days ago - Pushed at: 4 months ago - Stars: 12 - Forks: 1

wmmae/wmma_extension

An extension library of WMMA API (Tensor Core API)

Language: Cuda - Size: 698 KB - Last synced at: 7 days ago - Pushed at: 11 months ago - Stars: 97 - Forks: 15

NikhilRout/TheTensorCoreProject

Microarchitecture implementation of Nvidia's Tensor Cores

Language: Verilog - Size: 17.9 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

robbwu/tensorsvm

Fast Kernel SVM on TensorCore enabled GPU

Language: Cuda - Size: 140 KB - Last synced at: 5 months ago - Pushed at: almost 3 years ago - Stars: 6 - Forks: 1

Zhen-Dong/HAWQ

Quantization library for PyTorch. Supports low-precision and mixed-precision quantization, with hardware implementation through TVM.

Language: Python - Size: 691 KB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 361 - Forks: 80

ShaoKAi100812/CudaCore_TensorCore_Acceleration

Compares the runtime of CNN computation on CPU and GPU

Language: C++ - Size: 6.76 MB - Last synced at: about 1 year ago - Pushed at: about 3 years ago - Stars: 2 - Forks: 0

YukeWang96/APNN-TC_SC21 (fork of BoyuanFeng/APNN-TC)

Artifact for SC21: APNN-TC: Accelerating Arbitrary Precision Neural Networks on Ampere GPU Tensor Cores.

Size: 67.6 MB - Last synced at: almost 2 years ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 0

eshibusawa/Simple-Examples

Simple examples of tools and libraries

Language: Python - Size: 83 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

YukeWang96/QGTC_PPoPP22

Artifact for PPoPP22 QGTC: Accelerating Quantized GNN via GPU Tensor Core.

Language: Python - Size: 28.4 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 20 - Forks: 2

hinofafa/torch_accelerator

Experiments to accelerate GPU device for PyTorch training

Language: Jupyter Notebook - Size: 27.3 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0