An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: tensor-core

Bruce-Lee-LY/cuda_hgemm

Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores via the WMMA API and MMA PTX instructions.

Language: Cuda - Size: 1.1 MB - Last synced at: 1 day ago - Pushed at: 8 months ago - Stars: 408 - Forks: 79

Bruce-Lee-LY/cuda_hgemv

Several optimization methods for half-precision general matrix-vector multiplication (HGEMV) using CUDA cores.

Language: Cuda - Size: 459 KB - Last synced at: 1 day ago - Pushed at: 8 months ago - Stars: 61 - Forks: 5

Bruce-Lee-LY/cutlass_gemm

Multiple GEMM operators built with CUTLASS to support LLM inference.

Language: C++ - Size: 2.14 MB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 17 - Forks: 2

Bruce-Lee-LY/flash_attention_inference

Performance of the C++ interfaces of flash attention and flash attention v2 in large language model (LLM) inference scenarios.

Language: C++ - Size: 1.99 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 35 - Forks: 4

Bruce-Lee-LY/cuda_back2back_hgemm

Use Tensor Cores to compute back-to-back HGEMM (half-precision general matrix multiplication) with MMA PTX instructions.

Language: Cuda - Size: 854 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 11 - Forks: 2

fan1997/DTC-SpMM-ASPLOS24

Code for DTC-SpMM (ASPLOS '24).

Language: C++ - Size: 1.23 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

junqi-xie-learning/CS4302-Assignments 📦

Lab assignments from CS4302 Parallel and Distributed Programming (Fall 2022), with my solutions.

Language: C++ - Size: 4.86 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0