GitHub topics: cuda-core
Bruce-Lee-LY/decoding_attention
Decoding Attention is specially optimized for MHA, MQA, GQA, and MLA, using CUDA cores for the decoding stage of LLM inference.
Language: C++ - Size: 884 KB - Last synced at: 10 days ago - Pushed at: about 1 month ago - Stars: 36 - Forks: 3
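A minimal sketch of what single-query (decode-step) attention on CUDA cores can look like; this is an illustration of the general technique, not code from the repository, and the kernel name, layouts, and launch configuration are assumptions. One thread block handles one head and computes softmax(q·Kᵀ/√d)·V with FP32 accumulation:

```cuda
// Illustrative decode-step attention for a single head (hypothetical kernel,
// not the repository's implementation). Assumes seq_len floats fit in shared memory.
#include <cuda_fp16.h>
#include <math.h>

__global__ void decode_attention_one_head(const half *q,   // [head_dim]
                                          const half *K,   // [seq_len, head_dim]
                                          const half *V,   // [seq_len, head_dim]
                                          half *out,       // [head_dim]
                                          int seq_len, int head_dim) {
    extern __shared__ float scores[];            // one score per key position
    const float scale = rsqrtf((float)head_dim);

    // 1) Attention scores: threads stride over key positions.
    for (int t = threadIdx.x; t < seq_len; t += blockDim.x) {
        float dot = 0.0f;
        for (int d = 0; d < head_dim; ++d)
            dot += __half2float(q[d]) * __half2float(K[t * head_dim + d]);
        scores[t] = dot * scale;
    }
    __syncthreads();

    // 2) Softmax over the scores (single thread for clarity; a tuned kernel
    //    would use a parallel max/sum reduction instead).
    if (threadIdx.x == 0) {
        float m = -1e30f, s = 0.0f;
        for (int t = 0; t < seq_len; ++t) m = fmaxf(m, scores[t]);
        for (int t = 0; t < seq_len; ++t) { scores[t] = expf(scores[t] - m); s += scores[t]; }
        for (int t = 0; t < seq_len; ++t) scores[t] /= s;
    }
    __syncthreads();

    // 3) Weighted sum of V: threads stride over output dimensions.
    for (int d = threadIdx.x; d < head_dim; d += blockDim.x) {
        float acc = 0.0f;
        for (int t = 0; t < seq_len; ++t)
            acc += scores[t] * __half2float(V[t * head_dim + d]);
        out[d] = __float2half(acc);
    }
}
// Example launch (assumed sizes):
// decode_attention_one_head<<<1, 128, seq_len * sizeof(float)>>>(dq, dK, dV, dout, seq_len, head_dim);
```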

Bruce-Lee-LY/cuda_hgemv
Several optimization methods for half-precision general matrix-vector multiplication (HGEMV) using CUDA cores.
Language: Cuda - Size: 459 KB - Last synced at: 2 days ago - Pushed at: 8 months ago - Stars: 61 - Forks: 5
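For context, a baseline HGEMV on CUDA cores typically computes y = A·x with half-precision inputs and FP32 accumulation. The sketch below is a simple warp-per-row variant, not taken from the repository; the kernel name and launch shape are assumptions:

```cuda
// Illustrative HGEMV (hypothetical baseline, not the repository's code):
// y[M] = A[M,N] * x[N], one warp per output row, FP32 accumulation.
#include <cuda_fp16.h>

__global__ void hgemv_warp_per_row(const half *A, const half *x, half *y,
                                   int M, int N) {
    const int row  = blockIdx.x;     // one 32-thread block (warp) per row
    const int lane = threadIdx.x;    // lane index within the warp
    if (row >= M) return;

    float acc = 0.0f;
    for (int col = lane; col < N; col += 32)
        acc += __half2float(A[row * N + col]) * __half2float(x[col]);

    // Warp-level tree reduction of the partial sums.
    for (int offset = 16; offset > 0; offset >>= 1)
        acc += __shfl_down_sync(0xffffffff, acc, offset);

    if (lane == 0)
        y[row] = __float2half(acc);
}
// Example launch: hgemv_warp_per_row<<<M, 32>>>(dA, dx, dy, M, N);
```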
