GitHub topics: cutlass
leimao/CUTLASS-Examples
CUTLASS and CuTe Examples
Language: Cuda - Size: 429 KB - Last synced at: about 12 hours ago - Pushed at: 4 months ago - Stars: 49 - Forks: 7

coderonion/awesome-cuda-and-hpc
🚀🚀🚀 This repository lists some awesome public CUDA, cuda-python, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR, PTX and High Performance Computing (HPC) projects.
Size: 55.7 KB - Last synced at: 1 day ago - Pushed at: 12 days ago - Stars: 258 - Forks: 30

sgl-project/whl
Kernel Library Wheel for SGLang
Language: HTML - Size: 26.4 KB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 9 - Forks: 1

peterlau123/Lolly
Lightweight, production-level, open-source C++ library
Language: C++ - Size: 102 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

prateekshukla1108/cutlass3
Docs
Language: HTML - Size: 22.5 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 0 - Forks: 0

bytedance/flux
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
Language: C++ - Size: 2.6 MB - Last synced at: 29 days ago - Pushed at: about 1 month ago - Stars: 853 - Forks: 55

cjmcv/ai-infra-notes
Reading notes on the open-source code of AI infrastructure (SGLang, LLM, CUTLASS, HPC, etc.)
Size: 777 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 0

Bruce-Lee-LY/cutlass_gemm
Multiple GEMM operators built with CUTLASS to support LLM inference.
Language: C++ - Size: 2.14 MB - Last synced at: 24 days ago - Pushed at: 7 months ago - Stars: 17 - Forks: 2

Bruce-Lee-LY/flash_attention_inference
Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.
Language: C++ - Size: 1.99 MB - Last synced at: 24 days ago - Pushed at: 2 months ago - Stars: 35 - Forks: 4

DD-DuDa/Cute-Learning
Examples of CUDA implementations using CUTLASS CuTe
Language: Makefile - Size: 21.2 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 132 - Forks: 15

YashasSamaga/ConvolutionBuildingBlocks
GEMM and Winograd based convolutions using CUTLASS
Language: Cuda - Size: 218 KB - Last synced at: about 1 month ago - Pushed at: almost 5 years ago - Stars: 26 - Forks: 3

yester31/Cutlass_EX
Study of CUTLASS
Language: Cuda - Size: 66.4 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 19 - Forks: 4

digital-nomad-cheng/tvm_project_course
Language: Python - Size: 3.11 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

Routhleck/blocksparse-pytorch-implement
Block-sparse implementation for PyTorch
Language: C++ - Size: 1.41 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0
