GitHub topics: cutlass

Repositories

leimao/CUTLASS-Examples

CUTLASS and CuTe Examples

Language: Cuda - Size: 429 KB - Last synced at: about 12 hours ago - Pushed at: 4 months ago - Stars: 49 - Forks: 7

coderonion/awesome-cuda-and-hpc

🚀🚀🚀 This repository lists some awesome public CUDA, cuda-python, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR, PTX and High Performance Computing (HPC) projects.

Size: 55.7 KB - Last synced at: 1 day ago - Pushed at: 12 days ago - Stars: 258 - Forks: 30

sgl-project/whl

Kernel Library Wheel for SGLang

Language: HTML - Size: 26.4 KB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 9 - Forks: 1

peterlau123/Lolly

Lightweight, production-level open-source C++ library

Language: C++ - Size: 102 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

prateekshukla1108/cutlass3

Docs

Language: HTML - Size: 22.5 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 0 - Forks: 0

bytedance/flux

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

Language: C++ - Size: 2.6 MB - Last synced at: 29 days ago - Pushed at: about 1 month ago - Stars: 853 - Forks: 55

cjmcv/ai-infra-notes

Reading notes on the open source code of AI infrastructure (sglang, llm, cutlass, hpc, etc.)

Size: 777 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 0

Bruce-Lee-LY/cutlass_gemm

Multiple GEMM operators built with CUTLASS to support LLM inference.

Language: C++ - Size: 2.14 MB - Last synced at: 24 days ago - Pushed at: 7 months ago - Stars: 17 - Forks: 2

Bruce-Lee-LY/flash_attention_inference

Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.

Language: C++ - Size: 1.99 MB - Last synced at: 24 days ago - Pushed at: 2 months ago - Stars: 35 - Forks: 4

DD-DuDa/Cute-Learning

Examples of CUDA implementations using CUTLASS CuTe

Language: Makefile - Size: 21.2 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 132 - Forks: 15

YashasSamaga/ConvolutionBuildingBlocks

GEMM and Winograd based convolutions using CUTLASS

Language: Cuda - Size: 218 KB - Last synced at: about 1 month ago - Pushed at: almost 5 years ago - Stars: 26 - Forks: 3

yester31/Cutlass_EX

Study of CUTLASS

Language: Cuda - Size: 66.4 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 19 - Forks: 4

digital-nomad-cheng/tvm_project_course

Language: Python - Size: 3.11 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

Routhleck/blocksparse-pytorch-implement

A PyTorch implementation of block sparse

Language: C++ - Size: 1.41 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

Related Keywords

cutlass (14), cuda (12), gpu (6), llm (4), pytorch (3), nvidia (2), tensor-core (2), inference (2), sglang (2), tvm (2), tensorrt (2), hpc (2), gemm (2), cublas (2), tilesparse (1), python (1), flash-attention (1), flash-attention-2 (1), matrix-multiplication (1), large-language-model (1), mha (1), multi-head-attention (1), convolution (1), deep-learning (1), cmake (1), blocksparse (1), cpp17 (1), linux-programming (1), neural-network (1), parallel-programming (1), compiler (1), dl-compiler (1), docker (1), awesome (1), blas (1), cudnn (1), deepseek (1), llama (1), mlir (1), ptx (1), tensorrt-llm (1), triton (1), vlm (1), cu118 (1), flashinfer (1), c (1), cpp (1), heterogeneous-computing (1), mlsys (1), simd (1), cublaslt (1), matrix-multiply (1)
