GitHub topics: fastertransformer
InternLM/lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Language: Python - Size: 8.18 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 6,504 - Forks: 556

Curt-Park/serving-codegen-gptj-triton
A serving example of CodeGen-350M-Mono (GPT-J) on Triton Inference Server with Docker and Kubernetes
Language: Python - Size: 5.47 MB - Last synced at: 2 months ago - Pushed at: about 2 years ago - Stars: 20 - Forks: 0

RajeshThallam/fastertransformer-converter
A code sample for serving Large Language Models (LLMs) on a Google Kubernetes Engine (GKE) cluster with GPUs, running NVIDIA Triton Inference Server with the FasterTransformer backend.
Language: Python - Size: 139 KB - Last synced at: 3 months ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0
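Serving a model through the FasterTransformer backend, as this repository does, requires a Triton model configuration. The fragment below is an illustrative sketch only (the parameter values and checkpoint path are assumptions, not taken from this repository); the `fastertransformer` backend name and parameters such as `tensor_para_size` and `model_checkpoint_path` follow NVIDIA's fastertransformer_backend examples:

```
# config.pbtxt (illustrative sketch; values are assumptions)
name: "fastertransformer"
backend: "fastertransformer"
max_batch_size: 8

parameters {
  key: "tensor_para_size"        # number of GPUs for tensor parallelism
  value: { string_value: "1" }
}
parameters {
  key: "pipeline_para_size"      # number of pipeline-parallel stages
  value: { string_value: "1" }
}
parameters {
  key: "model_checkpoint_path"   # hypothetical path to converted weights
  value: { string_value: "/models/fastertransformer/1/1-gpu" }
}
```

Each served model gets one such file in its Triton model-repository directory; the converter in this repository produces the checkpoint that `model_checkpoint_path` points at.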

detail-novelist/novelist-triton-server
Deploy KoGPT with Triton Inference Server
Language: Shell - Size: 7.81 KB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 14 - Forks: 0

clam004/triton-ft-api
A tutorial on deploying a scalable autoregressive causal language model (transformer) with NVIDIA Triton Inference Server
Language: Python - Size: 52.7 KB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 0
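A client of a deployment like the ones above talks to Triton over its HTTP/REST v2 inference protocol. The sketch below builds such a request body with only the standard library; the tensor names (`input_ids`, `input_lengths`, `request_output_len`) follow the FasterTransformer GPT examples and are assumptions that may differ per deployment:

```python
import json

def build_infer_request(token_ids, output_len=32):
    """Build a Triton v2 HTTP inference request body (illustrative sketch).

    Tensor names below are assumptions modeled on the FasterTransformer
    backend's GPT examples; check your model's config.pbtxt for the
    actual names and datatypes.
    """
    return json.dumps({
        "inputs": [
            {"name": "input_ids", "shape": [1, len(token_ids)],
             "datatype": "UINT32", "data": list(token_ids)},
            {"name": "input_lengths", "shape": [1, 1],
             "datatype": "UINT32", "data": [len(token_ids)]},
            {"name": "request_output_len", "shape": [1, 1],
             "datatype": "UINT32", "data": [output_len]},
        ]
    })

# POST this body to http://<host>:8000/v2/models/<model_name>/infer
body = build_infer_request([101, 2023, 2003, 102])
```

In practice the official `tritonclient` package wraps this protocol, but the raw JSON shape makes it clear what the server expects.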
