An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: fastertransformer

InternLM/lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Language: Python - Size: 8.18 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 6,504 - Forks: 556

Curt-Park/serving-codegen-gptj-triton

Serving example of CodeGen-350M-Mono-GPTJ on Triton Inference Server with Docker and Kubernetes

Language: Python - Size: 5.47 MB - Last synced at: 2 months ago - Pushed at: about 2 years ago - Stars: 20 - Forks: 0

RajeshThallam/fastertransformer-converter

A code sample for serving Large Language Models (LLMs) on a Google Kubernetes Engine (GKE) cluster with GPUs, running NVIDIA Triton Inference Server with the FasterTransformer backend.

Language: Python - Size: 139 KB - Last synced at: 3 months ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

detail-novelist/novelist-triton-server

Deploy KoGPT with Triton Inference Server

Language: Shell - Size: 7.81 KB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 14 - Forks: 0

clam004/triton-ft-api

Tutorial on how to deploy a scalable autoregressive causal language model transformer using NVIDIA Triton Inference Server

Language: Python - Size: 52.7 KB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 0
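Several of the repositories above serve models behind Triton's HTTP endpoint. As a rough sketch of what a client request looks like, the snippet below assembles a request body following Triton's KServe-v2 inference protocol. The model name (`gptj_ft`) and the tensor names (`text_input`, `max_tokens`, `text_output`) are hypothetical placeholders; in practice they must match the `config.pbtxt` of the deployed model.

```python
import json

# Hypothetical model name and endpoint; Triton's HTTP port defaults to 8000.
MODEL_NAME = "gptj_ft"
INFER_URL = f"http://localhost:8000/v2/models/{MODEL_NAME}/infer"

def build_infer_request(prompt: str, max_new_tokens: int = 32) -> dict:
    """Assemble a JSON body in the KServe-v2 shape Triton's HTTP endpoint expects."""
    return {
        "inputs": [
            {
                "name": "text_input",   # hypothetical input tensor name
                "shape": [1, 1],
                "datatype": "BYTES",
                "data": [prompt],
            },
            {
                "name": "max_tokens",   # hypothetical generation-length tensor
                "shape": [1, 1],
                "datatype": "INT32",
                "data": [max_new_tokens],
            },
        ],
        "outputs": [{"name": "text_output"}],  # hypothetical output tensor name
    }

body = build_infer_request("def fibonacci(n):")
print(json.dumps(body, indent=2))
# A client would POST this JSON to INFER_URL, e.g. with urllib.request or requests.
```

The payload structure (top-level `inputs`/`outputs`, each input carrying `name`, `shape`, `datatype`, `data`) is fixed by the v2 protocol; only the tensor names and dtypes vary per model.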