GitHub / menloresearch / cortex.tensorrt-llm
Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It includes NVIDIA's TensorRT-LLM as a git submodule for GPU-accelerated inference on NVIDIA GPUs.
JSON API: http://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/menloresearch%2Fcortex.tensorrt-llm
Fork of NVIDIA/TensorRT-LLM
Stars: 43
Forks: 2
Open issues: 3
License: apache-2.0
Language: C++
Size: 273 MB
Dependencies parsed at: Pending
Created at: about 1 year ago
Updated at: about 2 months ago
Pushed at: 7 months ago
Last synced at: about 1 month ago
Topics: jan, llm, nvidia, tensorrt, tensorrt-llm