An open API service providing repository metadata for many open source software ecosystems.

Topic: "tensorrt-llm"

xlite-dev/Awesome-LLM-Inference

📚A curated list of Awesome LLM Inference Papers with Codes.

Language: Python - Size: 115 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 4,123 - Forks: 287

collabora/WhisperLive

A nearly-live implementation of OpenAI's Whisper.

Language: Python - Size: 6.13 MB - Last synced at: 28 days ago - Pushed at: about 1 month ago - Stars: 2,896 - Forks: 384

shashikg/WhisperS2T

An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engines

Language: Jupyter Notebook - Size: 1.16 MB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 416 - Forks: 54

huggingface/optimum-benchmark

🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of Optimum's hardware optimizations & quantization schemes.

Language: Python - Size: 8.3 MB - Last synced at: 3 days ago - Pushed at: 28 days ago - Stars: 304 - Forks: 58

coderonion/awesome-cuda-and-hpc

🚀🚀🚀 This repository lists some awesome public CUDA, cuda-python, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR, PTX and High Performance Computing (HPC) projects.

Size: 56.6 KB - Last synced at: 4 days ago - Pushed at: 25 days ago - Stars: 285 - Forks: 31

npuichigo/openai_trtllm

OpenAI-compatible API for the TensorRT-LLM Triton backend

Language: Rust - Size: 1.35 MB - Last synced at: 5 days ago - Pushed at: 11 months ago - Stars: 209 - Forks: 28

NetEase-Media/grps

Deep Learning Deployment Framework: supports tf/torch/trt/trtllm/vllm and other NN frameworks, with dynamic batching and streaming modes. It is dual-language compatible with Python and C++, offering scalability, extensibility, and high performance. It helps users quickly deploy models and provide services through HTTP/RPC interfaces.

Language: C++ - Size: 67.8 MB - Last synced at: 24 days ago - Pushed at: about 2 months ago - Stars: 164 - Forks: 13

NetEase-Media/grps_trtllm

Higher-performance OpenAI LLM service than vLLM serve: a pure C++ high-performance OpenAI LLM service implemented with GRPS+TensorRT-LLM+Tokenizers.cpp, supporting chat and function calling, AI agents, distributed multi-GPU inference, multimodal capabilities, and a Gradio chat interface.

Language: Python - Size: 135 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 135 - Forks: 11

openhackathons-org/End-to-End-LLM

This repository contains AI Bootcamp material that consists of a workflow for LLMs

Language: Jupyter Notebook - Size: 24.3 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 84 - Forks: 36

vossr/Chat-With-RTX-python-api

Chat With RTX Python API

Language: Python - Size: 12.7 KB - Last synced at: about 21 hours ago - Pushed at: about 1 month ago - Stars: 65 - Forks: 11

guidance-ai/llgtrt

TensorRT-LLM server with Structured Outputs (JSON) built with Rust

Language: Rust - Size: 181 KB - Last synced at: about 2 months ago - Pushed at: 2 months ago - Stars: 52 - Forks: 10

argonne-lcf/LLM-Inference-Bench

LLM-Inference-Bench

Language: Jupyter Notebook - Size: 11.2 MB - Last synced at: about 2 hours ago - Pushed at: 15 days ago - Stars: 45 - Forks: 4

menloresearch/cortex.tensorrt-llm (fork of NVIDIA/TensorRT-LLM)

Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It includes NVIDIA's TensorRT-LLM as a submodule for GPU-accelerated inference on NVIDIA GPUs.

Language: C++ - Size: 273 MB - Last synced at: 3 months ago - Pushed at: 9 months ago - Stars: 43 - Forks: 2

fgblanch/OutlookLLM

Add-in for the new Outlook that adds new LLM features (Composition, Summarizing, Q&A). It uses a local LLM via NVIDIA TensorRT-LLM

Language: Python - Size: 2.15 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 39 - Forks: 2

modal-labs/stopwatch

A tool for benchmarking LLMs on Modal

Language: Python - Size: 2.05 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 28 - Forks: 4

zRzRzRzRzRzRzR/lm-fly

Accelerating large-model inference frameworks to make LLMs fly

Language: Python - Size: 7.45 MB - Last synced at: 6 days ago - Pushed at: about 1 year ago - Stars: 18 - Forks: 4

lix19937/llm-deploy

AI Infra: LLM inference / tensorrt-llm / vllm

Language: Python - Size: 5.51 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 11 - Forks: 0

Delxrius/MiniMax-01

MiniMax-01 is a simple implementation of the MiniMax algorithm, a widely used strategy for decision-making in two-player turn-based games like Tic-Tac-Toe. The algorithm aims to minimize the maximum possible loss for the player, making it a popular choice for developing AI opponents in various game scenarios.

Size: 1000 Bytes - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 4 - Forks: 0

wcks13589/LLM-Tutorial

LLM tutorial materials, including but not limited to NVIDIA NeMo, TensorRT-LLM, Triton Inference Server, and NeMo Guardrails.

Language: Python - Size: 3.11 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 3 - Forks: 1

MustaphaU/Simplify-Documentation-Review-on-Atlassian-Confluence-with-LLAMA2-and-NVIDIA-TensorRT-LLM

A simple project demonstrating LLM-assisted review of documentation on Atlassian Confluence.

Language: Python - Size: 927 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0

yui-mhcp/language_models

A Large Language Models (LLM) oriented project providing easy-to-use features like RAG, translation, summarization, ...

Language: Python - Size: 2.25 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

Rahman2001/nim-factory

This project is a factory for NVIDIA NIM containers in which users/businesses can quantize many models and build their own TensorRT-LLM engine for optimized inference.

Language: Jupyter Notebook - Size: 279 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

ccyrene/flash_whisper

Whisper optimization for real-time application

Language: Python - Size: 2.04 MB - Last synced at: 5 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

YconquestY/cc

Summary of call graphs and data structures of the collective communication plugin in NVIDIA TensorRT-LLM

Language: D2 - Size: 1.95 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

nyunAI/TensorRT-LLM

Language: C++ - Size: 210 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

CactusQ/TensorRT-LLM-Tutorial

Getting started with TensorRT-LLM using BLOOM as a case study

Language: Jupyter Notebook - Size: 85 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

cyanff/nyxt

Language: TypeScript - Size: 517 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0