Ecosyste.ms: Repos

An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: triton-inference-server

triton-inference-server/onnxruntime_backend

The Triton backend for the ONNX Runtime.

Language: C++ - Size: 253 KB - Last synced: 1 day ago - Pushed: 1 day ago - Stars: 114 - Forks: 53
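Backends like this one consume Triton's standard model repository layout. A minimal sketch of that layout and its `config.pbtxt`, assuming an ONNX classification model; the model name, tensor names, and dimensions here are illustrative, not taken from the repo:

```
model_repository/
└── densenet_onnx/          # illustrative model name
    ├── config.pbtxt
    └── 1/
        └── model.onnx

# config.pbtxt
name: "densenet_onnx"
backend: "onnxruntime"
max_batch_size: 8
input [
  { name: "data_0", data_type: TYPE_FP32, dims: [ 3, 224, 224 ] }
]
output [
  { name: "fc6_1", data_type: TYPE_FP32, dims: [ 1000 ] }
]
```

Pointing `tritonserver --model-repository=model_repository` at this directory is the usual way such a backend gets loaded.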

torchpipe/torchpipe

An alternative to Triton Inference Server: boosts DL service throughput 1.5-4x via ensemble pipeline serving with concurrent CUDA streams, supporting PyTorch/LibTorch frontends and TensorRT/CVCUDA (among other) backends.

Language: C++ - Size: 39.4 MB - Last synced: about 21 hours ago - Pushed: about 21 hours ago - Stars: 127 - Forks: 12

YeonwooSung/MLOps

Miscellaneous codes and writings for MLOps

Language: Jupyter Notebook - Size: 455 MB - Last synced: 5 days ago - Pushed: 5 days ago - Stars: 4 - Forks: 0

NVIDIA/GenerativeAIExamples

Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.

Language: Python - Size: 22.1 MB - Last synced: 7 days ago - Pushed: 12 days ago - Stars: 1,586 - Forks: 252

Zerohertz/triton-inference-server

Triton Inference Server Template

Language: Python - Size: 5.86 KB - Last synced: 7 days ago - Pushed: 8 days ago - Stars: 1 - Forks: 0

allegroai/clearml-serving

ClearML - Model-Serving Orchestration and Repository Solution

Language: Python - Size: 1.91 MB - Last synced: 6 days ago - Pushed: 20 days ago - Stars: 126 - Forks: 40

ConnorSouthEngineering/MVision

This repository contains the content for a proof of concept implementation of computer vision systems in industry. The project explores scalability and performance using the NVIDIA ecosystem, aiming to create an example scaffold for implementing a system accessible to non-technical users.

Language: TypeScript - Size: 13.6 MB - Last synced: 20 days ago - Pushed: 21 days ago - Stars: 0 - Forks: 0

fversaci/cassandra-dali-plugin

Cassandra plugin for NVIDIA DALI

Language: C++ - Size: 506 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 0 - Forks: 2

rtzr/tritony

Tiny configuration for Triton Inference Server

Language: Python - Size: 70.3 KB - Last synced: 7 days ago - Pushed: 5 months ago - Stars: 38 - Forks: 1

npuichigo/openai_trtllm

OpenAI-compatible API for the TensorRT-LLM Triton backend

Language: Rust - Size: 1.34 MB - Last synced: 26 days ago - Pushed: 27 days ago - Stars: 78 - Forks: 16
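Shims like this expose the OpenAI HTTP request shape in front of a Triton/TensorRT-LLM deployment. A hedged sketch of the kind of request body such a shim accepts; the model name and endpoint path are assumptions, not taken from this repo:

```python
import json

# Build an OpenAI-style chat completion request body; a compatibility
# shim forwards it to the TensorRT-LLM model running behind Triton.
payload = {
    "model": "ensemble",   # assumed Triton model name
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 64,
    "stream": False,
}
body = json.dumps(payload)
# POST this to e.g. http://localhost:3000/v1/chat/completions
print(json.loads(body)["model"])  # → ensemble
```

Because the wire format matches OpenAI's, existing OpenAI client libraries can usually be pointed at the shim by changing only the base URL.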

notAI-tech/fastDeploy

Deploy DL/ ML inference pipelines with minimal extra code.

Language: Python - Size: 15.7 MB - Last synced: about 16 hours ago - Pushed: 30 days ago - Stars: 93 - Forks: 18

NVIDIA-ISAAC-ROS/isaac_ros_dnn_inference

Hardware-accelerated DNN model inference ROS 2 packages using NVIDIA Triton/TensorRT for both Jetson and x86_64 with CUDA-capable GPU

Language: C++ - Size: 297 KB - Last synced: 27 days ago - Pushed: 6 months ago - Stars: 97 - Forks: 14

rungrodkspeed/resnet50_optimization

Language: Python - Size: 2.54 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 0 - Forks: 0

RajeshThallam/fastertransformer-converter

This repository is a code sample to serve Large Language Models (LLM) on a Google Kubernetes Engine (GKE) cluster with GPUs running NVIDIA Triton Inference Server with FasterTransformer backend.

Language: Python - Size: 139 KB - Last synced: about 1 month ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0

TunggTungg/image_retrieval

An image retrieval system that utilizes deep learning ResNet for feature extraction, Local Optimized Product Quantization techniques for storage and retrieval, and efficient deployment using Nvidia technologies like TensorRT and Triton Server, all accessible through a FastAPI-powered web API.

Language: Jupyter Notebook - Size: 1.23 GB - Last synced: about 1 month ago - Pushed: 2 months ago - Stars: 3 - Forks: 0

ybai789/yolov8-triton-tensorrt

Provides an ensemble model to deploy a YOLOv8 TensorRT model to Triton

Language: Python - Size: 20.1 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 0 - Forks: 0
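Triton ensembles of this kind are wired together in a `config.pbtxt` with `platform: "ensemble"`. A hypothetical sketch of the preprocess → TensorRT → postprocess wiring; all model and tensor names below are illustrative, not taken from the repo:

```
name: "yolov8_ensemble"
platform: "ensemble"
input [ { name: "raw_image", data_type: TYPE_UINT8, dims: [ -1 ] } ]
output [ { name: "detections", data_type: TYPE_FP32, dims: [ -1, 6 ] } ]
ensemble_scheduling {
  step [
    {
      model_name: "preprocess"
      model_version: -1
      input_map { key: "INPUT_0" value: "raw_image" }
      output_map { key: "OUTPUT_0" value: "preprocessed" }
    },
    {
      model_name: "yolov8_trt"
      model_version: -1
      input_map { key: "images" value: "preprocessed" }
      output_map { key: "output0" value: "raw_output" }
    },
    {
      model_name: "postprocess"
      model_version: -1
      input_map { key: "INPUT_0" value: "raw_output" }
      output_map { key: "OUTPUT_0" value: "detections" }
    }
  ]
}
```

The `input_map`/`output_map` entries connect each step's tensors, so clients see a single model rather than three.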

PD-Mera/triton-basic

A simple classification example explaining how Triton works

Language: Python - Size: 0 Bytes - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 0 - Forks: 0

CoinCheung/BiSeNet

My implementation of BiSeNet, with BiSeNetV2 added

Language: Python - Size: 3.59 MB - Last synced: about 2 months ago - Pushed: over 1 year ago - Stars: 1,345 - Forks: 299

TunggTungg/Celebrity-Look-Alike

An innovative project designed to provide users with an entertaining and engaging experience by comparing their facial features to those of celebrities.

Language: Python - Size: 149 MB - Last synced: about 1 month ago - Pushed: 2 months ago - Stars: 1 - Forks: 0

howsmyanimeprofilepicture/trt-diffusion-tutorial-kr

Accelerating Stable Diffusion with TensorRT

Language: Jupyter Notebook - Size: 3.02 MB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 0 - Forks: 0

olibartfast/computer-vision-triton-cpp-client

C++ application to perform computer vision tasks using Nvidia Triton Server for model inference

Language: C++ - Size: 1.44 MB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 8 - Forks: 1

trinhtuanvubk/Diff-VC

Diffusion Model for Voice Conversion

Language: Jupyter Notebook - Size: 35.4 MB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 18 - Forks: 5

haritsahm/cpp-ml-server

Web Services for Machine Learning in C++

Language: C++ - Size: 188 KB - Last synced: 9 days ago - Pushed: about 1 year ago - Stars: 2 - Forks: 1

chiehpower/Setup-deeplearning-tools

Set up CI in DL/ cuda/ cudnn/ TensorRT/ onnx2trt/ onnxruntime/ onnxsim/ Pytorch/ Triton-Inference-Server/ Bazel/ Tesseract/ PaddleOCR/ NVIDIA-docker/ minIO/ Supervisord on AGX or PC from scratch.

Language: Python - Size: 4.7 MB - Last synced: about 1 month ago - Pushed: 8 months ago - Stars: 44 - Forks: 6

levipereira/deepstream-yolo-triton-server-rtsp-out

The purpose of this repository is to create a DeepStream/Triton-Server sample application that uses YOLOv7, YOLOv7-QAT, and YOLOv9 models to perform inference on video files or RTSP streams.

Language: Python - Size: 39.1 KB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 0 - Forks: 0

vadimkantorov/tritoninfererenceserverstringprocprimer

Example string processing pipeline on Triton Inference Server

Language: Python - Size: 64.5 KB - Last synced: about 1 month ago - Pushed: 5 months ago - Stars: 1 - Forks: 0

Zerohertz/YOLO-Serving-Cookbook

📸 YOLO Serving Cookbook based on Triton Inference Server 📸

Language: Python - Size: 1.49 MB - Last synced: 7 days ago - Pushed: 8 days ago - Stars: 3 - Forks: 1

yas-sim/openvino-model-server-wrapper

Python wrapper class for OpenVINO Model Server. Users can submit inference requests to OVMS with just a few lines of code.

Language: Python - Size: 24.5 MB - Last synced: about 1 month ago - Pushed: over 2 years ago - Stars: 8 - Forks: 1

isarsoft/yolov4-triton-tensorrt

This repository deploys YOLOv4 as an optimized TensorRT engine to Triton Inference Server

Language: C++ - Size: 629 KB - Last synced: 3 months ago - Pushed: almost 2 years ago - Stars: 271 - Forks: 62

vectornguyen76/search-engine-system

Search engine for Shopee applying image search, full-text search, and auto-complete

Language: Jupyter Notebook - Size: 105 MB - Last synced: 24 days ago - Pushed: 4 months ago - Stars: 1 - Forks: 0

bug-developer021/YOLOV5_optimization_on_triton

Compares multiple optimization methods on Triton to improve model service performance

Language: Jupyter Notebook - Size: 2.13 MB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 41 - Forks: 9

Eilliw/trash-classification-public

Custom Yolov8x-cls edge model deployment and training to classify trash vs recycling.

Language: Python - Size: 234 MB - Last synced: 3 days ago - Pushed: 4 months ago - Stars: 1 - Forks: 0

duydvu/triton-inference-server-web-ui

Triton Inference Server Web UI

Language: TypeScript - Size: 210 KB - Last synced: 7 months ago - Pushed: 7 months ago - Stars: 0 - Forks: 0

kamalkraj/stable-diffusion-tritonserver

Deploy a Stable Diffusion model with ONNX/TensorRT + Triton Server

Language: Jupyter Notebook - Size: 2.62 MB - Last synced: 7 months ago - Pushed: 9 months ago - Stars: 91 - Forks: 21

Bobo-y/triton_ensemble_model_demo

Triton Server ensemble model demo

Language: Python - Size: 109 KB - Last synced: 7 months ago - Pushed: about 2 years ago - Stars: 28 - Forks: 8

omarabid59/yolov8-triton

Provides an ensemble model to deploy a YOLOv8 ONNX model to Triton

Language: Python - Size: 11.7 KB - Last synced: 7 months ago - Pushed: 7 months ago - Stars: 11 - Forks: 4

viamrobotics/viam-mlmodelservice-triton

MLModelService wrapping Nvidia's Triton Server

Language: C++ - Size: 60.5 KB - Last synced: about 1 month ago - Pushed: about 2 months ago - Stars: 4 - Forks: 3

dev6699/yolotriton

Go gRPC client for YOLO-NAS, YOLOv8 inference using the Triton Inference Server.

Language: Go - Size: 13.6 MB - Last synced: 8 months ago - Pushed: 8 months ago - Stars: 2 - Forks: 0

akiragy/recsys_pipeline

Build Recommender System with PyTorch + Redis + Elasticsearch + Feast + Triton + Flask. Vector Recall, DeepFM Ranking and Web Application.

Language: Python - Size: 26 MB - Last synced: 9 months ago - Pushed: 9 months ago - Stars: 16 - Forks: 3

dudeperf3ct/end-to-end-images

This repo contains code for training and deploying PyTorch models for image applications in an end-to-end fashion.

Language: Jupyter Notebook - Size: 78.8 MB - Last synced: 9 months ago - Pushed: over 2 years ago - Stars: 2 - Forks: 0

Achiwilms/NVIDIA-Triton-Deployment-Quickstart

QuickStart Guide for Deploying a Basic ResNet Model on the Triton Inference Server

Language: Python - Size: 5.86 KB - Last synced: 9 months ago - Pushed: 9 months ago - Stars: 0 - Forks: 0

swapkh91/detectron2-to-tensorrt

Notebook with commands to convert a Detectron2 MaskRCNN model to TensorRT

Language: Jupyter Notebook - Size: 3.91 KB - Last synced: 10 months ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

RedisVentures/redis-feast-gcp

A demo of Redis Enterprise as the Online Feature Store deployed on GCP with Feast and NVIDIA Triton Inference Server.

Language: Jupyter Notebook - Size: 5.38 MB - Last synced: 10 months ago - Pushed: about 1 year ago - Stars: 10 - Forks: 2

eitansela/sagemaker-mme-gpu-triton-java-client

Run Multiple Models on the Same GPU with Amazon SageMaker Multi-Model Endpoints Powered by NVIDIA Triton Inference Server. A Java client is also provided.

Language: Java - Size: 376 KB - Last synced: 10 months ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

trinhtuanvubk/Wav2Vec2-Triton-Serving

Serve Wav2Vec2 model using Triton Inference Server

Language: Python - Size: 623 KB - Last synced: 10 months ago - Pushed: 10 months ago - Stars: 0 - Forks: 0

dpressel/reserve

FastAPI + WebSockets + SSE service to interface with Triton/Riva ASR

Language: Python - Size: 11.7 KB - Last synced: 9 months ago - Pushed: almost 2 years ago - Stars: 7 - Forks: 1

Team-BoonMoSa/Amazon-EC2-Inf1 📦

Serving YOLOv5 Segmentation Model with Amazon EC2 Inf1

Language: Python - Size: 46.9 KB - Last synced: 10 months ago - Pushed: 10 months ago - Stars: 0 - Forks: 0

tamanna18/Triton-Inference-Server-Deployment-with-ONNX-Models

Triton Inference Server Deployment with ONNX Models

Size: 9.77 KB - Last synced: 11 months ago - Pushed: 11 months ago - Stars: 0 - Forks: 0

cmpark0126/trytune-py

Heterogeneous System ML Pipeline Scheduling Framework with Triton Inference Server as Backend

Language: Python - Size: 909 KB - Last synced: 11 months ago - Pushed: 11 months ago - Stars: 0 - Forks: 0

rushai-dev/triton-server-ensemble-sidecar

The Triton backend is difficult for clients to use directly, whether over REST or gRPC. For clients that need to customize the request body, this repository offers a sidecar that runs alongside a REST API and a Triton client on Kubernetes.

Language: Python - Size: 23.4 KB - Last synced: 11 months ago - Pushed: about 1 year ago - Stars: 1 - Forks: 0

suryanshgupta9933/Scene-Script

An image-to-text model/pipeline using ViT and Transformers, deployed with NVIDIA's PyTriton and a Streamlit app.

Language: Python - Size: 3.25 MB - Last synced: about 1 year ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0

rushai-dev/licence-plate-triton-server-ensemble

A Triton backend that enables pre-processing, post-processing, and other logic to be implemented in Python. The repository uses a stack including YOLOv8, ONNX, EasyOCR, Triton Inference Server, OpenCV, MinIO, Docker, and Kubernetes, all deployed on a K80 GPU with CUDA 11.4.

Language: Python - Size: 41 KB - Last synced: 11 months ago - Pushed: about 1 year ago - Stars: 1 - Forks: 0

smarter-project/armnn_tflite_backend

TensorFlow Lite backend with ArmNN delegate support for Nvidia Triton

Language: C++ - Size: 15.3 MB - Last synced: 9 months ago - Pushed: 9 months ago - Stars: 5 - Forks: 2

octoml/TransparentAI

An example of building your own ML cloud app using OctoML.

Language: Python - Size: 9.82 MB - Last synced: about 1 year ago - Pushed: almost 2 years ago - Stars: 13 - Forks: 0

tonhathuy/tensorrt-triton-magface

MagFace Triton Inference Server using TensorRT

Language: Jupyter Notebook - Size: 722 KB - Last synced: about 1 year ago - Pushed: over 2 years ago - Stars: 10 - Forks: 2

tuxedocat/triton-client-polyglot-example

Examples of generating Triton Inference Server clients in several programming languages

Language: TypeScript - Size: 471 KB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 1 - Forks: 2

isarsoft/csrnet-triton-tensorrt

This repository deploys CSRNet as an optimized TensorRT engine to Triton Inference Server

Language: C++ - Size: 16.6 KB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 5 - Forks: 0

detail-novelist/novelist-triton-server

Deploy KoGPT with Triton Inference Server

Language: Shell - Size: 7.81 KB - Last synced: 12 months ago - Pushed: over 1 year ago - Stars: 13 - Forks: 0

Lapland-UAS-Tequ/tequ-setup-triton-inference-server

Configure NVIDIA Triton Inference Server on different platforms, deploy an object detection model in TensorFlow SavedModel format to the server, and send images for inference from Node-RED. The Triton Inference Server HTTP API is used for inference.

Size: 647 KB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 3 - Forks: 1
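The Triton HTTP API used here follows the KServe v2 inference protocol. A minimal sketch of building such a request with only the standard library; the model name, tensor names, shapes, and server URL are illustrative assumptions:

```python
import json

# KServe v2 inference request body for POST /v2/models/<model>/infer
request = {
    "inputs": [
        {
            "name": "input_tensor",   # illustrative input tensor name
            "shape": [1, 4],
            "datatype": "FP32",
            "data": [0.1, 0.2, 0.3, 0.4],
        }
    ],
    "outputs": [{"name": "detection_boxes"}],  # illustrative output name
}
body = json.dumps(request)
# Send with e.g.:
#   urllib.request.urlopen(
#       "http://localhost:8000/v2/models/mymodel/infer",
#       data=body.encode())
```

The same payload shape works from Node-RED's HTTP request node, which is how this repo drives inference.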

k9ele7en/Triton-TensorRT-Inference-CRAFT-pytorch

Advanced inference pipeline using NVIDIA Triton Inference Server for CRAFT text detection (PyTorch), including a PyTorch -> ONNX -> TensorRT converter and inference pipelines (TensorRT, multi-format Triton server). Supported model formats for Triton inference: TensorRT engine, TorchScript, ONNX.

Language: Python - Size: 15.5 MB - Last synced: about 1 year ago - Pushed: almost 3 years ago - Stars: 25 - Forks: 6

Alek-dr/FastAPI-TrironServer-example

Language: Python - Size: 96.7 KB - Last synced: 19 days ago - Pushed: over 1 year ago - Stars: 2 - Forks: 0

gianpd/triton-inference

Language: Python - Size: 12.8 MB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

Biano-AI/serving-compare-middleware

FastAPI middleware for comparing different ML model serving approaches

Language: Python - Size: 268 MB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 12 - Forks: 0

niyazed/triton-mnist-example

MNIST inference example on NVIDIA Triton Inference Server

Language: PureBasic - Size: 566 KB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 1 - Forks: 1

oneonlee/Building-Transformer-Based-NLP-Applications

Repository for the NVIDIA DLI workshop "Building Transformer-Based Natural Language Processing Applications"

Language: Jupyter Notebook - Size: 24.6 MB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 0 - Forks: 1

LeslieZhoa/Triton-Torch-Custom

Triton-Pytorch Custom operator tutorial

Language: Python - Size: 457 KB - Last synced: about 1 year ago - Pushed: about 2 years ago - Stars: 3 - Forks: 0

neuro-inc/mlops-pytorch-mlflow-triton

Example of deploying a PyTorch model to the Triton Inference Server via the MLflow model registry

Language: Jupyter Notebook - Size: 24.4 KB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 1 - Forks: 2

octoml/ariel

A library for interfacing with Triton.

Language: Python - Size: 207 KB - Last synced: about 1 year ago - Pushed: almost 2 years ago - Stars: 0 - Forks: 0