Ecosyste.ms: Repos
An open API service providing repository metadata for many open source software ecosystems.
GitHub topics: triton-inference-server
triton-inference-server/onnxruntime_backend
The Triton backend for the ONNX Runtime.
Language: C++ - Size: 253 KB - Last synced: 1 day ago - Pushed: 1 day ago - Stars: 114 - Forks: 53
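As a rough illustration of what deploying a model on this backend involves, a minimal `config.pbtxt` might look like the following (the model name, tensor names, and shapes here are hypothetical, not taken from the repository):

```protobuf
name: "densenet_onnx"
backend: "onnxruntime"
max_batch_size: 8
input [
  {
    name: "data_0"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "fc6_1"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
```

Triton selects the backend from the `backend` field and loads the ONNX file from the model's version directory in the model repository.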
torchpipe/torchpipe
An alternative to Triton Inference Server. Boosts DL service throughput 1.5-4x with ensemble pipeline serving and concurrent CUDA streams, supporting PyTorch/LibTorch frontends and TensorRT/CVCUDA (among other) backends
Language: C++ - Size: 39.4 MB - Last synced: about 21 hours ago - Pushed: about 21 hours ago - Stars: 127 - Forks: 12
YeonwooSung/MLOps
Miscellaneous codes and writings for MLOps
Language: Jupyter Notebook - Size: 455 MB - Last synced: 5 days ago - Pushed: 5 days ago - Stars: 4 - Forks: 0
NVIDIA/GenerativeAIExamples
Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.
Language: Python - Size: 22.1 MB - Last synced: 7 days ago - Pushed: 12 days ago - Stars: 1,586 - Forks: 252
Zerohertz/triton-inference-server
Triton Inference Server Template
Language: Python - Size: 5.86 KB - Last synced: 7 days ago - Pushed: 8 days ago - Stars: 1 - Forks: 0
allegroai/clearml-serving
ClearML - Model-Serving Orchestration and Repository Solution
Language: Python - Size: 1.91 MB - Last synced: 6 days ago - Pushed: 20 days ago - Stars: 126 - Forks: 40
ConnorSouthEngineering/MVision
This repository contains the content for a proof of concept implementation of computer vision systems in industry. The project explores scalability and performance using the NVIDIA ecosystem, aiming to create an example scaffold for implementing a system accessible to non-technical users.
Language: TypeScript - Size: 13.6 MB - Last synced: 20 days ago - Pushed: 21 days ago - Stars: 0 - Forks: 0
fversaci/cassandra-dali-plugin
Cassandra plugin for NVIDIA DALI
Language: C++ - Size: 506 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 0 - Forks: 2
rtzr/tritony
Tiny configuration for Triton Inference Server
Language: Python - Size: 70.3 KB - Last synced: 7 days ago - Pushed: 5 months ago - Stars: 38 - Forks: 1
npuichigo/openai_trtllm
OpenAI-compatible API for the TensorRT-LLM Triton backend
Language: Rust - Size: 1.34 MB - Last synced: 26 days ago - Pushed: 27 days ago - Stars: 78 - Forks: 16
notAI-tech/fastDeploy
Deploy DL/ML inference pipelines with minimal extra code.
Language: Python - Size: 15.7 MB - Last synced: about 16 hours ago - Pushed: 30 days ago - Stars: 93 - Forks: 18
NVIDIA-ISAAC-ROS/isaac_ros_dnn_inference
Hardware-accelerated DNN model inference ROS 2 packages using NVIDIA Triton/TensorRT for both Jetson and x86_64 with CUDA-capable GPU
Language: C++ - Size: 297 KB - Last synced: 27 days ago - Pushed: 6 months ago - Stars: 97 - Forks: 14
rungrodkspeed/resnet50_optimization
Language: Python - Size: 2.54 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 0 - Forks: 0
RajeshThallam/fastertransformer-converter
A code sample for serving Large Language Models (LLMs) on a Google Kubernetes Engine (GKE) cluster with GPUs, running NVIDIA Triton Inference Server with the FasterTransformer backend.
Language: Python - Size: 139 KB - Last synced: about 1 month ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0
TunggTungg/image_retrieval
An image retrieval system that utilizes deep learning ResNet for feature extraction, Local Optimized Product Quantization techniques for storage and retrieval, and efficient deployment using Nvidia technologies like TensorRT and Triton Server, all accessible through a FastAPI-powered web API.
Language: Jupyter Notebook - Size: 1.23 GB - Last synced: about 1 month ago - Pushed: 2 months ago - Stars: 3 - Forks: 0
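Product quantization, which this repository uses for storage and retrieval, compresses embeddings by splitting each vector into subvectors and storing only the index of the nearest centroid in each subspace. A minimal NumPy sketch (random toy codebooks stand in for centroids that would normally come from k-means training):

```python
import numpy as np

def pq_encode(vectors, codebooks):
    """Encode vectors as the nearest-centroid index in each subspace.

    vectors:   (n, d) float array
    codebooks: (m, k, d//m) — k centroids for each of m subspaces
    returns:   (n, m) uint8 codes
    """
    m, k, sub = codebooks.shape
    n, d = vectors.shape
    parts = vectors.reshape(n, m, sub)  # split each vector into m subvectors
    codes = np.empty((n, m), dtype=np.uint8)
    for j in range(m):
        # squared distance from every subvector to every centroid in subspace j
        dist = ((parts[:, j, None, :] - codebooks[j][None]) ** 2).sum(-1)
        codes[:, j] = dist.argmin(axis=1)
    return codes

def pq_decode(codes, codebooks):
    """Reconstruct approximate vectors by concatenating chosen centroids."""
    n, m = codes.shape
    sub = codebooks.shape[2]
    out = np.empty((n, m * sub), dtype=codebooks.dtype)
    for j in range(m):
        out[:, j * sub:(j + 1) * sub] = codebooks[j][codes[:, j]]
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 16)).astype(np.float32)
# toy codebooks: 4 subspaces, 8 centroids each (in practice, trained by k-means)
books = rng.normal(size=(4, 8, 4)).astype(np.float32)
codes = pq_encode(x, books)    # 16 floats compressed to 4 bytes per vector
approx = pq_decode(codes, books)
```

At query time, libraries build per-subspace distance lookup tables instead of decoding, which is where the retrieval speedup comes from.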
ybai789/yolov8-triton-tensorrt
Provides an ensemble model to deploy a YOLOv8 TensorRT model to Triton
Language: Python - Size: 20.1 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 0 - Forks: 0
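YOLO ensemble deployments like this one typically run non-maximum suppression in the post-processing model. A minimal NumPy sketch of greedy IoU-based NMS (the threshold and box layout are illustrative, not this repository's exact code):

```python
import numpy as np

def nms(boxes, scores, iou_thr=0.45):
    """Greedy non-maximum suppression.
    boxes: (n, 4) as [x1, y1, x2, y2]; scores: (n,). Returns kept indices."""
    order = scores.argsort()[::-1]  # process highest-scoring boxes first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        if order.size == 1:
            break
        rest = order[1:]
        # intersection of box i with all remaining boxes
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thr]  # drop boxes overlapping the kept one
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # the overlapping second box is suppressed: [0, 2]
```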
PD-Mera/triton-basic
An easy classification example to explain how Triton works
Language: Python - Size: 0 Bytes - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 0 - Forks: 0
CoinCheung/BiSeNet
Add bisenetv2. My implementation of BiSeNet
Language: Python - Size: 3.59 MB - Last synced: about 2 months ago - Pushed: over 1 year ago - Stars: 1,345 - Forks: 299
TunggTungg/Celebrity-Look-Alike
An innovative project designed to provide users with an entertaining and engaging experience by comparing their facial features to those of celebrities.
Language: Python - Size: 149 MB - Last synced: about 1 month ago - Pushed: 2 months ago - Stars: 1 - Forks: 0
howsmyanimeprofilepicture/trt-diffusion-tutorial-kr
Accelerating Stable Diffusion with TensorRT
Language: Jupyter Notebook - Size: 3.02 MB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 0 - Forks: 0
olibartfast/computer-vision-triton-cpp-client
C++ application to perform computer vision tasks using Nvidia Triton Server for model inference
Language: C++ - Size: 1.44 MB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 8 - Forks: 1
trinhtuanvubk/Diff-VC
Diffusion Model for Voice Conversion
Language: Jupyter Notebook - Size: 35.4 MB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 18 - Forks: 5
haritsahm/cpp-ml-server
Web Services for Machine Learning in C++
Language: C++ - Size: 188 KB - Last synced: 9 days ago - Pushed: about 1 year ago - Stars: 2 - Forks: 1
chiehpower/Setup-deeplearning-tools
Set up CI with DL/CUDA/cuDNN/TensorRT/onnx2trt/onnxruntime/onnxsim/PyTorch/Triton-Inference-Server/Bazel/Tesseract/PaddleOCR/NVIDIA-docker/MinIO/Supervisord on AGX or PC from scratch.
Language: Python - Size: 4.7 MB - Last synced: about 1 month ago - Pushed: 8 months ago - Stars: 44 - Forks: 6
levipereira/deepstream-yolo-triton-server-rtsp-out
A DeepStream/Triton-Server sample application that uses the yolov7, yolov7-qat, and yolov9 models to perform inference on video files or RTSP streams.
Language: Python - Size: 39.1 KB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 0 - Forks: 0
vadimkantorov/tritoninfererenceserverstringprocprimer
Example string processing pipeline on Triton Inference Server
Language: Python - Size: 64.5 KB - Last synced: about 1 month ago - Pushed: 5 months ago - Stars: 1 - Forks: 0
Zerohertz/YOLO-Serving-Cookbook
📸 YOLO Serving Cookbook based on Triton Inference Server 📸
Language: Python - Size: 1.49 MB - Last synced: 7 days ago - Pushed: 8 days ago - Stars: 3 - Forks: 1
yas-sim/openvino-model-server-wrapper
Python wrapper class for OpenVINO Model Server. Users can submit inference requests to OVMS with just a few lines of code.
Language: Python - Size: 24.5 MB - Last synced: about 1 month ago - Pushed: over 2 years ago - Stars: 8 - Forks: 1
isarsoft/yolov4-triton-tensorrt
This repository deploys YOLOv4 as an optimized TensorRT engine to Triton Inference Server
Language: C++ - Size: 629 KB - Last synced: 3 months ago - Pushed: almost 2 years ago - Stars: 271 - Forks: 62
vectornguyen76/search-engine-system
Search engine for Shopee with image search, full-text search, and auto-complete
Language: Jupyter Notebook - Size: 105 MB - Last synced: 24 days ago - Pushed: 4 months ago - Stars: 1 - Forks: 0
bug-developer021/YOLOV5_optimization_on_triton
Compare multiple optimization methods on Triton to improve model service performance
Language: Jupyter Notebook - Size: 2.13 MB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 41 - Forks: 9
Eilliw/trash-classification-public
Custom Yolov8x-cls edge model deployment and training to classify trash vs recycling.
Language: Python - Size: 234 MB - Last synced: 3 days ago - Pushed: 4 months ago - Stars: 1 - Forks: 0
duydvu/triton-inference-server-web-ui
Triton Inference Server Web UI
Language: TypeScript - Size: 210 KB - Last synced: 7 months ago - Pushed: 7 months ago - Stars: 0 - Forks: 0
kamalkraj/stable-diffusion-tritonserver
Deploy the Stable Diffusion model with ONNX/TensorRT + Triton Server
Language: Jupyter Notebook - Size: 2.62 MB - Last synced: 7 months ago - Pushed: 9 months ago - Stars: 91 - Forks: 21
Bobo-y/triton_ensemble_model_demo
triton server ensemble model demo
Language: Python - Size: 109 KB - Last synced: 7 months ago - Pushed: about 2 years ago - Stars: 28 - Forks: 8
omarabid59/yolov8-triton
Provides an ensemble model to deploy a YoloV8 ONNX model to Triton
Language: Python - Size: 11.7 KB - Last synced: 7 months ago - Pushed: 7 months ago - Stars: 11 - Forks: 4
viamrobotics/viam-mlmodelservice-triton
MLModelService wrapping Nvidia's Triton Server
Language: C++ - Size: 60.5 KB - Last synced: about 1 month ago - Pushed: about 2 months ago - Stars: 4 - Forks: 3
dev6699/yolotriton
Go gRPC client for YOLO-NAS, YOLOv8 inference using the Triton Inference Server.
Language: Go - Size: 13.6 MB - Last synced: 8 months ago - Pushed: 8 months ago - Stars: 2 - Forks: 0
akiragy/recsys_pipeline
Build Recommender System with PyTorch + Redis + Elasticsearch + Feast + Triton + Flask. Vector Recall, DeepFM Ranking and Web Application.
Language: Python - Size: 26 MB - Last synced: 9 months ago - Pushed: 9 months ago - Stars: 16 - Forks: 3
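The vector-recall stage of a pipeline like this reduces to top-k nearest-neighbor search over item embeddings. A minimal cosine-similarity sketch in NumPy (the sizes and data are illustrative; a real system would use an ANN index such as Elasticsearch or Faiss):

```python
import numpy as np

def recall_topk(query, items, k=3):
    """Cosine-similarity top-k retrieval over item embeddings.
    query: (d,); items: (n, d). Returns indices of the k most similar items."""
    q = query / np.linalg.norm(query)
    it = items / np.linalg.norm(items, axis=1, keepdims=True)
    sims = it @ q                     # cosine similarity to every item
    return np.argsort(-sims)[:k]      # indices sorted by descending similarity

rng = np.random.default_rng(1)
items = rng.normal(size=(1000, 32))
query = items[42] + 0.01 * rng.normal(size=32)  # near-duplicate of item 42
top = recall_topk(query, items, k=5)
print(top[0])  # item 42 ranks first
```

The recalled candidates would then be re-scored by the ranking model (DeepFM in this repository's case).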
dudeperf3ct/end-to-end-images
This repo contains code for training and deploying PyTorch models for image applications in an end-to-end fashion.
Language: Jupyter Notebook - Size: 78.8 MB - Last synced: 9 months ago - Pushed: over 2 years ago - Stars: 2 - Forks: 0
Achiwilms/NVIDIA-Triton-Deployment-Quickstart
QuickStart Guide for Deploying a Basic ResNet Model on the Triton Inference Server
Language: Python - Size: 5.86 KB - Last synced: 9 months ago - Pushed: 9 months ago - Stars: 0 - Forks: 0
swapkh91/detectron2-to-tensorrt
Notebook with commands to convert a Detectron2 MaskRCNN model to TensorRT
Language: Jupyter Notebook - Size: 3.91 KB - Last synced: 10 months ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0
RedisVentures/redis-feast-gcp
A demo of Redis Enterprise as the Online Feature Store deployed on GCP with Feast and NVIDIA Triton Inference Server.
Language: Jupyter Notebook - Size: 5.38 MB - Last synced: 10 months ago - Pushed: about 1 year ago - Stars: 10 - Forks: 2
eitansela/sagemaker-mme-gpu-triton-java-client
Run Multiple Models on the Same GPU with Amazon SageMaker Multi-Model Endpoints Powered by NVIDIA Triton Inference Server. A Java client is also provided.
Language: Java - Size: 376 KB - Last synced: 10 months ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0
trinhtuanvubk/Wav2Vec2-Triton-Serving
Serve Wav2Vec2 model using Triton Inference Server
Language: Python - Size: 623 KB - Last synced: 10 months ago - Pushed: 10 months ago - Stars: 0 - Forks: 0
dpressel/reserve
FastAPI + WebSockets + SSE service to interface with Triton/Riva ASR
Language: Python - Size: 11.7 KB - Last synced: 9 months ago - Pushed: almost 2 years ago - Stars: 7 - Forks: 1
Team-BoonMoSa/Amazon-EC2-Inf1 📦
Serving YOLOv5 Segmentation Model with Amazon EC2 Inf1
Language: Python - Size: 46.9 KB - Last synced: 10 months ago - Pushed: 10 months ago - Stars: 0 - Forks: 0
tamanna18/Triton-Inference-Server-Deployment-with-ONNX-Models
Triton Inference Server Deployment with ONNX Models
Size: 9.77 KB - Last synced: 11 months ago - Pushed: 11 months ago - Stars: 0 - Forks: 0
cmpark0126/trytune-py
Heterogeneous System ML Pipeline Scheduling Framework with Triton Inference Server as Backend
Language: Python - Size: 909 KB - Last synced: 11 months ago - Pushed: 11 months ago - Stars: 0 - Forks: 0
rushai-dev/triton-server-ensemble-sidecar
The Triton API can be awkward for clients to call directly, whether over REST or gRPC. For clients that need to customize the request body, this repository offers a sidecar alongside a REST API and the Triton client on Kubernetes.
Language: Python - Size: 23.4 KB - Last synced: 11 months ago - Pushed: about 1 year ago - Stars: 1 - Forks: 0
suryanshgupta9933/Scene-Script
An image-to-text model/pipeline using ViT and Transformers, deployed with NVIDIA's PyTriton and a Streamlit app.
Language: Python - Size: 3.25 MB - Last synced: about 1 year ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0
rushai-dev/licence-plate-triton-server-ensemble
A Triton backend that enables pre-processing, post-processing, and other logic to be implemented in Python. The repository uses a stack including YOLOv8, ONNX, EasyOCR, Triton Inference Server, CV2, MinIO, Docker, and K8s, all deployed on a K80 GPU with CUDA 11.4.
Language: Python - Size: 41 KB - Last synced: 11 months ago - Pushed: about 1 year ago - Stars: 1 - Forks: 0
smarter-project/armnn_tflite_backend
TensorFlow Lite backend with ArmNN delegate support for Nvidia Triton
Language: C++ - Size: 15.3 MB - Last synced: 9 months ago - Pushed: 9 months ago - Stars: 5 - Forks: 2
octoml/TransparentAI
An example of building your own ML cloud app using OctoML.
Language: Python - Size: 9.82 MB - Last synced: about 1 year ago - Pushed: almost 2 years ago - Stars: 13 - Forks: 0
tonhathuy/tensorrt-triton-magface
MagFace Triton Inference Server using TensorRT
Language: Jupyter Notebook - Size: 722 KB - Last synced: about 1 year ago - Pushed: over 2 years ago - Stars: 10 - Forks: 2
tuxedocat/triton-client-polyglot-example
Examples of generating Triton Inference Server clients for several programming languages
Language: TypeScript - Size: 471 KB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 1 - Forks: 2
isarsoft/csrnet-triton-tensorrt
This repository deploys CSRNet as an optimized TensorRT engine to Triton Inference Server
Language: C++ - Size: 16.6 KB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 5 - Forks: 0
detail-novelist/novelist-triton-server
Deploy KoGPT with Triton Inference Server
Language: Shell - Size: 7.81 KB - Last synced: 12 months ago - Pushed: over 1 year ago - Stars: 13 - Forks: 0
Lapland-UAS-Tequ/tequ-setup-triton-inference-server
Configure NVIDIA Triton Inference Server on different platforms. Deploy an object detection model in TensorFlow SavedModel format to the server, then send images to the server for inference with Node-RED; the Triton Inference Server HTTP API is used for inference.
Size: 647 KB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 3 - Forks: 1
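Triton's HTTP endpoint follows the KServe v2 inference protocol: a JSON body with named, shaped, typed input tensors is POSTed to `/v2/models/<model>/infer`. A small sketch of building such a request body (the input name and the helper function are placeholders, not part of this repository):

```python
import json

def build_infer_request(input_name, data, datatype="FP32"):
    """Build a KServe/Triton v2 HTTP inference request body.
    data: nested list; shape is inferred from the (rectangular) nesting."""
    shape = []
    d = data
    while isinstance(d, list):  # walk the nesting to recover the tensor shape
        shape.append(len(d))
        d = d[0]
    return json.dumps({
        "inputs": [
            {"name": input_name, "shape": shape,
             "datatype": datatype, "data": data}
        ]
    })

# this body would be POSTed to http://<host>:8000/v2/models/<model>/infer
body = build_infer_request("input_tensor", [[1.0, 2.0, 3.0]])
print(body)
```

The server replies with a matching `outputs` array; binary tensor payloads are also supported via an extension for large inputs.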
k9ele7en/Triton-TensorRT-Inference-CRAFT-pytorch
Advanced inference pipeline using NVIDIA Triton Inference Server for CRAFT text detection (PyTorch), including a converter from PyTorch -> ONNX -> TensorRT and inference pipelines (TensorRT, Triton server, multi-format). Supported model formats for Triton inference: TensorRT engine, TorchScript, ONNX
Language: Python - Size: 15.5 MB - Last synced: about 1 year ago - Pushed: almost 3 years ago - Stars: 25 - Forks: 6
Alek-dr/FastAPI-TrironServer-example
Language: Python - Size: 96.7 KB - Last synced: 19 days ago - Pushed: over 1 year ago - Stars: 2 - Forks: 0
gianpd/triton-inference
Language: Python - Size: 12.8 MB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0
Biano-AI/serving-compare-middleware
FastAPI middleware for comparing different ML model serving approaches
Language: Python - Size: 268 MB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 12 - Forks: 0
niyazed/triton-mnist-example
MNIST inference example on NVIDIA Triton Inference Server
Language: PureBasic - Size: 566 KB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 1 - Forks: 1
oneonlee/Building-Transformer-Based-NLP-Applications
Repository for the NVIDIA DLI workshop "Building Transformer-Based NLP Applications"
Language: Jupyter Notebook - Size: 24.6 MB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 0 - Forks: 1
LeslieZhoa/Triton-Torch-Custom
Triton-Pytorch Custom operator tutorial
Language: Python - Size: 457 KB - Last synced: about 1 year ago - Pushed: about 2 years ago - Stars: 3 - Forks: 0
neuro-inc/mlops-pytorch-mlflow-triton
Example of deployment Pytorch model into the Triton inference server via MLFlow model registry
Language: Jupyter Notebook - Size: 24.4 KB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 1 - Forks: 2
octoml/ariel
A library for interfacing with Triton.
Language: Python - Size: 207 KB - Last synced: about 1 year ago - Pushed: almost 2 years ago - Stars: 0 - Forks: 0