GitHub topics: inference-server
containers/podman-desktop-extension-ai-lab
Work with LLMs in a local environment using containers
Language: TypeScript - Size: 14.8 MB - Last synced at: about 8 hours ago - Pushed at: about 10 hours ago - Stars: 239 - Forks: 63

basetenlabs/truss
The simplest way to serve AI/ML models in production
Language: Python - Size: 17.6 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 1,026 - Forks: 87
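
Truss's documented pattern is a model directory whose `model/model.py` exposes `load` and `predict` hooks. A minimal sketch of that pattern, where the transformers pipeline and the request schema are illustrative assumptions rather than the project's only option:

```python
# model/model.py -- minimal Truss-style model (illustrative sketch)
from transformers import pipeline  # assumption: a transformers-based example


class Model:
    def __init__(self, **kwargs):
        self._model = None

    def load(self):
        # Called once when the serving container starts; load weights here.
        self._model = pipeline("sentiment-analysis")

    def predict(self, model_input):
        # Called per request; model_input is the parsed request body.
        return self._model(model_input["text"])
```

The Truss CLI can then build and serve a container around this directory; check the repo's docs for the current commands.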

roboflow/inference
Turn any computer or edge device into a command center for your computer vision projects.
Language: Python - Size: 130 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1,776 - Forks: 191
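
The `inference` package's quickstart pattern is to fetch a model by ID and call it directly; the model alias and input here are assumptions, so verify against the repo's docs:

```python
# Sketch of running a model through the `inference` package (assumed API).
from inference import get_model

model = get_model(model_id="yolov8n-640")  # assumed public model alias
results = model.infer("dog.jpeg")          # local path, URL, or numpy image
print(results)
```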

containers/ramalama
RamaLama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for inference in production, all through the familiar language of containers.
Language: Python - Size: 3.3 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1,890 - Forks: 213
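
`ramalama serve` puts the model behind a local HTTP endpoint; assuming it exposes an OpenAI-compatible chat completions route on localhost:8080 (the port, route, and model name below are assumptions), a client could look like:

```python
# Sketch of a client for a locally served model (verify the actual
# port and route that `ramalama serve` prints on startup).
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "granite",  # hypothetical model name
        "messages": [{"role": "user", "content": "Hello from a container"}],
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```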

pipeless-ai/pipeless
An open-source computer vision framework to build and deploy apps in minutes
Language: Rust - Size: 142 MB - Last synced at: 3 days ago - Pushed at: about 1 year ago - Stars: 759 - Forks: 39

curtisgray/wingman
Wingman is the fastest and easiest way to run Llama models on your PC or Mac.
Language: TypeScript - Size: 188 MB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 42 - Forks: 2

friendliai/friendli-client 📦
[⛔️ DEPRECATED] Friendli: the fastest serving engine for generative AI
Language: Python - Size: 4.88 MB - Last synced at: 5 days ago - Pushed at: 22 days ago - Stars: 48 - Forks: 7

underneathall/pinferencia
Python + Inference: a model deployment library in Python. The simplest model inference server ever.
Language: Python - Size: 9.57 MB - Last synced at: 24 days ago - Pushed at: over 2 years ago - Stars: 555 - Forks: 85
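
Pinferencia's documented pattern is to register a plain Python callable (or model object) with a `Server` and run it under uvicorn; a minimal sketch, with the function and model name as assumptions:

```python
# app.py -- minimal Pinferencia sketch (register a callable, serve it)
from pinferencia import Server


def add(data):
    # `data` is the request payload; here just sum a list of numbers.
    return sum(data)


service = Server()
service.register(model_name="add", model=add)
# Run with: uvicorn app:service --reload   (assumed invocation)
```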

kf5i/k3ai 📦
K3ai is a lightweight, fully automated AI infrastructure-in-a-box solution that allows anyone to experiment quickly with Kubeflow pipelines. K3ai suits anything from edge devices to laptops.
Language: PowerShell - Size: 19.4 MB - Last synced at: 5 days ago - Pushed at: over 3 years ago - Stars: 101 - Forks: 10

pandruszkow/whisper-inference-server
A networked inference server for Whisper speech recognition
Language: Python - Size: 6.84 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 8 - Forks: 0
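
The repo wraps Whisper behind a network API; as a generic illustration of that idea (not this project's code), a minimal FastAPI wrapper around the openai-whisper package might look like:

```python
# Generic sketch of a networked Whisper endpoint (not this repo's code).
import tempfile

import whisper  # openai-whisper package
from fastapi import FastAPI, UploadFile

app = FastAPI()
model = whisper.load_model("base")  # model size is an arbitrary choice


@app.post("/transcribe")
async def transcribe(file: UploadFile):
    # Whisper's transcribe() wants a file path, so spool the upload to disk.
    with tempfile.NamedTemporaryFile(suffix=".wav") as tmp:
        tmp.write(await file.read())
        tmp.flush()
        result = model.transcribe(tmp.name)
    return {"text": result["text"]}
```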

geniusrise/vision
Vision and vision multimodal components for the geniusrise framework
Language: Python - Size: 33 MB - Last synced at: 26 days ago - Pushed at: 9 months ago - Stars: 7 - Forks: 1

notAI-tech/fastDeploy
Deploy DL/ML inference pipelines with minimal extra code.
Language: Python - Size: 15.7 MB - Last synced at: 22 days ago - Pushed at: 8 months ago - Stars: 98 - Forks: 17

kibae/onnxruntime-server
ONNX Runtime Server: a server that provides TCP and HTTP/HTTPS REST APIs for ONNX inference.
Language: C++ - Size: 954 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 156 - Forks: 11

NGLSG/UniAPI
The Universal LLM Gateway - Integrate ANY AI Model with One Consistent API
Language: C++ - Size: 172 KB - Last synced at: 29 days ago - Pushed at: 2 months ago - Stars: 9 - Forks: 1

vertexclique/orkhon
Orkhon: ML Inference Framework and Server Runtime
Language: Rust - Size: 26.2 MB - Last synced at: about 2 months ago - Pushed at: over 4 years ago - Stars: 149 - Forks: 4

RubixML/Server
A standalone inference server for trained Rubix ML estimators.
Language: PHP - Size: 18.3 MB - Last synced at: 20 days ago - Pushed at: 4 months ago - Stars: 62 - Forks: 11

roboflow/inference-dashboard-example
Uses Roboflow's inference server to analyze video streams. This project extracts insights from video frames at defined intervals and generates informative visualizations and CSV outputs.
Language: Python - Size: 97.7 KB - Last synced at: 7 days ago - Pushed at: about 1 year ago - Stars: 14 - Forks: 2

BMW-InnovationLab/BMW-TensorFlow-Inference-API-CPU
An object detection inference API using the TensorFlow framework.
Language: Python - Size: 9.96 MB - Last synced at: 3 months ago - Pushed at: about 3 years ago - Stars: 183 - Forks: 47

autodeployai/ai-serving
Serving AI/ML models in the open standard formats PMML and ONNX with both HTTP (REST API) and gRPC endpoints
Language: Scala - Size: 285 KB - Last synced at: 4 months ago - Pushed at: 9 months ago - Stars: 156 - Forks: 31

NVIDIA/gpu-rest-engine 📦
A REST API for Caffe using Docker and Go
Language: C++ - Size: 255 KB - Last synced at: 6 days ago - Pushed at: almost 7 years ago - Stars: 419 - Forks: 93

niqbal996/triton_client
A client that sends sensor messages from ROS or other sources to the inference server and processes the inference results.
Language: Python - Size: 37.3 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

tensorchord/inference-benchmark
Benchmark for machine learning model online serving (LLM, embedding, Stable-Diffusion, Whisper)
Language: Python - Size: 46.9 KB - Last synced at: 3 months ago - Pushed at: about 2 years ago - Stars: 28 - Forks: 3

BMW-InnovationLab/BMW-YOLOv4-Inference-API-GPU
A no-code object detection inference API using the YOLOv3 and YOLOv4 Darknet framework.
Language: Python - Size: 20.5 MB - Last synced at: 15 days ago - Pushed at: about 3 years ago - Stars: 280 - Forks: 68

BMW-InnovationLab/BMW-YOLOv4-Inference-API-CPU
A no-code object detection inference API using YOLOv4 and YOLOv3 with OpenCV.
Language: Python - Size: 21.4 MB - Last synced at: 15 days ago - Pushed at: about 3 years ago - Stars: 220 - Forks: 58

geniusrise/text
Text components powering LLMs & SLMs for the geniusrise framework
Language: Python - Size: 15.6 MB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 5 - Forks: 2

dlzou/computron
Serving distributed deep learning models with model parallel swapping.
Language: Jupyter Notebook - Size: 2.1 MB - Last synced at: 17 days ago - Pushed at: about 2 years ago - Stars: 5 - Forks: 1

redis-applied-ai/loan-prediction-microservice
An example of using Redis + RedisAI for a microservice that predicts consumer loan probabilities using Redis as a feature and model store and RedisAI as an inference server.
Language: Jupyter Notebook - Size: 24.6 MB - Last synced at: 8 months ago - Pushed at: over 2 years ago - Stars: 7 - Forks: 0
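
The pattern here is that both features and the model live in Redis, and RedisAI executes the model in place. A sketch using the redisai-py client; the key names, tensor shape, and ONNX file are assumptions for illustration:

```python
# Sketch of RedisAI as an inference server via the redisai-py client.
import numpy as np
import redisai as rai

con = rai.Client(host="localhost", port=6379)

# Store the model once (e.g., an exported ONNX classifier).
with open("loan_model.onnx", "rb") as f:
    con.modelstore("loan_model", backend="ONNX", device="CPU", data=f.read())

# Per request: set the input tensor, run the model, read the output.
features = np.random.rand(1, 8).astype(np.float32)
con.tensorset("loan:in", features)
con.modelexecute("loan_model", inputs=["loan:in"], outputs=["loan:out"])
print(con.tensorget("loan:out"))
```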

geniusrise/audio
Audio components for the geniusrise framework
Language: Python - Size: 70.6 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 1

xdevfaheem/TGS
Effortlessly Deploy and Serve Large Language Models in the Cloud as an API Endpoint for Inference
Language: Python - Size: 26.4 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

StefanoLusardi/tiny_inference_engine
Client/server system to perform distributed inference on high-load systems.
Language: C++ - Size: 11 MB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 1

zhangjun/TensorRT-Server
TensorRT Server
Language: C++ - Size: 68.4 KB - Last synced at: 2 months ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 1

leimao/Simple-Inference-Server
Inference Server Implementation from Scratch for Machine Learning Models
Language: Python - Size: 80.1 KB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 24 - Forks: 1
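
As a generic illustration of what "from scratch" means here (not this repo's code), an inference server can be as small as a standard-library HTTP handler wrapping a predict function:

```python
# Generic from-scratch inference server sketch; the placeholder "model" and
# JSON schema are assumptions for illustration.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    # Placeholder "model": replace with real model inference.
    return {"score": sum(features) / max(len(features), 1)}


class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(predict(payload.get("features", []))).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8000), InferenceHandler).serve_forever()
```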

haicheviet/fullstack-machine-learning-inference
Fullstack machine learning inference template
Language: Jupyter Notebook - Size: 77.1 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 24 - Forks: 11

nikhiltadikonda/Kushagra-AI Fork of bshreddy/Kushagra-AI
An AI-powered mobile crop advisory app for farmers and gardeners that provides information about crops from an image taken by the user. It supports 10 crops and 37 kinds of crop diseases. The AI model is a ResNet fine-tuned on crop images collected by web-scraping Google Images and the Plant-Village dataset.
Size: 11.7 KB - Last synced at: 3 months ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 0

tensorchord/modelz-docs
Modelz is a developer-first platform for prototyping and deploying machine learning models.
Language: MDX - Size: 11.5 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 7 - Forks: 5

SABER-labs/torch_batcher
Serve PyTorch inference requests using batching with Redis for faster performance.
Language: Python - Size: 13.7 KB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 5 - Forks: 0
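
A generic sketch of the Redis-batching pattern the description refers to (not this repo's code): clients push requests onto a list, a worker drains a batch and runs the model once per batch. Queue and key names and the stand-in model are assumptions:

```python
import json

import redis
import torch

r = redis.Redis()
model = torch.nn.Linear(4, 2).eval()  # stand-in model
MAX_BATCH = 32

while True:
    # Block for the first request, then opportunistically drain more.
    _, raw = r.blpop("inference:queue")
    batch = [json.loads(raw)]
    while len(batch) < MAX_BATCH:
        raw = r.lpop("inference:queue")
        if raw is None:
            break
        batch.append(json.loads(raw))

    with torch.no_grad():
        inputs = torch.tensor([req["input"] for req in batch], dtype=torch.float32)
        outputs = model(inputs)

    # Hand each result back on a per-request key the client is waiting on.
    for req, out in zip(batch, outputs):
        r.rpush(f"inference:result:{req['id']}", json.dumps(out.tolist()))
```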

k9ele7en/Triton-TensorRT-Inference-CRAFT-pytorch
Advanced inference pipeline using NVIDIA Triton Inference Server for CRAFT text detection (PyTorch), including a converter from PyTorch -> ONNX -> TensorRT and inference pipelines (TensorRT, Triton server, multi-format). Supported model formats for Triton inference: TensorRT engine, TorchScript, ONNX.
Language: Python - Size: 15.5 MB - Last synced at: over 2 years ago - Pushed at: almost 4 years ago - Stars: 25 - Forks: 6
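
For context on the client side of such a pipeline, a sketch of calling a deployed model with Triton's HTTP client; the tensor names, shape, and model name are assumptions and would come from the model's config.pbtxt:

```python
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

image = np.random.rand(1, 3, 768, 768).astype(np.float32)  # dummy input
infer_input = httpclient.InferInput("input", list(image.shape), "FP32")
infer_input.set_data_from_numpy(image)

result = client.infer(
    model_name="craft",  # hypothetical model name
    inputs=[infer_input],
    outputs=[httpclient.InferRequestedOutput("output")],
)
print(result.as_numpy("output").shape)
```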

csy1204/TripBigs_Web
Session Based Real-time Hotel Recommendation Web Application
Language: Python - Size: 3.15 MB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 8 - Forks: 4

nikhiltadikonda/Kushagra Fork of bshreddy/Kushagra
Bundle of Repositories that power up all the Crop Prediction Applications
Size: 2.93 KB - Last synced at: 3 months ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

ajinkyapuar/qis
Language: Python - Size: 7.81 KB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

liusy182/sagemaker-run-your-own-inference
Run your own production inference code with SageMaker
Language: Python - Size: 14.1 MB - Last synced at: over 2 years ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 0

zhangjun/infer_server
Language: C++ - Size: 21.5 KB - Last synced at: about 1 month ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 0
