GitHub topics: inference-server
containers/podman-desktop-extension-ai-lab
Work with LLMs in a local environment using containers
Language: TypeScript - Size: 14.8 MB - Last synced at: about 8 hours ago - Pushed at: about 10 hours ago - Stars: 239 - Forks: 63

basetenlabs/truss
The simplest way to serve AI/ML models in production
Language: Python - Size: 17.6 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 1,026 - Forks: 87
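
Truss's documented pattern is a model directory whose `model/model.py` exposes `load` and `predict` hooks. A minimal sketch of that pattern, where the transformers pipeline and the request schema are illustrative assumptions rather than the project's only option:

```python
# model/model.py -- minimal Truss-style model (illustrative sketch)
from transformers import pipeline  # assumption: a transformers-based example


class Model:
    def __init__(self, **kwargs):
        self._model = None

    def load(self):
        # Called once when the serving container starts; load weights here.
        self._model = pipeline("sentiment-analysis")

    def predict(self, model_input):
        # Called per request; model_input is the parsed request body.
        return self._model(model_input["text"])
```

The Truss CLI can then build and serve a container around this directory; check the repo's docs for the current commands.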

roboflow/inference
Turn any computer or edge device into a command center for your computer vision projects.
Language: Python - Size: 130 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1,776 - Forks: 191
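
The `inference` package's quickstart pattern is to fetch a model by ID and call it directly; the model alias and input here are assumptions, so verify against the repo's docs:

```python
# Sketch of running a model through the `inference` package (assumed API).
from inference import get_model

model = get_model(model_id="yolov8n-640")  # assumed public model alias
results = model.infer("dog.jpeg")          # local path, URL, or numpy image
print(results)
```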

containers/ramalama
RamaLama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for inference in production, all through the familiar language of containers.
Language: Python - Size: 3.3 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1,890 - Forks: 213
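
`ramalama serve` puts the model behind a local HTTP endpoint; assuming it exposes an OpenAI-compatible chat completions route on localhost:8080 (the port, route, and model name below are assumptions), a client could look like:

```python
# Sketch of a client for a locally served model (verify the actual
# port and route that `ramalama serve` prints on startup).
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "granite",  # hypothetical model name
        "messages": [{"role": "user", "content": "Hello from a container"}],
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```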

pipeless-ai/pipeless
An open-source computer vision framework to build and deploy apps in minutes
Language: Rust - Size: 142 MB - Last synced at: 3 days ago - Pushed at: about 1 year ago - Stars: 759 - Forks: 39

curtisgray/wingman
Wingman is the fastest and easiest way to run Llama models on your PC or Mac.
Language: TypeScript - Size: 188 MB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 42 - Forks: 2

friendliai/friendli-client 📦
[⛔️ DEPRECATED] Friendli: the fastest serving engine for generative AI
Language: Python - Size: 4.88 MB - Last synced at: 5 days ago - Pushed at: 22 days ago - Stars: 48 - Forks: 7

underneathall/pinferencia
Python + Inference: a model deployment library in Python. The simplest model inference server ever.
Language: Python - Size: 9.57 MB - Last synced at: 24 days ago - Pushed at: over 2 years ago - Stars: 555 - Forks: 85
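
Pinferencia's documented pattern is to register a plain Python callable (or model object) with a `Server` and run it under uvicorn; a minimal sketch, with the function and model name as assumptions:

```python
# app.py -- minimal Pinferencia sketch (register a callable, serve it)
from pinferencia import Server


def add(data):
    # `data` is the request payload; here just sum a list of numbers.
    return sum(data)


service = Server()
service.register(model_name="add", model=add)
# Run with: uvicorn app:service --reload   (assumed invocation)
```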

kf5i/k3ai 📦
K3ai is a lightweight, fully automated AI infrastructure-in-a-box solution that allows anyone to experiment quickly with Kubeflow pipelines. K3ai suits anything from edge devices to laptops.
Language: PowerShell - Size: 19.4 MB - Last synced at: 5 days ago - Pushed at: over 3 years ago - Stars: 101 - Forks: 10

pandruszkow/whisper-inference-server
A networked inference server for Whisper speech recognition
Language: Python - Size: 6.84 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 8 - Forks: 0
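
The repo wraps Whisper behind a network API; as a generic illustration of that idea (not this project's code), a minimal FastAPI wrapper around the openai-whisper package might look like:

```python
# Generic sketch of a networked Whisper endpoint (not this repo's code).
import tempfile

import whisper  # openai-whisper package
from fastapi import FastAPI, UploadFile

app = FastAPI()
model = whisper.load_model("base")  # model size is an arbitrary choice


@app.post("/transcribe")
async def transcribe(file: UploadFile):
    # Whisper's transcribe() wants a file path, so spool the upload to disk.
    with tempfile.NamedTemporaryFile(suffix=".wav") as tmp:
        tmp.write(await file.read())
        tmp.flush()
        result = model.transcribe(tmp.name)
    return {"text": result["text"]}
```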

geniusrise/vision
Vision and vision multimodal components for the geniusrise framework
Language: Python - Size: 33 MB - Last synced at: 26 days ago - Pushed at: 9 months ago - Stars: 7 - Forks: 1

notAI-tech/fastDeploy
Deploy DL/ML inference pipelines with minimal extra code.
Language: Python - Size: 15.7 MB - Last synced at: 22 days ago - Pushed at: 8 months ago - Stars: 98 - Forks: 17

kibae/onnxruntime-server
ONNX Runtime Server: a server that provides TCP and HTTP/HTTPS REST APIs for ONNX inference.
Language: C++ - Size: 954 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 156 - Forks: 11

NGLSG/UniAPI
The Universal LLM Gateway - Integrate ANY AI Model with One Consistent API
Language: C++ - Size: 172 KB - Last synced at: 29 days ago - Pushed at: 2 months ago - Stars: 9 - Forks: 1

vertexclique/orkhon
Orkhon: ML Inference Framework and Server Runtime
Language: Rust - Size: 26.2 MB - Last synced at: about 2 months ago - Pushed at: over 4 years ago - Stars: 149 - Forks: 4

RubixML/Server
A standalone inference server for trained Rubix ML estimators.
Language: PHP - Size: 18.3 MB - Last synced at: 20 days ago - Pushed at: 4 months ago - Stars: 62 - Forks: 11

roboflow/inference-dashboard-example
Uses Roboflow's inference server to analyze video streams. This project extracts insights from video frames at defined intervals and generates informative visualizations and CSV outputs.
Language: Python - Size: 97.7 KB - Last synced at: 7 days ago - Pushed at: about 1 year ago - Stars: 14 - Forks: 2

BMW-InnovationLab/BMW-TensorFlow-Inference-API-CPU
An object detection inference API using the TensorFlow framework.
Language: Python - Size: 9.96 MB - Last synced at: 3 months ago - Pushed at: about 3 years ago - Stars: 183 - Forks: 47

autodeployai/ai-serving
Serving AI/ML models in the open standard formats PMML and ONNX with both HTTP (REST API) and gRPC endpoints
Language: Scala - Size: 285 KB - Last synced at: 4 months ago - Pushed at: 9 months ago - Stars: 156 - Forks: 31

NVIDIA/gpu-rest-engine 📦
A REST API for Caffe using Docker and Go
Language: C++ - Size: 255 KB - Last synced at: 6 days ago - Pushed at: almost 7 years ago - Stars: 419 - Forks: 93

niqbal996/triton_client
A client that sends sensor messages from ROS or other sources to the inference server and processes the inference results.
Language: Python - Size: 37.3 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

tensorchord/inference-benchmark
Benchmark for machine learning model online serving (LLM, embedding, Stable-Diffusion, Whisper)
Language: Python - Size: 46.9 KB - Last synced at: 3 months ago - Pushed at: about 2 years ago - Stars: 28 - Forks: 3

BMW-InnovationLab/BMW-YOLOv4-Inference-API-GPU
A no-code object detection inference API using the YOLOv3 and YOLOv4 Darknet framework.
Language: Python - Size: 20.5 MB - Last synced at: 15 days ago - Pushed at: about 3 years ago - Stars: 280 - Forks: 68

BMW-InnovationLab/BMW-YOLOv4-Inference-API-CPU
A no-code object detection inference API using YOLOv4 and YOLOv3 with OpenCV.
Language: Python - Size: 21.4 MB - Last synced at: 15 days ago - Pushed at: about 3 years ago - Stars: 220 - Forks: 58

geniusrise/text
Text components powering LLMs & SLMs for the geniusrise framework
Language: Python - Size: 15.6 MB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 5 - Forks: 2

dlzou/computron
Serving distributed deep learning models with model parallel swapping.
Language: Jupyter Notebook - Size: 2.1 MB - Last synced at: 17 days ago - Pushed at: about 2 years ago - Stars: 5 - Forks: 1

redis-applied-ai/loan-prediction-microservice
An example of using Redis + RedisAI for a microservice that predicts consumer loan probabilities using Redis as a feature and model store and RedisAI as an inference server.
Language: Jupyter Notebook - Size: 24.6 MB - Last synced at: 8 months ago - Pushed at: over 2 years ago - Stars: 7 - Forks: 0
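
The pattern here is that both features and the model live in Redis, and RedisAI executes the model in place. A sketch using the redisai-py client; the key names, tensor shape, and ONNX file are assumptions for illustration:

```python
# Sketch of RedisAI as an inference server via the redisai-py client.
import numpy as np
import redisai as rai

con = rai.Client(host="localhost", port=6379)

# Store the model once (e.g., an exported ONNX classifier).
with open("loan_model.onnx", "rb") as f:
    con.modelstore("loan_model", backend="ONNX", device="CPU", data=f.read())

# Per request: set the input tensor, run the model, read the output.
features = np.random.rand(1, 8).astype(np.float32)
con.tensorset("loan:in", features)
con.modelexecute("loan_model", inputs=["loan:in"], outputs=["loan:out"])
print(con.tensorget("loan:out"))
```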

geniusrise/audio
Audio components for the geniusrise framework
Language: Python - Size: 70.6 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 1

xdevfaheem/TGS
Effortlessly Deploy and Serve Large Language Models in the Cloud as an API Endpoint for Inference
Language: Python - Size: 26.4 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

StefanoLusardi/tiny_inference_engine
Client/server system to perform distributed inference on high-load systems.
Language: C++ - Size: 11 MB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 1

zhangjun/TensorRT-Server
TensorRT Server
Language: C++ - Size: 68.4 KB - Last synced at: 2 months ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 1

leimao/Simple-Inference-Server
Inference Server Implementation from Scratch for Machine Learning Models
Language: Python - Size: 80.1 KB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 24 - Forks: 1
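
As a generic illustration of what "from scratch" means here (not this repo's code), an inference server can be as small as a standard-library HTTP handler wrapping a predict function:

```python
# Generic from-scratch inference server sketch; the placeholder "model" and
# JSON schema are assumptions for illustration.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    # Placeholder "model": replace with real model inference.
    return {"score": sum(features) / max(len(features), 1)}


class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(predict(payload.get("features", []))).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8000), InferenceHandler).serve_forever()
```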

haicheviet/fullstack-machine-learning-inference
Fullstack machine learning inference template
Language: Jupyter Notebook - Size: 77.1 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 24 - Forks: 11

nikhiltadikonda/Kushagra-AI Fork of bshreddy/Kushagra-AI
An AI-powered mobile crop advisory app for farmers and gardeners that provides information about crops from an image taken by the user. It supports 10 crops and 37 kinds of crop diseases. The AI model is a ResNet fine-tuned on crop images collected by web-scraping Google Images and the Plant-Village dataset.
Size: 11.7 KB - Last synced at: 3 months ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 0

tensorchord/modelz-docs
Modelz is a developer-first platform for prototyping and deploying machine learning models.
Language: MDX - Size: 11.5 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 7 - Forks: 5

SABER-labs/torch_batcher
Serve PyTorch inference requests using batching with Redis for faster performance.
Language: Python - Size: 13.7 KB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 5 - Forks: 0
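
A generic sketch of the Redis-batching pattern the description refers to (not this repo's code): clients push requests onto a list, a worker drains a batch and runs the model once per batch. Queue and key names and the stand-in model are assumptions:

```python
import json

import redis
import torch

r = redis.Redis()
model = torch.nn.Linear(4, 2).eval()  # stand-in model
MAX_BATCH = 32

while True:
    # Block for the first request, then opportunistically drain more.
    _, raw = r.blpop("inference:queue")
    batch = [json.loads(raw)]
    while len(batch) < MAX_BATCH:
        raw = r.lpop("inference:queue")
        if raw is None:
            break
        batch.append(json.loads(raw))

    with torch.no_grad():
        inputs = torch.tensor([req["input"] for req in batch], dtype=torch.float32)
        outputs = model(inputs)

    # Hand each result back on a per-request key the client is waiting on.
    for req, out in zip(batch, outputs):
        r.rpush(f"inference:result:{req['id']}", json.dumps(out.tolist()))
```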

k9ele7en/Triton-TensorRT-Inference-CRAFT-pytorch
Advanced inference pipeline using NVIDIA Triton Inference Server for CRAFT text detection (PyTorch), including a converter from PyTorch -> ONNX -> TensorRT and inference pipelines (TensorRT, Triton server, multi-format). Supported model formats for Triton inference: TensorRT engine, TorchScript, ONNX.
Language: Python - Size: 15.5 MB - Last synced at: over 2 years ago - Pushed at: almost 4 years ago - Stars: 25 - Forks: 6
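
For context on the client side of such a pipeline, a sketch of calling a deployed model with Triton's HTTP client; the tensor names, shape, and model name are assumptions and would come from the model's config.pbtxt:

```python
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

image = np.random.rand(1, 3, 768, 768).astype(np.float32)  # dummy input
infer_input = httpclient.InferInput("input", list(image.shape), "FP32")
infer_input.set_data_from_numpy(image)

result = client.infer(
    model_name="craft",  # hypothetical model name
    inputs=[infer_input],
    outputs=[httpclient.InferRequestedOutput("output")],
)
print(result.as_numpy("output").shape)
```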

csy1204/TripBigs_Web
Session Based Real-time Hotel Recommendation Web Application
Language: Python - Size: 3.15 MB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 8 - Forks: 4

nikhiltadikonda/Kushagra Fork of bshreddy/Kushagra
Bundle of Repositories that power up all the Crop Prediction Applications
Size: 2.93 KB - Last synced at: 3 months ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

ajinkyapuar/qis
Language: Python - Size: 7.81 KB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

liusy182/sagemaker-run-your-own-inference
Run your own production inference code with SageMaker
Language: Python - Size: 14.1 MB - Last synced at: over 2 years ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 0

zhangjun/infer_server
Language: C++ - Size: 21.5 KB - Last synced at: about 1 month ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 0
