GitHub topics: model-inference-service
bentoml/BentoML
The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
Language: Python - Size: 95.3 MB - Last synced at: 2 days ago - Pushed at: 5 days ago - Stars: 7,635 - Forks: 834
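BentoML's headline feature is turning a model into an inference API. A minimal sketch of that pattern, using only the Python standard library (this is not BentoML's own API — a real BentoML service would use its decorators and load actual model weights; `predict` here is a placeholder):

```python
# Stdlib-only sketch of the model-inference-API pattern that frameworks
# like BentoML productionize: accept JSON over HTTP, run a model, return
# JSON. The "model" is a placeholder function, not a real model.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(text: str) -> dict:
    # Placeholder model: returns the input length and a trivial score.
    return {"length": len(text), "score": len(text) % 5 / 4}

class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        payload = json.loads(body or b"{}")
        data = json.dumps(predict(payload.get("text", ""))).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, *args):
        pass  # keep the demo quiet

# To serve (blocks):
#   HTTPServer(("127.0.0.1", 8000), InferenceHandler).serve_forever()
```

A client would then POST `{"text": "..."}` and get JSON back; BentoML adds on top of this pattern batching, packaging, and deployment.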

bentoml/transformers-nlp-service
Online Inference API for NLP Transformer models - summarization, text classification, sentiment analysis and more
Language: Python - Size: 6.3 MB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 44 - Forks: 3

bentoml/CLIP-API-service
CLIP as a service - Embed images and sentences, object recognition, visual reasoning, image classification, and reverse image search
Language: Jupyter Notebook - Size: 945 KB - Last synced at: 16 days ago - Pushed at: over 1 year ago - Stars: 60 - Forks: 4
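The reverse-image-search use case boils down to nearest-neighbor lookup over embeddings. A toy sketch of that retrieval step (the hard-coded 4-d vectors and image names are stand-ins for real CLIP embeddings, which are produced by the model and have hundreds of dimensions):

```python
# Toy embedding-based reverse search: rank stored "images" by cosine
# similarity to a query embedding. Vectors and names are made up; a CLIP
# service would compute real embeddings for images and text.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

index = {  # imaginary image id -> embedding
    "cat.jpg": [0.9, 0.1, 0.0, 0.1],
    "dog.jpg": [0.1, 0.9, 0.1, 0.0],
    "car.jpg": [0.0, 0.1, 0.9, 0.2],
}

def nearest(query, index, k=2):
    # Most similar stored embeddings first.
    ranked = sorted(index, key=lambda name: cosine(query, index[name]),
                    reverse=True)
    return ranked[:k]
```

Because CLIP puts images and text in the same embedding space, the same `nearest` lookup serves both image-to-image and text-to-image search.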

ksm26/Efficiently-Serving-LLMs
Learn the ins and outs of efficiently serving Large Language Models (LLMs). Dive into optimization techniques, including KV caching and Low-Rank Adapters (LoRA), and gain hands-on experience with Predibase's LoRAX inference server.
Language: Jupyter Notebook - Size: 2.34 MB - Last synced at: 24 days ago - Pushed at: about 1 year ago - Stars: 11 - Forks: 3
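The KV-caching idea the course covers can be shown in a toy form: during autoregressive decoding, key/value projections for past tokens are computed once and reused, so each step only projects the new token. This sketch is not the course's code; the "projections" are arbitrary scalings standing in for real attention math:

```python
# Toy KV cache: per-token key/value projections are stored across decode
# steps instead of being recomputed. The 0.5x/2.0x scalings are stand-ins
# for real learned K/V projection matrices.
class KVCache:
    def __init__(self):
        self.keys, self.values = [], []
        self.kv_projections = 0  # how many K/V projections were computed

    def step(self, token_embedding):
        # Project K/V for the NEW token only; earlier entries are reused.
        self.kv_projections += 1
        self.keys.append([0.5 * x for x in token_embedding])    # toy K proj
        self.values.append([2.0 * x for x in token_embedding])  # toy V proj
        return self.keys, self.values

def kv_work_without_cache(num_steps):
    # Without a cache, step i re-projects K/V for all i tokens seen so
    # far, so total work grows quadratically: 1 + 2 + ... + n.
    return sum(range(1, num_steps + 1))
```

For a 4-token generation the cache does 4 projections versus 10 without it; the gap widens quadratically with sequence length, which is why KV caching is a standard serving optimization.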
