Topic: "serving-infrastructure"
ksm26/Efficiently-Serving-LLMs
Learn the ins and outs of efficiently serving Large Language Models (LLMs). Dive into optimization techniques, including KV caching and Low-Rank Adapters (LoRA), and gain hands-on experience with LoRAX, Predibase's inference server framework.
Language: Jupyter Notebook - Size: 2.34 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 11 - Forks: 3
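The description's headline technique, KV caching, avoids recomputing attention keys and values for past tokens during autoregressive decoding. A minimal single-head sketch in NumPy (an illustrative toy, not code from this repository; all names are made up):

```python
import numpy as np

d = 8  # hypothetical head dimension
rng = np.random.default_rng(0)
# Random projection weights stand in for trained parameters.
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

def attend(q, K, V):
    # Scaled dot-product attention of one query over all cached keys/values.
    scores = q @ K.T / np.sqrt(d)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

def decode_step(x, K_cache, V_cache):
    # Project only the NEW token; keys/values for past tokens come from
    # the cache, so each step costs O(1) projections instead of O(t).
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    K_cache = np.vstack([K_cache, k[None, :]])
    V_cache = np.vstack([V_cache, v[None, :]])
    return attend(q, K_cache, V_cache), K_cache, V_cache

K_cache = np.empty((0, d))
V_cache = np.empty((0, d))
for t in range(4):  # decode 4 tokens
    x = rng.standard_normal(d)
    out, K_cache, V_cache = decode_step(x, K_cache, V_cache)

print(K_cache.shape)  # cache grows by one row per decoded token
```

The trade-off is memory: the cache grows linearly with sequence length, which is why serving frameworks like the one covered here pair it with batching and adapter-sharing strategies.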

Related Topics
batch-processing (1)
deep-learning-techniques (1)
inference-optimization (1)
large-scale-deployment (1)
machine-learning-operations (1)
model-acceleration (1)
model-inference-service (1)
model-serving (1)
optimization-techniques (1)
performance-enhancement (1)
scalability-strategies (1)
server-optimization (1)
text-generation (1)