GitHub topics: large-scale-deployment
ksm26/Efficiently-Serving-LLMs
Learn the ins and outs of efficiently serving large language models (LLMs). Dive into optimization techniques, including KV caching and Low-Rank Adapters (LoRA), and gain hands-on experience with Predibase's LoRAX inference server framework.
Language: Jupyter Notebook - Size: 2.34 MB - Last synced at: 3 days ago - Pushed at: about 1 year ago - Stars: 15 - Forks: 4
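
The repo's headline technique, KV caching, avoids recomputing attention keys and values for tokens that have already been generated. The sketch below is illustrative only (not code from the repo): a toy single-head attention decode loop where each step projects only the newest token and appends its key/value to a growing cache. All names and sizes (d_model, W_q, decode_step) are hypothetical.

```python
# Toy KV-caching sketch, assuming single-head attention with hidden size 8.
import numpy as np

d_model = 8  # hypothetical hidden size for the toy example

rng = np.random.default_rng(0)
W_q = rng.standard_normal((d_model, d_model))
W_k = rng.standard_normal((d_model, d_model))
W_v = rng.standard_normal((d_model, d_model))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

k_cache, v_cache = [], []  # grows by one (k, v) pair per generated token

def decode_step(x):
    """Attend the newest token's query to all cached keys/values."""
    q = x @ W_q
    k_cache.append(x @ W_k)   # project only the new token; old K/V are reused
    v_cache.append(x @ W_v)
    K = np.stack(k_cache)     # (seq_len, d_model)
    V = np.stack(v_cache)
    scores = softmax(q @ K.T / np.sqrt(d_model))
    return scores @ V         # attention output for the new token only

for _ in range(4):  # four decode steps; each projects one token, not the prefix
    out = decode_step(rng.standard_normal(d_model))
print(out.shape)  # (8,)
```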

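The course's other named technique, LoRA, adds a trainable low-rank update alongside a frozen pretrained weight; serving many such adapters on one base model is what LoRAX specializes in. A minimal sketch, assuming a single linear layer; the sizes, scaling factor, and function name are hypothetical, not taken from the repo.

```python
# Minimal LoRA sketch: frozen W plus a low-rank update B @ A,
# so only r * (d_in + d_out) parameters are trained per adapted layer.
import numpy as np

d_in, d_out, r = 16, 16, 4   # hypothetical sizes; r << d_in
alpha = 8                    # LoRA scaling hyperparameter

rng = np.random.default_rng(1)
W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))                   # trainable up-projection, zero-init
                                           # so the adapter starts as a no-op

def lora_forward(x):
    # Base path plus low-rank adapter path, scaled by alpha / r.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
print(lora_forward(x).shape)  # (16,)
```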
Related Keywords
batch-processing (1)
deep-learning-techniques (1)
inference-optimization (1)
large-scale-deployment (1)
machine-learning-operations (1)
model-acceleration (1)
model-inference-service (1)
model-serving (1)
optimization-techniques (1)
performance-enhancement (1)
scalability-strategies (1)
server-optimization (1)
serving-infrastructure (1)
text-generation (1)