LLM-Inference-Serving

This repository demonstrates LLM execution on CPUs using packages like llamafile, emphasizing low-latency, high-throughput, and cost-effective benefits for inference and serving.

JSON API: http://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mddunlap924%2FLLM-Inference-Serving
PURL: pkg:github/mddunlap924/LLM-Inference-Serving

Stars: 2
Forks: 0
Open issues: 0

License: None
Language: Jupyter Notebook
Size: 6.4 MB
Dependencies parsed at: Pending

Created at: over 1 year ago
Updated at: over 1 year ago
Pushed at: over 1 year ago
Last synced at: over 1 year ago

Topics: deepspeed, large-language-models, llamacpp, llamafile, llm-inference, llm-serving, llms, vllm

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Repos

GitHub / mddunlap924 / LLM-Inference-Serving