GitHub topics: llm-serving

Repositories

vllm-project/vllm-ascend

Community maintained hardware plugin for vLLM on Ascend

Language: Python - Size: 2.37 MB - Last synced at: about 6 hours ago - Pushed at: about 7 hours ago - Stars: 819 - Forks: 227

llama-3.2-1b.vb: This project provides a simple way to run llama 3.2 1b fp16 CPU inference using VB.NET. Follow the setup instructions to ensure all necessary files are in place for smooth operation. 🐱💻✨

Language: Visual Basic .NET - Size: 23.4 KB - Last synced at: about 8 hours ago - Pushed at: about 9 hours ago - Stars: 0 - Forks: 0

gty111/gLLM

gLLM: Global Balanced Pipeline Parallelism System for Distributed LLM Serving with Token Throttling

Language: Python - Size: 1.38 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 15 - Forks: 1

helixml/helix

♾️ Helix is a private GenAI stack for building AI agents with declarative pipelines, knowledge (RAG), API bindings, and first-class testing.

Language: Go - Size: 54.6 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 507 - Forks: 52

PaddlePaddle/FastDeploy

Large Language Model Deployment Toolkit

Language: Cuda - Size: 38.7 MB - Last synced at: 4 days ago - Pushed at: 14 days ago - Stars: 3,220 - Forks: 485

EM-GeekLab/LLMOne

Enterprise-grade LLM automated deployment tool that makes AI servers truly "plug-and-play".

Language: TypeScript - Size: 4.3 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 59 - Forks: 2

pierreolivierbonin/Canada-Labour-Research-Assistant

The Canada Labour Research Assistant (CLaRA) is a privacy-first LLM-powered research assistant proposing Easily Verifiable Direct Quotations (EVDQ) to mitigate hallucinations in answering questions about Canadian labour laws, standards, and regulations. It works entirely offline and locally, guaranteeing the confidentiality of your conversations.

Language: Python - Size: 4.56 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 4 - Forks: 0

thu-pacman/chitu

High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.

Language: Python - Size: 35.6 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1,149 - Forks: 76

sugarcane-ai/sugarcane-ai

npm like package ecosystem for Prompts 🤖

Language: TypeScript - Size: 11.5 MB - Last synced at: 2 days ago - Pushed at: 5 months ago - Stars: 50 - Forks: 14

sgl-project/sglang

SGLang is a fast serving framework for large language models and vision language models.

Language: Python - Size: 23.1 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 15,425 - Forks: 2,173

ray-project/ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

Language: Python - Size: 543 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 37,679 - Forks: 6,515

NexusGPU/tensor-fusion

Tensor Fusion is a state-of-the-art GPU virtualization and pooling solution designed to optimize GPU cluster utilization to its fullest potential.

Language: Go - Size: 990 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 45 - Forks: 12

skypilot-org/skypilot

SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 16+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.

Language: Python - Size: 157 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 8,276 - Forks: 684

gpustack/gpustack

Simple, scalable AI model deployment on GPU clusters

Language: Python - Size: 94.4 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 2,969 - Forks: 301

vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python - Size: 57.3 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 50,538 - Forks: 8,267

agoSantiago97/gemma-2-2b-it.cs

gemma-2-2b-it.cs: This project implements int8 CPU inference in pure C#. It ports a Rust repository using Gemini 2.5 Pro Preview, and you can easily build and run it with the provided batch files. 🐙💻

Language: C# - Size: 16.6 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 2 - Forks: 0

bentoml/OpenLLM

Run any open-source LLM, such as DeepSeek or Llama, as an OpenAI-compatible API endpoint in the cloud.

Language: Python - Size: 41.1 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 11,414 - Forks: 731

zhihu/ZhiLight

A highly optimized LLM inference acceleration engine for Llama and its variants.

Language: C++ - Size: 967 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 899 - Forks: 104

superduper-io/superduper

Superduper: End-to-end framework for building custom AI applications and agents.

Language: Python - Size: 73.8 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 5,088 - Forks: 500

alibaba/rtp-llm

RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.

Language: C++ - Size: 307 MB - Last synced at: 7 days ago - Pushed at: 27 days ago - Stars: 802 - Forks: 68

bentoml/BentoML

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!

Language: Python - Size: 95.8 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 7,809 - Forks: 847

rohan-paul/LLM-FineTuning-Large-Language-Models

LLM (Large Language Model) FineTuning

Language: Jupyter Notebook - Size: 11.3 MB - Last synced at: 7 days ago - Pushed at: 3 months ago - Stars: 542 - Forks: 131

predibase/lorax

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

Language: Python - Size: 6.62 MB - Last synced at: 7 days ago - Pushed at: about 1 month ago - Stars: 3,028 - Forks: 217

liguodongiot/llm-action

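This project aims to share the technical principles behind large models and hands-on experience with them (large-model engineering and deploying large-model applications in practice).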

Language: HTML - Size: 23 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 18,706 - Forks: 2,224

interestingLSY/swiftLLM

A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of vLLM).

Language: Python - Size: 234 KB - Last synced at: 7 days ago - Pushed at: 20 days ago - Stars: 224 - Forks: 26

galeselee/Awesome_LLM_System-PaperList

Since the emergence of ChatGPT in 2022, accelerating large language models has become increasingly important. Here is a list of papers on accelerating LLMs, currently focusing mainly on inference acceleration; related works will be added gradually. Contributions are welcome!

Size: 616 KB - Last synced at: about 19 hours ago - Pushed at: 4 months ago - Stars: 255 - Forks: 12

hpcaitech/SwiftInfer

Efficient AI Inference & Serving

Language: Python - Size: 508 KB - Last synced at: 7 days ago - Pushed at: over 1 year ago - Stars: 471 - Forks: 29

EmbeddedLLM/embeddedllm

EmbeddedLLM: API server for Embedded Device Deployment. Currently supports CUDA/OpenVINO/IpexLLM/DirectML/CPU

Language: Python - Size: 12.6 MB - Last synced at: 7 days ago - Pushed at: 9 months ago - Stars: 40 - Forks: 1

AkshaySyal/End-to-End-Basketball-QA-RAG-Capstone

Created a QA chatbot powered by a fine-tuned text-to-SQL LLM deployed on a personal gaming laptop (Nvidia GTX 1650) using Ollama and Ngrok

Language: Jupyter Notebook - Size: 3.74 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 0 - Forks: 1

JohnClaw/gemma-2-2b-it.cs

gemma-2-2b-it int8 cpu inference in one file of pure C#

Language: C# - Size: 16.6 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 1 - Forks: 0

JohnClaw/llama-3.2-1b.vb

one-file llama 3.2 1b fp16 cpu inference in pure vb.net

Language: Visual Basic .NET - Size: 24.4 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 1 - Forks: 0

mosecorg/mosec

A high-performance ML model serving framework that offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine

Language: Python - Size: 1.14 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 842 - Forks: 61

torchpipe/torchpipe

Serving Inside Pytorch

Language: C++ - Size: 41.6 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 160 - Forks: 13

friendliai/friendli-client

Friendli: the fastest serving engine for generative AI

Language: Python - Size: 4.88 MB - Last synced at: 7 days ago - Pushed at: 5 months ago - Stars: 47 - Forks: 7

alibaba/ServeGen

A framework for generating realistic LLM serving workloads

Language: Python - Size: 115 MB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 14 - Forks: 2

efeslab/Nanoflow

A throughput-oriented high-performance serving framework for LLMs

Language: C++ - Size: 32.6 MB - Last synced at: 25 days ago - Pushed at: 26 days ago - Stars: 816 - Forks: 37

tdchaitanya/looplm

🔄 LoopLM: Command line tool accessing LLMs directly from your terminal

Language: Python - Size: 1.35 MB - Last synced at: 26 days ago - Pushed at: 26 days ago - Stars: 1 - Forks: 0

ray-project/ray-llm 📦

RayLLM - LLMs on Ray (Archived). Read README for more info.

Size: 1.98 MB - Last synced at: 21 days ago - Pushed at: 4 months ago - Stars: 1,261 - Forks: 92

ajithvcoder/TSAI-EMLO-4.0

Contains solutions for assignments and learning notes from the Extensive Machine Learning Operations course of The School of AI

Language: Python - Size: 32.7 MB - Last synced at: 27 days ago - Pushed at: 28 days ago - Stars: 1 - Forks: 1

borisdev/stack-sandbox Fork of michaeloliverx/python-poetry-docker-example

Stack Sandbox: uv & FastAPI & NextJS & Azure

Language: Dockerfile - Size: 47.9 KB - Last synced at: 27 days ago - Pushed at: 28 days ago - Stars: 0 - Forks: 0

Adarshreddyash/surfing-weights

Surfing weights to edge devices

Language: Python - Size: 8.36 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

microsoft/aici

AICI: Prompts as (Wasm) Programs

Language: Rust - Size: 9.71 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 2,027 - Forks: 83

MoonshotAI/MoBA

MoBA: Mixture of Block Attention for Long-Context LLMs

Language: Python - Size: 2.4 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 1,779 - Forks: 106

France-Travail/happy_vllm

A REST API for vLLM, production ready

Language: Python - Size: 859 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 21 - Forks: 2

powerserve-project/PowerServe

High-speed and easy-to-use LLM serving framework for local deployment

Language: C++ - Size: 1.11 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 107 - Forks: 9

cortecs-ai/cortecs-py

Lightweight wrapper for cortecs' provisioning API

Language: Python - Size: 418 KB - Last synced at: 10 days ago - Pushed at: about 1 month ago - Stars: 6 - Forks: 0

bigai-nlco/TokenSwift

[ICML 2025] TokenSwift: Lossless Acceleration of Ultra Long Sequence Generation

Language: Python - Size: 61.6 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 92 - Forks: 8

azminewasi/Awesome-LLMs-ICLR-24

It is a comprehensive resource hub compiling all LLM papers accepted at the International Conference on Learning Representations (ICLR) in 2024.

Size: 821 KB - Last synced at: 15 days ago - Pushed at: about 1 year ago - Stars: 62 - Forks: 3

genlm/genlm-backend

High-performance backend for language model probabilistic programs

Language: Python - Size: 2.83 MB - Last synced at: 22 days ago - Pushed at: about 2 months ago - Stars: 9 - Forks: 1

ray-project/ray-educational-materials 📦

This is a suite of hands-on training materials that show how to scale CV, NLP, and time-series forecasting workloads with Ray.

Language: Jupyter Notebook - Size: 24 MB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 393 - Forks: 76

Moha111-h/Qwen3

Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.

Language: Shell - Size: 3.07 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

AdrianMosnegutu/docscribe.nvim

A Neovim plugin for generating inline documentation for your functions using LLMs.

Language: Lua - Size: 7.32 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 2 - Forks: 0

mani-kantap/llm-inference-solutions

A collection of all available inference solutions for the LLMs

Size: 30.3 KB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 87 - Forks: 3

nuhmanpk/quick-llama

Run Ollama models anywhere easily

Language: Python - Size: 319 KB - Last synced at: 5 days ago - Pushed at: 2 months ago - Stars: 4 - Forks: 0

theneildave/ml-engineering

Machine Engineering Comprehensive Guide

Size: 1000 Bytes - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

efficientscaling/Z1

Repo for "Z1: Efficient Test-time Scaling with Code"

Language: Python - Size: 422 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 45 - Forks: 1

HPMLL/BurstGPT

A ChatGPT(GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems

Language: Python - Size: 19 MB - Last synced at: 3 months ago - Pushed at: 9 months ago - Stars: 159 - Forks: 9

Neural-Dragon-AI/Cynde

A Framework For Intelligence Farming

Language: Python - Size: 1.2 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 13 - Forks: 0

ehsanghaffar/llm-practice

A self-hosted personal chatbot API with FastAPI. It allows you to interact with the Llama2 LLM (and other open-source LLMs) to have natural language conversations, generate text, and perform various language-related tasks.

Language: Jupyter Notebook - Size: 108 KB - Last synced at: 7 days ago - Pushed at: 3 months ago - Stars: 11 - Forks: 2

France-Travail/benchmark_llm_serving

A library to benchmark LLMs via their API exposure

Language: Python - Size: 8.04 MB - Last synced at: 14 days ago - Pushed at: about 2 months ago - Stars: 6 - Forks: 0

Kira94-hkz/PowerServe

High-speed and easy-to-use LLM serving framework for local deployment

Size: 1000 Bytes - Last synced at: 14 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

jbchouinard/llmailbot

A service for chatting with LLMs via email.

Language: Python - Size: 296 KB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

ivynya/illm

internet llm - access your ollama (or any other local llm) instance from across the internet

Language: Go - Size: 85.9 KB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

henryle97/llm-serving-benchmark

LLM Serving Libs Benchmark

Language: Python - Size: 0 Bytes - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

MinjaeKIM753/ClaudeComputerUseBeta-Win64

Claude 3.5 Sonnet ComputerUse (Beta) for Win64

Language: Python - Size: 198 KB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 10 - Forks: 4

CentML/llm-inference-bench

Lightweight and extensible LLM Inference serving benchmark tool written in Rust.

Language: Rust - Size: 18.6 KB - Last synced at: 2 months ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 0

genia-dev/vibraniumdome

LLM Security Platform.

Language: Python - Size: 2.87 MB - Last synced at: 4 months ago - Pushed at: 8 months ago - Stars: 10 - Forks: 2

george-mountain/web-app-builder--LLM

Building Static Web Applications using Large Language Model. From hand sketched documents, images and screenshots to proper web pages.

Language: Python - Size: 2.11 MB - Last synced at: 18 days ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 0

Jason-cs18/HetServe-Foundation

An Overview of Efficiently Serving Foundation Models across Edge Devices

Size: 358 KB - Last synced at: 4 months ago - Pushed at: 5 months ago - Stars: 13 - Forks: 0

IvanLuLyf/bunny-llm

Deno LLM API Service

Language: TypeScript - Size: 132 KB - Last synced at: 2 months ago - Pushed at: 4 months ago - Stars: 7 - Forks: 1

romitjain/gpt-benchmark

Making small models as fast as possible

Language: Python - Size: 1.91 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

unaidedelf8777/faster-outlines

A Lazy, high throughput and blazing fast structured text generation backend.

Language: Rust - Size: 3.68 MB - Last synced at: 3 days ago - Pushed at: 8 months ago - Stars: 5 - Forks: 0

oscinis-com/Awesome-LLM-Productization

Awesome-LLM-Productization: a curated list of tools/tricks/news/regulations about AI and Large Language Model (LLM) productization

Size: 275 KB - Last synced at: about 2 months ago - Pushed at: 5 months ago - Stars: 23 - Forks: 4

slai-labs/get-beam

Run GPU inference and training jobs on serverless infrastructure that scales with you.

Language: Shell - Size: 5.96 MB - Last synced at: 2 months ago - Pushed at: about 1 year ago - Stars: 102 - Forks: 23

diverged/tavily-go

An unofficial Go port of the official Tavily API Python Wrapper.

Language: Go - Size: 17.6 KB - Last synced at: 2 months ago - Pushed at: 12 months ago - Stars: 4 - Forks: 0

fork123aniket/LLM-RAG-powered-QA-App

A Production-Ready, Scalable RAG-powered LLM-based Context-Aware QA App

Language: Python - Size: 22.5 KB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 5 - Forks: 1

EAlmazanG/sentiment-analysis-reviews

A cost-effective solution for stores and startups to analyze customer reviews, classify sentiment (positive, neutral, negative), and gain actionable insights through an interactive dashboard.

Language: Jupyter Notebook - Size: 34.9 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

biosfood/intel-llm-guide

A guide on how to run LLMs on Intel CPUs

Language: Python - Size: 20.5 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

chenhunghan/ialacol 📦

🪶 Lightweight OpenAI drop-in replacement for Kubernetes

Language: Python - Size: 250 KB - Last synced at: 5 months ago - Pushed at: over 1 year ago - Stars: 144 - Forks: 17

friendliai/lm-evaluation-harness Fork of EleutherAI/lm-evaluation-harness

A framework for few-shot evaluation of autoregressive language models.

Language: Python - Size: 28.1 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

genia-dev/vibraniumdome-docs

LLM Security Platform Docs

Language: MDX - Size: 635 KB - Last synced at: 4 months ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 0

biomchen/llm-serving

Basic APIs for serving LLMs locally.

Language: Python - Size: 31.3 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

KevinZeng08/efficient-large-model-papers

A Curated Paper List for Efficient Large Models

Size: 1.95 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 1

substratusai/runbooks 📦

Finetune LLMs on K8s by using Runbooks

Language: Go - Size: 5.22 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 168 - Forks: 14

okikorg/okik

Okik is a serving framework to deploy LLMs and much more.

Language: Python - Size: 5.13 MB - Last synced at: 20 days ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

valyu-network/Stitch

Stitch simplifies and scales LLM application deployment, reducing infrastructure complexity and costs.

Language: Python - Size: 2.53 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

LoopGlitch26/Hinglish-AI-Mentor

Hinglish Chatbot powered by Azure Cognitive Services, Google Translate and OpenAI

Language: Jupyter Notebook - Size: 974 KB - Last synced at: 3 months ago - Pushed at: almost 2 years ago - Stars: 5 - Forks: 1

ray-project/llms-in-prod-workshop-2023 📦

Deploy and Scale LLM-based applications

Language: Jupyter Notebook - Size: 13.1 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 23 - Forks: 3

ray-project/anyscale-berkeley-ai-hackathon 📦

Ray and Anyscale for UC Berkeley AI Hackathon!

Language: Jupyter Notebook - Size: 77.1 KB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 11 - Forks: 0

george-mountain/LLM-Local-Streaming

Streaming of LLM responses in real time using FastAPI and Streamlit.

Language: Python - Size: 32.2 KB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 1

sugarcane-ai/sugarcane-ai.github.io

Language: Astro - Size: 17.7 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 15 - Forks: 3

InquestGeronimo/horizon-takeoff

Automating the deployment of the Takeoff Server on AWS for LLMs

Language: Python - Size: 1.08 MB - Last synced at: 5 days ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

emmaalecrim/llm-ws

TypeScript LLM WebSocket reverse proxy built for streaming of various inference tasks

Language: TypeScript - Size: 673 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

suleymansevimli/run-llm-model-locally

You can run any large language model on your local machine with this repository.

Language: Python - Size: 1.95 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

asprenger/ray_vllm_inference

A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving.

Language: Python - Size: 81.1 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 5 - Forks: 1

mddunlap924/LLM-Inference-Serving

This repository demonstrates LLM execution on CPUs using packages like llamafile, emphasizing low-latency, high-throughput, and cost-effective benefits for inference and serving.

Language: Jupyter Notebook - Size: 6.4 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

liux2/DL_env_Setups

Deep learning environment setups

Language: Shell - Size: 23.4 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

ray-project/llm-application 📦

Language: Jupyter Notebook - Size: 20.5 KB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 6 - Forks: 2

Stosan/commentator

Language: Python - Size: 12.2 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 1

awesome-software/EasyEdit Fork of zjunlp/EasyEdit

An Easy-to-use Knowledge Editing Framework for LLMs.

Size: 15.5 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

Related Keywords

llm-serving (100), llm (64), llm-inference (50), llmops (26), inference (21), llama (17), model-serving (15), pytorch (14), mlops (13), llm-training (12), transformer (11), ai (11), deep-learning (10), machine-learning (10), python (10), llms (10), qwen (8), cuda (8), vllm (8), openai (7), gpt (7), inference-engine (7), large-language-models (7), ray (7), llm-framework (6), deepseek (6), llama2 (6), generative-ai (6), prompt-engineering (5), gpu (5), chatgpt (5), mistral (5), llm-agent (4), cpu-inference (4), llama3 (4), llamacpp (4), large-language-model (4), artificial-intelligence (4), prompts (4), ollama (4), fine-tuning (4), serving (4), streamlit (4), open-source-llm (3), retrieval-augmented-generation (3), anyscale (3), ray-distributed (3), llm-security (3), llm-evaluation (3), gemma (3), deepseek-r1 (3), data-science (3), langchain (3), rag (3), ml (3), npu (3), transformers (3), csharp (2), language-model (2), prompt-injection (2), papers (2), prompt-injection-tool (2), security (2), rocm (2), chatgpt-api (2), openai-api (2), tpu (2), pypi (2), ml-platform (2), ml-infrastructure (2), cost-optimization (2), cloud-computing (2), smartphone (2), windows (2), autoscaling (2), tensorflow (2), smallthinker (2), production (2), llm-eval (2), langchain-python (2), model-inference (2), llm-ops (2), llm-firewall (2), llama3-2 (2), anthropic (2), chatbot (2), pretrained-models (2), torch (2), bentoml (2), quantization (2), aws (2), ray-serve (2), tts (2), int8-quantization (2), int8-inference (2), int8 (2), rust (2), azure (2), gemma2-2b-it (2), gemma2 (2)