GitHub topics: vllm
ARYAN555279/Batch_LLM_Inference_with_Ray_Data_LLM
Batch LLM Inference with Ray Data LLM: From Simple to Advanced
Language: Dockerfile - Size: 1.5 MB - Last synced at: 16 minutes ago - Pushed at: 19 minutes ago - Stars: 0 - Forks: 0
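
For context on what "batch LLM inference" means at its core, here is a minimal offline batch-generation sketch using vLLM's Python API. This is not code from this repository (Ray Data LLM layers a distributed data pipeline on top of a vLLM engine); the model name is only an illustrative placeholder.

```python
# Minimal offline batch inference sketch with vLLM (illustrative only; not this repo's code).
from vllm import LLM, SamplingParams

prompts = [
    "Summarize the benefits of batch inference in one sentence.",
    "What is PagedAttention?",
]
sampling = SamplingParams(temperature=0.7, max_tokens=64)

# Model name is a placeholder; any HF causal LM supported by vLLM works here.
llm = LLM(model="facebook/opt-125m")
for output in llm.generate(prompts, sampling):
    print(output.outputs[0].text)
```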

runpod-workers/worker-vllm
The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
Language: Python - Size: 26.3 MB - Last synced at: 23 minutes ago - Pushed at: about 2 hours ago - Stars: 307 - Forks: 149

VectorInstitute/vector-inference
Efficient LLM inference on Slurm clusters using vLLM.
Language: Python - Size: 2.37 MB - Last synced at: about 2 hours ago - Pushed at: about 2 hours ago - Stars: 57 - Forks: 10

Liquid4All/on-prem-stack
Scripts to launch Liquid on-prem stack
Language: Shell - Size: 160 KB - Last synced at: 18 minutes ago - Pushed at: about 1 hour ago - Stars: 2 - Forks: 1

priyanshua44/no-llm
no-llm is a lightweight library designed to simplify the integration of machine learning models without relying on large language models. It provides essential tools for developers to create efficient, scalable applications while maintaining clear and concise code.
Language: Python - Size: 170 KB - Last synced at: about 7 hours ago - Pushed at: about 8 hours ago - Stars: 0 - Forks: 0

InftyAI/llmaz
☸️ Easy, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work!
Language: Go - Size: 6.46 MB - Last synced at: about 10 hours ago - Pushed at: about 12 hours ago - Stars: 125 - Forks: 20

jasonacox/TinyLLM
Set up and run a local LLM and chatbot using consumer-grade hardware.
Language: JavaScript - Size: 743 KB - Last synced at: about 15 hours ago - Pushed at: about 16 hours ago - Stars: 239 - Forks: 28

containers/ramalama
The goal of RamaLama is to make working with AI boring.
Language: Python - Size: 2.37 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 1,538 - Forks: 162

qizhou000/VisEdit
[AAAI 2025 oral] Attribution Analysis Meets Model Editing: Advancing Knowledge Correction in Vision Language Models with VisEdit
Language: Python - Size: 3.46 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 8 - Forks: 0

apconw/sanic-web
A lightweight, full-pipeline, easily extensible large-model application project (Large Model Data Assistant). Supports large models such as DeepSeek and Qwen2.5. A one-stop LLM application development project built on Dify, Ollama & vLLM, Sanic, and Text2SQL 📊, with a modern UI built on Vue3, TypeScript, and Vite 5. It supports LLM-based graphical data Q&A via ECharts 📈 and tabular Q&A over CSV files 📂, and it integrates easily with third-party open-source RAG retrieval systems 🌐 to support broad general-knowledge Q&A.
Language: JavaScript - Size: 142 MB - Last synced at: 2 days ago - Pushed at: 4 days ago - Stars: 435 - Forks: 83

meta-llama/llama-cookbook
Welcome to the Llama Cookbook! This is your go-to guide for building with Llama: getting started with inference, fine-tuning, and RAG. It also shows how to solve end-to-end problems with the Llama model family and how to run the models on various provider services.
Language: Jupyter Notebook - Size: 209 MB - Last synced at: 2 days ago - Pushed at: 4 days ago - Stars: 17,096 - Forks: 2,446
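
As a taste of the "getting started with inference" part, here is a minimal Hugging Face `transformers` sketch for running a Llama model locally. It is not taken from the cookbook; the model name is an assumption and requires accepted access to the weights.

```python
# Minimal Llama inference sketch with Hugging Face transformers (illustrative only).
from transformers import pipeline

# Model name is an example; gated Llama checkpoints require accepting the license on the Hub.
pipe = pipeline("text-generation", model="meta-llama/Llama-3.2-1B-Instruct")
out = pipe("Give me one tip for fine-tuning Llama:", max_new_tokens=64)
print(out[0]["generated_text"])
```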

microsoft/vidur
A large-scale simulation framework for LLM inference
Language: Python - Size: 156 MB - Last synced at: 1 day ago - Pushed at: 5 months ago - Stars: 364 - Forks: 65

harleyszhang/llm_note
LLM notes covering model inference, transformer model structure, and LLM framework code analysis.
Language: Python - Size: 176 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 722 - Forks: 73

mostlygeek/llama-swap
Model swapping for llama.cpp (or any local OpenAI-compatible server)
Language: Go - Size: 552 KB - Last synced at: 3 days ago - Pushed at: 6 days ago - Stars: 537 - Forks: 31

katanaml/sparrow
Data processing with ML, LLM and Vision LLM
Language: Python - Size: 11.3 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 4,476 - Forks: 451

suncoast-soft/LLM-VoIP-Caller
This project is the backend engine for a fully autonomous AI-powered call center. It integrates a large language model (LLM), speech recognition, and text-to-speech to manage real-time phone conversations via Asterisk.
Language: Python - Size: 39.1 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

aws-samples/easy-model-deployer
A user-friendly command-line/SDK tool that makes it quick and easy to deploy open-source LLMs on AWS
Language: Python - Size: 39.6 MB - Last synced at: about 12 hours ago - Pushed at: about 13 hours ago - Stars: 34 - Forks: 5

xlite-dev/Awesome-LLM-Inference
📚A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, FlashAttention, PagedAttention, MLA, Parallelism etc.
Language: Python - Size: 115 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 3,851 - Forks: 275

hyperai/vllm-cn
vLLM documentation in Simplified Chinese / vLLM 中文文档
Language: TypeScript - Size: 3.65 MB - Last synced at: about 24 hours ago - Pushed at: 6 days ago - Stars: 60 - Forks: 5

sherlockchou86/PyLangPipe
A simple, lightweight large language model pipeline framework.
Language: Python - Size: 790 KB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 16 - Forks: 2

umi-AIGC-saas/umi_ai_cms
Platform_maultimodal is a collection of tool platforms.
Language: Python - Size: 4.16 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 2 - Forks: 0

substratusai/kubeai
AI Inference Operator for Kubernetes. The easiest way to serve ML models in production. Supports VLMs, LLMs, embeddings, and speech-to-text.
Language: Go - Size: 15.9 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 890 - Forks: 85

HuiResearch/Fast-Spark-TTS
High-quality Chinese speech synthesis and voice cloning services based on models such as SparkTTS and OrpheusTTS.
Language: Python - Size: 30.7 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 282 - Forks: 39

hienhayho/rag-colls
Collection of recent advanced RAG techniques.
Language: Python - Size: 10.1 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 12 - Forks: 4

chtmp223/topicGPT
TopicGPT: A Prompt-Based Framework for Topic Modeling (NAACL'24)
Language: Python - Size: 828 KB - Last synced at: 4 days ago - Pushed at: about 1 month ago - Stars: 282 - Forks: 46

vllm-project/vllm-ascend
Community maintained hardware plugin for vLLM on Ascend
Language: Python - Size: 911 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 477 - Forks: 87

OpenRLHF/OpenRLHF
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)
Language: Python - Size: 2.53 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 6,300 - Forks: 618

hcd233/Aris-AI-Model-Server
An OpenAI-compatible API that integrates LLM, embedding, and reranker models.
Language: Python - Size: 1.05 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 13 - Forks: 1

jakobdylanc/llmcord
Make Discord your LLM frontend ● Supports any OpenAI compatible API (Ollama, LM Studio, vLLM, OpenRouter, xAI, Mistral, Groq and more)
Language: Python - Size: 168 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 523 - Forks: 103

afhverjuekki/logic-markers-ai
Generate Logic Pro Markers from Video with AI
Language: Python - Size: 144 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 0

scitix/arks
Arks is a cloud-native inference framework running on Kubernetes
Language: Go - Size: 353 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 4 - Forks: 2

ModelTC/llmc
[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".
Language: Python - Size: 28.9 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 456 - Forks: 53

ModelCloud/GPTQModel
Production-ready LLM compression/quantization toolkit with hardware-accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.
Language: Python - Size: 11.8 MB - Last synced at: 5 days ago - Pushed at: 6 days ago - Stars: 465 - Forks: 68

xorbitsai/inference
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
Language: Python - Size: 44.6 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 7,500 - Forks: 636
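
The "single line of code" idea is changing the client's base URL so it talks to a local OpenAI-compatible server instead of api.openai.com. A minimal sketch with the official `openai` Python client follows; the endpoint, port, and model name are assumptions (the port shown is a common Xinference default and may differ in your setup).

```python
# Point the OpenAI client at a local OpenAI-compatible server (endpoint/model assumed).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:9997/v1", api_key="not-needed")  # the changed line
resp = client.chat.completions.create(
    model="my-local-model",  # placeholder: whatever model is registered on the server
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```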

FreeIPCC/LLM-ContactCenter-AI-CallCenter
LLM call center / AI call center: a large-model contact-center and customer-service system that can connect to mainstream and private models: OpenAI, LLaMA, Kimi, Tongyi Qianwen (Qwen), Zhipu AI, iFlytek Spark, Gemini, Xorbits Inference, Amazon Bedrock, Volcano Engine, Tencent Hunyuan, Claude, Bard, DeepSeek, Azure OpenAI, Baidu Qianfan, Ollama, Qwen, vLLM
Language: TypeScript - Size: 23.1 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 5 - Forks: 4

mohammad-nour-alawad/Voice-to-Pandas-LLM-backend
FastAPI backend for LLM inference with Qwen2.5, Whisper, and VITS TTS
Language: Python - Size: 252 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

NEOS-AI/Neosearch
AI-based search engine done right
Language: HTML - Size: 97.4 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 16 - Forks: 0

IDEA-Research/RexSeek
Refer to any person or object given a natural language description. Code base for RexSeek and the HumanRef benchmark.
Language: Python - Size: 9.55 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 112 - Forks: 8

JackYFL/awesome-VLLMs
This repository collects papers on VLLM applications. New papers will be added from time to time.
Size: 893 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 86 - Forks: 8

lamalab-org/macbench
Probing the limitations of multimodal language models for chemistry and materials research
Language: Python - Size: 2.18 GB - Last synced at: 8 days ago - Pushed at: about 1 month ago - Stars: 14 - Forks: 0

LLM-inference-router/vllm-router
vLLM Router
Language: Python - Size: 45.9 KB - Last synced at: 2 days ago - Pushed at: about 1 year ago - Stars: 26 - Forks: 1

g-eoj/guided-agents
Use structured output to control agents.
Language: Python - Size: 396 KB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 0 - Forks: 0
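
Structured output is what keeps an agent's tool calls machine-parseable. A minimal sketch of guided JSON decoding against a vLLM OpenAI-compatible server is shown below; it is not this repository's code, and the server URL, served model name, and schema are assumptions (the "guided_json" field is a vLLM-specific extension passed via extra_body).

```python
# Structured output via vLLM's guided JSON decoding (assumed: recent vLLM server at :8000).
import json
from openai import OpenAI

schema = {
    "type": "object",
    "properties": {"tool": {"type": "string"}, "argument": {"type": "string"}},
    "required": ["tool", "argument"],
}

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder served model
    messages=[{"role": "user", "content": "Search the web for today's weather."}],
    extra_body={"guided_json": schema},  # constrain generation to the JSON schema
)
print(json.loads(resp.choices[0].message.content))
```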

vectara/mirage-bench
Repository for multilingual generation, RAG evaluations, and surrogate judge training for the Arena RAG leaderboard (NAACL'25)
Language: Python - Size: 2.8 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 7 - Forks: 0

huahuadeliaoliao/RoseChat
AI agent with async, multithreading, and MCP support
Language: Python - Size: 40 KB - Last synced at: 10 days ago - Pushed at: 11 days ago - Stars: 1 - Forks: 0

FlagOpen/RoboBrain
[CVPR 2025] RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete. Official Repository.
Language: Python - Size: 13.3 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 138 - Forks: 8

shell-nlp/gpt_server
gpt_server is an open-source framework for production-grade deployment of LLMs and embedding models.
Language: Python - Size: 2.42 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 165 - Forks: 15

0-mostafa-rezaee-0/Batch_LLM_Inference_with_Ray_Data_LLM
Batch LLM Inference with Ray Data LLM: From Simple to Advanced
Language: Jupyter Notebook - Size: 1.63 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 3 - Forks: 1

lework/llm-benchmark
An LLM concurrency performance testing tool with support for automated stress testing and performance report generation.
Language: Python - Size: 117 KB - Last synced at: 8 days ago - Pushed at: 29 days ago - Stars: 28 - Forks: 6

llmariner/llmariner
Extensible generative AI platform on Kubernetes with OpenAI-compatible APIs.
Language: Go - Size: 7.84 MB - Last synced at: 11 days ago - Pushed at: 12 days ago - Stars: 65 - Forks: 7

wizenheimer/periscope
LLM Performance Testing | A tiny toolkit for load testing and benchmarking OpenAI-like inference endpoints using K6 + Grafana + InfluxDB
Language: JavaScript - Size: 563 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 0 - Forks: 0
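
Not periscope itself, but the same idea in miniature: fire a batch of concurrent requests at an OpenAI-like endpoint and report mean latency. The endpoint URL, model name, and request count are assumptions.

```python
# Tiny concurrent latency probe for an OpenAI-like endpoint (illustrative sketch).
import asyncio
import time
from openai import AsyncOpenAI

async def one_request(client: AsyncOpenAI, model: str) -> float:
    t0 = time.perf_counter()
    await client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Reply with one word."}],
        max_tokens=8,
    )
    return time.perf_counter() - t0

async def main() -> None:
    client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
    latencies = await asyncio.gather(*(one_request(client, "my-model") for _ in range(32)))
    print(f"{len(latencies)} concurrent requests, mean latency {sum(latencies)/len(latencies):.2f}s")

asyncio.run(main())
```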

xxrjun/local-inference
🐑 Run LLM inference locally for various downstream applications.
Language: Shell - Size: 2.53 MB - Last synced at: 9 days ago - Pushed at: 12 days ago - Stars: 2 - Forks: 0

France-Travail/happy_vllm
A production-ready REST API for vLLM
Language: Python - Size: 855 KB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 19 - Forks: 2

Svastikkka/HELM
Helm repository for deploying services
Language: Smarty - Size: 104 KB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 0 - Forks: 0

kyegomez/SimpleUnet
A simple implementation of U-Net, because all the implementations I've seen are way too complicated.
Language: Python - Size: 205 KB - Last synced at: 2 days ago - Pushed at: 5 months ago - Stars: 9 - Forks: 1

argonne-lcf/LLM-Inference-Bench
LLM-Inference-Bench
Language: Jupyter Notebook - Size: 11.2 MB - Last synced at: 12 days ago - Pushed at: 4 months ago - Stars: 39 - Forks: 4

xxxjjhhh/vllm_docker
Developer Yumi: vLLM Docker environment deployment scripts and example code
Language: Python - Size: 11.7 KB - Last synced at: 4 days ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

prodesk98/advanced-deep-research
Automated Deep Research with LLMs, web search, paper parsing, and didactic summarization.
Language: Python - Size: 127 KB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 1 - Forks: 0

moeru-ai/demodel
🚀🛸 Easily boost the speed of pulling your models and datasets from various inference runtimes (e.g. 🤗 HuggingFace, 🐫 Ollama, vLLM, and more!)
Language: Rust - Size: 47.9 KB - Last synced at: 6 days ago - Pushed at: about 1 month ago - Stars: 3 - Forks: 0

yoziru/nextjs-vllm-ui
Fully-featured, beautiful web interface for vLLM - built with NextJS.
Language: TypeScript - Size: 6.2 MB - Last synced at: 16 days ago - Pushed at: 28 days ago - Stars: 118 - Forks: 19

prometheus-eval/prometheus-eval
Evaluate your LLM's responses with Prometheus and GPT-4 💯
Language: Python - Size: 15 MB - Last synced at: 17 days ago - Pushed at: about 1 month ago - Stars: 898 - Forks: 55

samzong/fastllm
A minimal LLM server launcher in just ~100 lines of Python code.
Language: Python - Size: 23.4 KB - Last synced at: 8 days ago - Pushed at: 20 days ago - Stars: 0 - Forks: 0

yahavb/coldstart-recs-on-aws-trainium Fork of aws-samples/eks_gpu_and_trainuim_perceiver_io_training
End-to-end solution for cold-start recommendations using vLLM, DeepSeek Llama (8B & 70B), and FAISS on AWS Trainium (Trn1) with the Neuron SDK and NeuronX Distributed. Includes LLM-based interest expansion, embedding comparisons (T5 & SentenceTransformers), and scalable retrieval workflows.
Language: Python - Size: 1.08 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 0 - Forks: 0

vam876/LocalAPI.AI
LocalAPI.AI is a local AI management tool for Ollama, offering Web UI management and compatibility with vLLM, LM Studio, llama.cpp, Mozilla-Llamafile, Jan AI, Cortex API, Local-LLM, LiteLLM, GPT4All, and more.
Language: HTML - Size: 1.25 MB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 17 - Forks: 0

NetEase-Media/grps
Deep Learning Deployment Framework: supports tf/torch/trt/trtllm/vllm and other NN frameworks, with dynamic batching and streaming modes. It is dual-language compatible with Python and C++, offering scalability, extensibility, and high performance, and helps users quickly deploy models and serve them through HTTP/RPC interfaces.
Language: C++ - Size: 67.8 MB - Last synced at: 17 days ago - Pushed at: 27 days ago - Stars: 157 - Forks: 13

Getty/langertha
Perl Framework for AI - Langertha - the viking of AI
Language: Perl - Size: 325 KB - Last synced at: 23 days ago - Pushed at: 23 days ago - Stars: 3 - Forks: 0

YY0649/ICE-PIXIU
ICE-PIXIU: A cross-language financial large-model framework
Language: Python - Size: 118 MB - Last synced at: 2 days ago - Pushed at: 5 months ago - Stars: 16 - Forks: 0

varunshenoy/super-json-mode
Low latency JSON generation using LLMs ⚡️
Language: Jupyter Notebook - Size: 652 KB - Last synced at: 17 days ago - Pushed at: about 1 year ago - Stars: 398 - Forks: 14

France-Travail/benchmark_llm_serving
A library to benchmark LLMs via their exposed APIs
Language: Python - Size: 8.04 MB - Last synced at: 22 days ago - Pushed at: 4 months ago - Stars: 6 - Forks: 0

brokedba/vllm-lab
This repository contains Terraform configuration for the vLLM production stack on cloud-managed Kubernetes.
Size: 25.4 KB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 0 - Forks: 0

OpenCSGs/llm-inference
llm-inference is a platform for publishing and managing llm inference, providing a wide range of out-of-the-box features for model deployment, such as UI, RESTful API, auto-scaling, computing resource management, monitoring, and more.
Language: Python - Size: 602 KB - Last synced at: 10 days ago - Pushed at: 11 months ago - Stars: 80 - Forks: 16

Joshue2006/LLM-Reasoner
Make any LLM think like OpenAI o1 and DeepSeek R1
Size: 1.95 KB - Last synced at: 23 days ago - Pushed at: 29 days ago - Stars: 3 - Forks: 0

Murtaza-arif/RAG-Agnostic-Guide
A comprehensive guide to building Retrieval-Augmented Generation (RAG) systems using various open-source tools.
Language: HTML - Size: 19.6 MB - Last synced at: 22 days ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

gotzmann/booster
Booster - an open accelerator for LLMs. Better inference and debugging for AI hackers
Language: C++ - Size: 144 MB - Last synced at: 2 days ago - Pushed at: 8 months ago - Stars: 154 - Forks: 7

Followb1ind1y/Medical-LLM-Fine-tuning
Fine-tunes LLaMA-3-8B on PubMedQA with QLoRA, optimized via DeepSpeed and vLLM for efficient, low-latency medical QA. Deployable via Docker for scalable clinical inference.
Language: Jupyter Notebook - Size: 3.38 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0
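
For readers unfamiliar with QLoRA, here is a minimal setup sketch: load the base model in 4-bit and attach LoRA adapters so only a small fraction of parameters is trained. This is not this repository's code; the model name, target modules, and LoRA hyperparameters are illustrative assumptions.

```python
# Minimal QLoRA setup sketch: 4-bit quantized base model + LoRA adapters (illustrative).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
# Model name is a placeholder; gated checkpoints need license acceptance on the Hub.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B", quantization_config=bnb)
lora = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(base, lora)
model.print_trainable_parameters()  # only the LoRA adapter weights are trainable
```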

mahshid1378/Project-vLLM
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)
Language: Python - Size: 402 KB - Last synced at: 13 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

iNeil77/vllm-code-harness
Run code inference-only benchmarks quickly using vLLM
Language: Python - Size: 814 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 11 - Forks: 0

mahshid1378/Worker-vLLM
The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
Language: Python - Size: 26 MB - Last synced at: 13 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

mahshid1378/production-stack
vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization
Language: Python - Size: 1.85 MB - Last synced at: 13 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

mahshid1378/Sparrow-vLLM
Data processing with ML, LLM and Vision LLM
Language: Python - Size: 5.3 MB - Last synced at: 13 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

mahshid1378/Overall-Model-New-LLM
I'm working on an overall model covering LLM, vLLM, and LCM; the component models can be found in my repositories.
Language: Jupyter Notebook - Size: 141 MB - Last synced at: 13 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

jparkerweb/down-craft
📑 npm package to craft files into Markdown with ease
Language: JavaScript - Size: 17.4 MB - Last synced at: 18 days ago - Pushed at: 4 months ago - Stars: 9 - Forks: 1

itelnov/skeernir
UI to deploy agents locally and customise interaction with them
Language: Python - Size: 13.8 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 11 - Forks: 2

bluechanel/deploy_llm
Rapid deployment of LLM and embedding models based on vLLM using Docker
Language: Python - Size: 375 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 6 - Forks: 1

su-shubham/Duuq
[WIP] Real-time harmful-content detection pipeline using vLLM
Language: TypeScript - Size: 202 KB - Last synced at: 3 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

KevinLee1110/dynamic-batching
The official repo for the paper "Optimizing LLM Inference Throughput via Memory-aware and SLA-constrained Dynamic Batching"
Size: 11.7 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 8 - Forks: 1

oogou11/AgriBrain
AgriBrain is a cloud-native smart AI agriculture platform based on Spring AI Alibaba, LLMs, JDK 17, and Alibaba Cloud ACK (Container Service)
Size: 3.91 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

tommathewXC/lidia
A fully customizable, super-lightweight, cross-platform GenAI-based personal assistant that can be run locally on your private hardware!
Language: Python - Size: 21.5 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

intelligentnode/IntelliChat
Modern AI chatbot supporting multiple LLMs. Switch between Gemini, Mistral, Llama, Claude and ChatGPT.
Language: TypeScript - Size: 20.8 MB - Last synced at: 4 days ago - Pushed at: about 1 month ago - Stars: 54 - Forks: 17

itsvaibhav01/Immune
[CVPR2025] Official Repository for IMMUNE: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment
Language: Python - Size: 2.77 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 5 - Forks: 0

muziyongshixin/SimpleAIGateway
This is a simple AI gateway implemented in Python, with features such as load balancing, error alerting, and disaster recovery and backup.
Language: Python - Size: 107 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

itsvaibhav01/immune-web
Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment
Language: JavaScript - Size: 97.9 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

claw1200/llama-cord
Discord App for Interacting with local Ollama Models. Multiple Agents Supported!
Language: Python - Size: 60.5 KB - Last synced at: 2 days ago - Pushed at: 2 months ago - Stars: 3 - Forks: 0

ivangabriele/docker-llm
Pre-loaded LLMs served as an OpenAI-Compatible API via Docker images.
Language: Dockerfile - Size: 199 KB - Last synced at: 20 days ago - Pushed at: over 1 year ago - Stars: 6 - Forks: 2

lucasjinreal/Namo-R1
A 500M-parameter real-time VLM that runs on CPU. Surpasses Moondream2 and SmolVLM. Train from scratch with ease.
Language: Python - Size: 1.13 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 129 - Forks: 14

kingabzpro/Deploying-Llama-3.3-70B
Serve Llama 3.3 70B (with AWQ quantization) using vLLM and deploy it on BentoCloud.
Language: Python - Size: 9.77 KB - Last synced at: 29 days ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0
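
Serving an AWQ-quantized checkpoint with vLLM boils down to pointing the engine at the quantized weights and enabling tensor parallelism. A minimal offline sketch follows; it is not the repository's BentoCloud deployment, and the checkpoint name and GPU count are assumptions.

```python
# Sketch of loading an AWQ-quantized Llama checkpoint in vLLM (illustrative only).
from vllm import LLM, SamplingParams

llm = LLM(
    model="casperhansen/llama-3.3-70b-instruct-awq",  # example AWQ checkpoint name (assumed)
    quantization="awq",
    tensor_parallel_size=4,  # split the 70B model across 4 GPUs
)
out = llm.generate(["What is AWQ quantization?"], SamplingParams(max_tokens=64))
print(out[0].outputs[0].text)
```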

Trainy-ai/llm-atc 📦
Fine-tuning and serving LLMs on any cloud
Language: Python - Size: 1.71 MB - Last synced at: 2 days ago - Pushed at: over 1 year ago - Stars: 89 - Forks: 2

ALucek/ppt2desc
Convert PowerPoint files into semantically rich text using vision language models
Language: Python - Size: 1.42 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 84 - Forks: 7

melodydepok/llama-cord
Discord App for Interacting with local Ollama Models. Multiple Agents Supported!
Size: 1000 Bytes - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

yueying-teng/generate-language-image-instruction-following-data
Mistral-assisted visual instruction data generation, following LLaVA
Language: Python - Size: 69.3 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 1

varunvasudeva1/llm-server-docs
Documentation on setting up an LLM server on Debian from scratch, using Ollama/vLLM, Open WebUI, OpenedAI Speech/Kokoro FastAPI, and ComfyUI.
Size: 31.3 KB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 305 - Forks: 25
