An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: vllm

ARYAN555279/Batch_LLM_Inference_with_Ray_Data_LLM

Batch LLM Inference with Ray Data LLM: From Simple to Advanced

Language: Dockerfile - Size: 1.5 MB - Last synced at: 16 minutes ago - Pushed at: 19 minutes ago - Stars: 0 - Forks: 0

runpod-workers/worker-vllm

The RunPod worker template for serving our large language model endpoints. Powered by vLLM.

Language: Python - Size: 26.3 MB - Last synced at: 23 minutes ago - Pushed at: about 2 hours ago - Stars: 307 - Forks: 149

VectorInstitute/vector-inference

Efficient LLM inference on Slurm clusters using vLLM.

Language: Python - Size: 2.37 MB - Last synced at: about 2 hours ago - Pushed at: about 2 hours ago - Stars: 57 - Forks: 10

Liquid4All/on-prem-stack

Scripts to launch Liquid on-prem stack

Language: Shell - Size: 160 KB - Last synced at: 18 minutes ago - Pushed at: about 1 hour ago - Stars: 2 - Forks: 1

priyanshua44/no-llm

no-llm is a lightweight library designed to simplify the integration of machine learning models without relying on large language models. It provides essential tools for developers to create efficient, scalable applications while maintaining clear and concise code.

Language: Python - Size: 170 KB - Last synced at: about 7 hours ago - Pushed at: about 8 hours ago - Stars: 0 - Forks: 0

InftyAI/llmaz

☸️ Easy, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work!

Language: Go - Size: 6.46 MB - Last synced at: about 10 hours ago - Pushed at: about 12 hours ago - Stars: 125 - Forks: 20

jasonacox/TinyLLM

Setup and run a local LLM and Chatbot using consumer grade hardware.

Language: JavaScript - Size: 743 KB - Last synced at: about 15 hours ago - Pushed at: about 16 hours ago - Stars: 239 - Forks: 28

containers/ramalama

The goal of RamaLama is to make working with AI boring.

Language: Python - Size: 2.37 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 1,538 - Forks: 162

qizhou000/VisEdit

[AAAI 2025 oral] Attribution Analysis Meets Model Editing: Advancing Knowledge Correction in Vision Language Models with VisEdit

Language: Python - Size: 3.46 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 8 - Forks: 0

apconw/sanic-web

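A lightweight large-model application project (Large Model Data Assistant) that covers the full pipeline and is easy to customize. Supports large models such as DeepSeek and Qwen2.5. A one-stop large-model application development project built on Dify, Ollama & vLLM, Sanic, and Text2SQL 📊, with a modern UI built with Vue3, TypeScript, and Vite 5. It supports LLM-based data visualization Q&A via ECharts 📈 and can handle table Q&A over CSV files 📂. It can also easily integrate with third-party open-source RAG retrieval systems 🌐 to support broad general-knowledge Q&A.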

Language: JavaScript - Size: 142 MB - Last synced at: 2 days ago - Pushed at: 4 days ago - Stars: 435 - Forks: 83

meta-llama/llama-cookbook

Welcome to the Llama Cookbook! This is your go-to guide for building with Llama: getting started with inference, fine-tuning, and RAG. We also show you how to solve end-to-end problems using the Llama model family and how to use it on various provider services.

Language: Jupyter Notebook - Size: 209 MB - Last synced at: 2 days ago - Pushed at: 4 days ago - Stars: 17,096 - Forks: 2,446

microsoft/vidur

A large-scale simulation framework for LLM inference

Language: Python - Size: 156 MB - Last synced at: 1 day ago - Pushed at: 5 months ago - Stars: 364 - Forks: 65

harleyszhang/llm_note

LLM notes, including model inference, transformer model structure, and llm framework code analysis notes.

Language: Python - Size: 176 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 722 - Forks: 73

mostlygeek/llama-swap

Model swapping for llama.cpp (or any local OpenAI compatible server)

Language: Go - Size: 552 KB - Last synced at: 3 days ago - Pushed at: 6 days ago - Stars: 537 - Forks: 31

katanaml/sparrow

Data processing with ML, LLM and Vision LLM

Language: Python - Size: 11.3 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 4,476 - Forks: 451

suncoast-soft/LLM-VoIP-Caller

This project is the backend engine for a fully autonomous AI-powered call center. It integrates a large language model (LLM), speech recognition, and text-to-speech to manage real-time phone conversations via Asterisk.

Language: Python - Size: 39.1 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

aws-samples/easy-model-deployer

A user-friendly command-line/SDK tool that makes it quicker and easier to deploy open-source LLMs on AWS

Language: Python - Size: 39.6 MB - Last synced at: about 12 hours ago - Pushed at: about 13 hours ago - Stars: 34 - Forks: 5

xlite-dev/Awesome-LLM-Inference

📚 A curated list of Awesome LLM/VLM Inference Papers with code: WINT8/4, FlashAttention, PagedAttention, MLA, Parallelism etc.

Language: Python - Size: 115 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 3,851 - Forks: 275

hyperai/vllm-cn

vLLM Documentation in Simplified Chinese

Language: TypeScript - Size: 3.65 MB - Last synced at: about 24 hours ago - Pushed at: 6 days ago - Stars: 60 - Forks: 5

sherlockchou86/PyLangPipe

a simple lightweight large language model pipeline framework.

Language: Python - Size: 790 KB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 16 - Forks: 2

umi-AIGC-saas/umi_ai_cms

Platform_maultimodal is a collection of tool platforms.

Language: Python - Size: 4.16 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 2 - Forks: 0

substratusai/kubeai

AI Inference Operator for Kubernetes. The easiest way to serve ML models in production. Supports VLMs, LLMs, embeddings, and speech-to-text.

Language: Go - Size: 15.9 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 890 - Forks: 85

HuiResearch/Fast-Spark-TTS

Provides high-quality Chinese speech synthesis and voice cloning services based on models such as SparkTTS and OrpheusTTS.

Language: Python - Size: 30.7 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 282 - Forks: 39

hienhayho/rag-colls

Collection of recent advanced RAG techniques.

Language: Python - Size: 10.1 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 12 - Forks: 4

chtmp223/topicGPT

TopicGPT: A Prompt-Based Framework for Topic Modeling (NAACL'24)

Language: Python - Size: 828 KB - Last synced at: 4 days ago - Pushed at: about 1 month ago - Stars: 282 - Forks: 46

vllm-project/vllm-ascend

Community maintained hardware plugin for vLLM on Ascend

Language: Python - Size: 911 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 477 - Forks: 87

OpenRLHF/OpenRLHF

An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)

Language: Python - Size: 2.53 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 6,300 - Forks: 618

hcd233/Aris-AI-Model-Server

An OpenAI-compatible API that integrates LLM, Embedding, and Reranker.

Language: Python - Size: 1.05 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 13 - Forks: 1

jakobdylanc/llmcord

Make Discord your LLM frontend ● Supports any OpenAI compatible API (Ollama, LM Studio, vLLM, OpenRouter, xAI, Mistral, Groq and more)

Language: Python - Size: 168 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 523 - Forks: 103

afhverjuekki/logic-markers-ai

Generate Logic Pro Markers from Video with AI

Language: Python - Size: 144 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 0

scitix/arks

Arks is a cloud-native inference framework running on Kubernetes

Language: Go - Size: 353 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 4 - Forks: 2

ModelTC/llmc

[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".

Language: Python - Size: 28.9 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 456 - Forks: 53

ModelCloud/GPTQModel

Production ready LLM model compression/quantization toolkit with hw accelerated inference support for both cpu/gpu via HF, vLLM, and SGLang.

Language: Python - Size: 11.8 MB - Last synced at: 5 days ago - Pushed at: 6 days ago - Stars: 465 - Forks: 68

xorbitsai/inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.

Language: Python - Size: 44.6 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 7,500 - Forks: 636

FreeIPCC/LLM-ContactCenter-AI-CallCenter

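LLM Call Center, AI Call Center, large-model call center, large-model customer service system. Can connect to mainstream and private models on the market: OpenAI, LLaMA, Kimi, Tongyi Qianwen, Zhipu AI, iFlytek Spark, Gemini, Xorbits Inference, Amazon Bedrock, Volcano Engine, Tencent Hunyuan, Claude, Bard, DeepSeek, Azure OpenAI, Qianfan, Ollama, qwen, vLLM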

Language: TypeScript - Size: 23.1 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 5 - Forks: 4

mohammad-nour-alawad/Voice-to-Pandas-LLM-backend

FAST API for LLM Inference with Qwen2.5, Whisper AI and Vits TTS

Language: Python - Size: 252 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

NEOS-AI/Neosearch

AI-based search engine done right

Language: HTML - Size: 97.4 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 16 - Forks: 0

IDEA-Research/RexSeek

Referring to any person or object given a natural language description. Code base for RexSeek and the HumanRef Benchmark

Language: Python - Size: 9.55 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 112 - Forks: 8

JackYFL/awesome-VLLMs

This repository collects papers on VLLM applications. We will update new papers irregularly.

Size: 893 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 86 - Forks: 8

lamalab-org/macbench

Probing the limitations of multimodal language models for chemistry and materials research

Language: Python - Size: 2.18 GB - Last synced at: 8 days ago - Pushed at: about 1 month ago - Stars: 14 - Forks: 0

LLM-inference-router/vllm-router

vLLM Router

Language: Python - Size: 45.9 KB - Last synced at: 2 days ago - Pushed at: about 1 year ago - Stars: 26 - Forks: 1

g-eoj/guided-agents

Use structured output to control agents.

Language: Python - Size: 396 KB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 0 - Forks: 0

vectara/mirage-bench

Repository for Multilingual Generation, RAG evaluations, and surrogate judge training for Arena RAG leaderboard (NAACL'25)

Language: Python - Size: 2.8 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 7 - Forks: 0

huahuadeliaoliao/RoseChat

AI agent with async, multithreading and mcp support

Language: Python - Size: 40 KB - Last synced at: 10 days ago - Pushed at: 11 days ago - Stars: 1 - Forks: 0

FlagOpen/RoboBrain

[CVPR 2025] RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete. Official Repository.

Language: Python - Size: 13.3 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 138 - Forks: 8

shell-nlp/gpt_server

gpt_server is an open-source framework for production-grade deployment of LLMs or Embedding models.

Language: Python - Size: 2.42 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 165 - Forks: 15

0-mostafa-rezaee-0/Batch_LLM_Inference_with_Ray_Data_LLM

Batch LLM Inference with Ray Data LLM: From Simple to Advanced

Language: Jupyter Notebook - Size: 1.63 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 3 - Forks: 1

lework/llm-benchmark

LLM concurrency performance testing tool with support for automated stress testing and performance report generation.

Language: Python - Size: 117 KB - Last synced at: 8 days ago - Pushed at: 29 days ago - Stars: 28 - Forks: 6

llmariner/llmariner

Extensible generative AI platform on Kubernetes with OpenAI-compatible APIs.

Language: Go - Size: 7.84 MB - Last synced at: 11 days ago - Pushed at: 12 days ago - Stars: 65 - Forks: 7

wizenheimer/periscope

LLM Performance Testing | K6 + Grafana + InfluxDB | A tiny toolkit for load testing and benchmarking OpenAI-like inference endpoints using K6 + Grafana + InfluxDB

Language: JavaScript - Size: 563 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 0 - Forks: 0

xxrjun/local-inference

🐑 Run LLM inference locally for various downstream applications.

Language: Shell - Size: 2.53 MB - Last synced at: 9 days ago - Pushed at: 12 days ago - Stars: 2 - Forks: 0

France-Travail/happy_vllm

A REST API for vLLM, production ready

Language: Python - Size: 855 KB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 19 - Forks: 2

Svastikkka/HELM

HELM Repository to deploy Services

Language: Smarty - Size: 104 KB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 0 - Forks: 0

kyegomez/SimpleUnet

A simple implementation of Unet because all the implementations I've seen are way too complicated.

Language: Python - Size: 205 KB - Last synced at: 2 days ago - Pushed at: 5 months ago - Stars: 9 - Forks: 1

argonne-lcf/LLM-Inference-Bench

LLM-Inference-Bench

Language: Jupyter Notebook - Size: 11.2 MB - Last synced at: 12 days ago - Pushed at: 4 months ago - Stars: 39 - Forks: 4

xxxjjhhh/vllm_docker

Developer Yumi: vLLM Docker environment deployment scripts and example code

Language: Python - Size: 11.7 KB - Last synced at: 4 days ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

prodesk98/advanced-deep-research

Automated Deep Research with LLMs, web search, paper parsing, and didactic summarization.

Language: Python - Size: 127 KB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 1 - Forks: 0

moeru-ai/demodel

🚀🛸 Easily boost the speed of pulling your models and datasets from various inference runtimes. (e.g. 🤗 HuggingFace, 🐫 Ollama, vLLM, and more!)

Language: Rust - Size: 47.9 KB - Last synced at: 6 days ago - Pushed at: about 1 month ago - Stars: 3 - Forks: 0

yoziru/nextjs-vllm-ui

Fully-featured, beautiful web interface for vLLM - built with NextJS.

Language: TypeScript - Size: 6.2 MB - Last synced at: 16 days ago - Pushed at: 28 days ago - Stars: 118 - Forks: 19

prometheus-eval/prometheus-eval

Evaluate your LLM's response with Prometheus and GPT4 💯

Language: Python - Size: 15 MB - Last synced at: 17 days ago - Pushed at: about 1 month ago - Stars: 898 - Forks: 55

samzong/fastllm

A minimal LLM server launcher in just ~100 lines of Python code.

Language: Python - Size: 23.4 KB - Last synced at: 8 days ago - Pushed at: 20 days ago - Stars: 0 - Forks: 0

yahavb/coldstart-recs-on-aws-trainium (fork of aws-samples/eks_gpu_and_trainuim_perceiver_io_training)

End-to-end solution for cold-start recommendations using vLLM, DeepSeek Llama (8B & 70B), and FAISS on AWS Trainium (Trn1) with the Neuron SDK and NeuronX Distributed. Includes LLM-based interest expansion, embedding comparisons (T5 & SentenceTransformers), and scalable retrieval workflows.

Language: Python - Size: 1.08 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 0 - Forks: 0

vam876/LocalAPI.AI

LocalAPI.AI is a local AI management tool for Ollama, offering Web UI management and compatibility with vLLM, LM Studio, llama.cpp, Mozilla-Llamafile, Jan AI, Cortex API, Local-LLM, LiteLLM, GPT4All, and more.

Language: HTML - Size: 1.25 MB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 17 - Forks: 0

NetEase-Media/grps

Deep Learning Deployment Framework: Supports tf/torch/trt/trtllm/vllm and other NN frameworks. Supports dynamic batching and streaming modes. It is dual-language compatible with Python and C++, offering scalability, extensibility, and high performance. It helps users quickly deploy models and provide services through HTTP/RPC interfaces.

Language: C++ - Size: 67.8 MB - Last synced at: 17 days ago - Pushed at: 27 days ago - Stars: 157 - Forks: 13

Getty/langertha

Perl Framework for AI - Langertha - the viking of AI

Language: Perl - Size: 325 KB - Last synced at: 23 days ago - Pushed at: 23 days ago - Stars: 3 - Forks: 0

YY0649/ICE-PIXIU

ICE-PIXIU: A Cross-Language Financial Megamodeling Framework

Language: Python - Size: 118 MB - Last synced at: 2 days ago - Pushed at: 5 months ago - Stars: 16 - Forks: 0

varunshenoy/super-json-mode

Low latency JSON generation using LLMs ⚡️

Language: Jupyter Notebook - Size: 652 KB - Last synced at: 17 days ago - Pushed at: about 1 year ago - Stars: 398 - Forks: 14

France-Travail/benchmark_llm_serving

A library to benchmark LLMs via their API exposure

Language: Python - Size: 8.04 MB - Last synced at: 22 days ago - Pushed at: 4 months ago - Stars: 6 - Forks: 0

brokedba/vllm-lab

This repository contains Terraform configuration for the vLLM production-stack on cloud-managed K8s

Size: 25.4 KB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 0 - Forks: 0

OpenCSGs/llm-inference

llm-inference is a platform for publishing and managing llm inference, providing a wide range of out-of-the-box features for model deployment, such as UI, RESTful API, auto-scaling, computing resource management, monitoring, and more.

Language: Python - Size: 602 KB - Last synced at: 10 days ago - Pushed at: 11 months ago - Stars: 80 - Forks: 16

Joshue2006/LLM-Reasoner

Make any LLM think like OpenAI o1 and DeepSeek R1

Size: 1.95 KB - Last synced at: 23 days ago - Pushed at: 29 days ago - Stars: 3 - Forks: 0

Murtaza-arif/RAG-Agnostic-Guide

A comprehensive guide to building Retrieval-Augmented Generation (RAG) systems using various open-source tools.

Language: HTML - Size: 19.6 MB - Last synced at: 22 days ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

gotzmann/booster

Booster - open accelerator for LLM models. Better inference and debugging for AI hackers

Language: C++ - Size: 144 MB - Last synced at: 2 days ago - Pushed at: 8 months ago - Stars: 154 - Forks: 7

Followb1ind1y/Medical-LLM-Fine-tuning

Fine-tunes LLaMA-3-8B on PubMedQA with QLoRA, optimized via DeepSpeed and vLLM for efficient, low-latency medical QA. Deployable via Docker for scalable clinical inference.

Language: Jupyter Notebook - Size: 3.38 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

mahshid1378/Project-vLLM

An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)

Language: Python - Size: 402 KB - Last synced at: 13 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

iNeil77/vllm-code-harness

Run code inference-only benchmarks quickly using vLLM

Language: Python - Size: 814 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 11 - Forks: 0

mahshid1378/Worker-vLLM

The RunPod worker template for serving our large language model endpoints. Powered by vLLM.

Language: Python - Size: 26 MB - Last synced at: 13 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

mahshid1378/production-stack

vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization

Language: Python - Size: 1.85 MB - Last synced at: 13 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

mahshid1378/Sparrow-vLLM

Data processing with ML, LLM and Vision LLM

Language: Python - Size: 5.3 MB - Last synced at: 13 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

mahshid1378/Overall-Model-New-LLM

I'm working on an overall model covering LLM, vLLM, and LCM; the component models are in my repositories.

Language: Jupyter Notebook - Size: 141 MB - Last synced at: 13 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

jparkerweb/down-craft

📑 npm package to craft files into Markdown with ease

Language: JavaScript - Size: 17.4 MB - Last synced at: 18 days ago - Pushed at: 4 months ago - Stars: 9 - Forks: 1

itelnov/skeernir

UI to deploy agents locally and customise interaction with them

Language: Python - Size: 13.8 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 11 - Forks: 2

bluechanel/deploy_llm

Rapid Deployment of LLM and Embedding Based on VLLM Using Docker

Language: Python - Size: 375 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 6 - Forks: 1

su-shubham/Duuq

[WIP] Real-Time Harmful Detection Pipeline Using vLLM

Language: TypeScript - Size: 202 KB - Last synced at: 3 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

KevinLee1110/dynamic-batching

The official repo for the paper "Optimizing LLM Inference Throughput via Memory-aware and SLA-constrained Dynamic Batching"

Size: 11.7 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 8 - Forks: 1

oogou11/AgriBrain

AgriBrain is a cloud-native smart AI agriculture platform based on Spring AI Alibaba, LLM, JDK 17, and Alibaba Cloud ACK (Container Service)

Size: 3.91 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

tommathewXC/lidia

A fully customizable, super light-weight, cross-platform GenAI based Personal Assistant that can be run locally on your private hardware!

Language: Python - Size: 21.5 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

intelligentnode/IntelliChat

Modern AI chatbot supporting multiple LLMs. Switch between Gemini, Mistral, Llama, Claude and ChatGPT.

Language: TypeScript - Size: 20.8 MB - Last synced at: 4 days ago - Pushed at: about 1 month ago - Stars: 54 - Forks: 17

itsvaibhav01/Immune

[CVPR2025] Official Repository for IMMUNE: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment

Language: Python - Size: 2.77 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 5 - Forks: 0

muziyongshixin/SimpleAIGateway

This is a simple AI gateway implemented in Python, which has functions such as load balancing, error alarm, disaster recovery and backup.

Language: Python - Size: 107 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

itsvaibhav01/immune-web

Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment

Language: JavaScript - Size: 97.9 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

claw1200/llama-cord

Discord App for Interacting with local Ollama Models. Multiple Agents Supported!

Language: Python - Size: 60.5 KB - Last synced at: 2 days ago - Pushed at: 2 months ago - Stars: 3 - Forks: 0

ivangabriele/docker-llm

Pre-loaded LLMs served as an OpenAI-Compatible API via Docker images.

Language: Dockerfile - Size: 199 KB - Last synced at: 20 days ago - Pushed at: over 1 year ago - Stars: 6 - Forks: 2

lucasjinreal/Namo-R1

A CPU Realtime VLM in 500M. Surpassed Moondream2 and SmolVLM. Training from scratch with ease.

Language: Python - Size: 1.13 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 129 - Forks: 14

kingabzpro/Deploying-Llama-3.3-70B

Serve Llama 3.3 70B (with AWQ quantization) using vLLM and deploy it on BentoCloud.

Language: Python - Size: 9.77 KB - Last synced at: 29 days ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

Trainy-ai/llm-atc 📦

Fine-tuning and serving LLMs on any cloud

Language: Python - Size: 1.71 MB - Last synced at: 2 days ago - Pushed at: over 1 year ago - Stars: 89 - Forks: 2

ALucek/ppt2desc

Convert PowerPoint files into semantically rich text using vision language models

Language: Python - Size: 1.42 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 84 - Forks: 7

melodydepok/llama-cord

Discord App for Interacting with local Ollama Models. Multiple Agents Supported!

Size: 1000 Bytes - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

yueying-teng/generate-language-image-instruction-following-data

Mistral assisted visual instruction data generation by following LLaVA

Language: Python - Size: 69.3 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 1

varunvasudeva1/llm-server-docs

Documentation on setting up an LLM server on Debian from scratch, using Ollama/vLLM, Open WebUI, OpenedAI Speech/Kokoro FastAPI, and ComfyUI.

Size: 31.3 KB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 305 - Forks: 25