An open API service providing repository metadata for many open source software ecosystems.

Topic: "vllm"

meta-llama/llama-cookbook

Welcome to the Llama Cookbook! This is your go-to guide for building with Llama: getting started with inference, fine-tuning, and RAG. We also show you how to solve end-to-end problems using the Llama model family and how to use the models on various provider services.

Language: Jupyter Notebook - Size: 214 MB - Last synced at: 3 days ago - Pushed at: 6 days ago - Stars: 17,273 - Forks: 2,476

xorbitsai/inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.

Language: Python - Size: 44.9 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 7,804 - Forks: 665

OpenRLHF/OpenRLHF

An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & LoRA & vLLM & RFT)

Language: Python - Size: 2.54 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 6,674 - Forks: 651

katanaml/sparrow

Data processing and instruction calling with ML, LLM and Vision LLM

Language: Python - Size: 11.3 MB - Last synced at: 3 days ago - Pushed at: 7 days ago - Stars: 4,523 - Forks: 460

xlite-dev/Awesome-LLM-Inference

📚A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, FlashAttention, PagedAttention, MLA, Parallelism etc.

Language: Python - Size: 115 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 3,982 - Forks: 277

gpustack/gpustack

Manage GPU clusters for running AI models

Language: Python - Size: 95.2 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 2,680 - Forks: 266

containers/ramalama

Ramalama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for inference in production, all through the familiar language of containers.

Language: Python - Size: 2.64 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1,600 - Forks: 183

bricks-cloud/BricksLLM

🔒 Enterprise-grade API gateway that helps you monitor and impose cost or rate limits per API key. Get fine-grained access control and monitoring per user, application, or environment. Supports OpenAI, Azure OpenAI, Anthropic, vLLM, and open-source LLMs.

Language: Go - Size: 3.89 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 955 - Forks: 66

substratusai/kubeai

AI Inference Operator for Kubernetes. The easiest way to serve ML models in production. Supports VLMs, LLMs, embeddings, and speech-to-text.

Language: Go - Size: 15.9 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 924 - Forks: 84

prometheus-eval/prometheus-eval

Evaluate your LLM's response with Prometheus and GPT4 💯

Language: Python - Size: 15 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 915 - Forks: 55

harleyszhang/llm_note

LLM notes, including model inference, transformer model structure, and llm framework code analysis notes.

Language: Python - Size: 177 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 765 - Forks: 78

mostlygeek/llama-swap

Model swapping for llama.cpp (or any local OpenAPI compatible server)

Language: Go - Size: 907 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 738 - Forks: 38

vllm-project/vllm-ascend

Community maintained hardware plugin for vLLM on Ascend

Language: Python - Size: 1.28 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 603 - Forks: 130

pgalko/BambooAI

A Python library powered by Language Models (LLMs) for conversational data discovery and analysis.

Language: Python - Size: 7.65 MB - Last synced at: about 2 hours ago - Pushed at: 2 days ago - Stars: 581 - Forks: 58

apconw/sanic-web

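A lightweight, end-to-end large-model application project (Large Model Data Assistant) that is easy to extend, with support for large models such as DeepSeek and Qwen2.5. It is a one-stop large-model application development project built on Dify, Ollama&Vllm, Sanic, and Text2SQL 📊, with a modern UI built using Vue3, TypeScript, and Vite 5. It supports chart-based question answering over data through ECharts 📈 and can handle table question answering over CSV files 📂. It can also be easily connected to third-party open-source RAG retrieval systems 🌐 to support broad general-knowledge question answering.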

Language: JavaScript - Size: 144 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 571 - Forks: 99

jakobdylanc/llmcord

Make Discord your LLM frontend ● Supports any OpenAI compatible API (Ollama, LM Studio, vLLM, OpenRouter, xAI, Mistral, Groq and more)

Language: Python - Size: 159 KB - Last synced at: about 3 hours ago - Pushed at: 2 days ago - Stars: 553 - Forks: 117

ModelCloud/GPTQModel

Production ready LLM model compression/quantization toolkit with hw accelerated inference support for both cpu/gpu via HF, vLLM, and SGLang.

Language: Python - Size: 12 MB - Last synced at: 3 days ago - Pushed at: 7 days ago - Stars: 540 - Forks: 77

ModelTC/llmc

[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".

Language: Python - Size: 28.9 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 470 - Forks: 53

varunvasudeva1/llm-server-docs

Documentation on setting up an LLM server on Debian from scratch, using Ollama/vLLM, Open WebUI, OpenedAI Speech/Kokoro FastAPI, and ComfyUI.

Size: 32.2 KB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 435 - Forks: 37

varunshenoy/super-json-mode

Low latency JSON generation using LLMs ⚡️

Language: Jupyter Notebook - Size: 652 KB - Last synced at: 10 days ago - Pushed at: about 1 year ago - Stars: 400 - Forks: 14

HuiResearch/FlashTTS

High-quality Chinese speech synthesis and voice cloning services based on models such as SparkTTS and OrpheusTTS.

Language: Python - Size: 31.1 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 383 - Forks: 53

microsoft/vidur

A large-scale simulation framework for LLM inference

Language: Python - Size: 156 MB - Last synced at: about 19 hours ago - Pushed at: 6 months ago - Stars: 374 - Forks: 65

runpod-workers/worker-vllm

The RunPod worker template for serving our large language model endpoints. Powered by vLLM.

Language: Python - Size: 26.4 MB - Last synced at: about 18 hours ago - Pushed at: about 19 hours ago - Stars: 314 - Forks: 153

chtmp223/topicGPT

TopicGPT: A Prompt-Based Framework for Topic Modeling (NAACL'24)

Language: Python - Size: 828 KB - Last synced at: 6 days ago - Pushed at: 2 months ago - Stars: 290 - Forks: 47

jasonacox/TinyLLM

Setup and run a local LLM and Chatbot using consumer grade hardware.

Language: JavaScript - Size: 493 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 245 - Forks: 28

FlagOpen/RoboBrain

[CVPR 2025] RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete. Official Repository.

Language: Python - Size: 13.2 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 189 - Forks: 10

lucasjinreal/Namo-R1

A CPU Realtime VLM in 500M. Surpassed Moondream2 and SmolVLM. Training from scratch with ease.

Language: Python - Size: 1.13 MB - Last synced at: 23 days ago - Pushed at: 23 days ago - Stars: 185 - Forks: 17

shell-nlp/gpt_server

gpt_server is an open-source framework for production-grade deployment of LLMs, Embedding, Reranker, ASR, and TTS models.

Language: Python - Size: 4.67 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 173 - Forks: 16

InftyAI/llmaz

☸️ Easy, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work!

Language: Go - Size: 9.88 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 165 - Forks: 27

NetEase-Media/grps

Deep Learning Deployment Framework: Supports tf/torch/trt/trtllm/vllm and other NN frameworks. Supports dynamic batching and streaming modes. It is dual-language compatible with Python and C++, offering scalability, extensibility, and high performance. It helps users quickly deploy models and provide services through HTTP/RPC interfaces.

Language: C++ - Size: 67.8 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 157 - Forks: 13

gotzmann/booster

Booster - open accelerator for LLM models. Better inference and debugging for AI hackers

Language: C++ - Size: 144 MB - Last synced at: 5 days ago - Pushed at: 9 months ago - Stars: 155 - Forks: 7

yoziru/nextjs-vllm-ui

Fully-featured, beautiful web interface for vLLM - built with NextJS.

Language: TypeScript - Size: 6.07 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 135 - Forks: 18

JackYFL/awesome-VLLMs

This repository collects papers on VLLM applications. We will update new papers irregularly.

Size: 939 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 119 - Forks: 12

IDEA-Research/RexSeek

Referring any person or objects given a natural language description. Code base for RexSeek and HumanRef Benchmark

Language: Python - Size: 9.55 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 112 - Forks: 8

nbasyl/DoRA 📦

Official implementation of "DoRA: Weight-Decomposed Low-Rank Adaptation"

Size: 557 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 104 - Forks: 2

Trainy-ai/llm-atc 📦

Fine-tuning and serving LLMs on any cloud

Language: Python - Size: 1.71 MB - Last synced at: 5 days ago - Pushed at: over 1 year ago - Stars: 89 - Forks: 2

ALucek/ppt2desc

Convert PowerPoint files into semantically rich text using vision language models

Language: Python - Size: 1.42 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 84 - Forks: 7

OpenCSGs/llm-inference

llm-inference is a platform for publishing and managing llm inference, providing a wide range of out-of-the-box features for model deployment, such as UI, RESTful API, auto-scaling, computing resource management, monitoring, and more.

Language: Python - Size: 602 KB - Last synced at: about 1 month ago - Pushed at: 12 months ago - Stars: 80 - Forks: 16

llmariner/llmariner

Extensible generative AI platform on Kubernetes with OpenAI-compatible APIs.

Language: Go - Size: 7.87 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 67 - Forks: 5

hyperai/vllm-cn

vLLM documentation in Simplified Chinese

Language: TypeScript - Size: 7.45 MB - Last synced at: 7 days ago - Pushed at: 22 days ago - Stars: 63 - Forks: 5

VectorInstitute/vector-inference

Efficient LLM inference on Slurm clusters using vLLM.

Language: Python - Size: 2.79 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 59 - Forks: 10

intelligentnode/IntelliChat

Modern AI chatbot supporting multiple LLMs. Switch between Gemini, Mistral, Llama, Claude and ChatGPT.

Language: TypeScript - Size: 20.8 MB - Last synced at: 5 days ago - Pushed at: 2 months ago - Stars: 55 - Forks: 17

aws-samples/easy-model-deployer

A user-friendly command-line/SDK tool that makes it quick and easy to deploy open-source LLMs on AWS

Language: Python - Size: 45 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 42 - Forks: 7

argonne-lcf/LLM-Inference-Bench

LLM-Inference-Bench

Language: Jupyter Notebook - Size: 11.2 MB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 39 - Forks: 4

jeremyarancio/VLM-Batch-Deployment

Batch Deployment for Document Parsing with AWS Batch & Qwen-2.5-VL

Language: Jupyter Notebook - Size: 398 KB - Last synced at: 8 days ago - Pushed at: 18 days ago - Stars: 36 - Forks: 13

gameofdimension/vllm-cn

Demonstrates the remarkable effect of vllm on Chinese large language models

Language: Jupyter Notebook - Size: 152 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 31 - Forks: 1

lework/llm-benchmark

LLM concurrency performance testing tool with support for automated stress testing and performance report generation.

Language: Python - Size: 117 KB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 28 - Forks: 6

LLM-inference-router/vllm-router

vLLM Router

Language: Python - Size: 45.9 KB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 28 - Forks: 1

phospho-app/fastassert

Dockerized LLM inference server with constrained output (JSON mode), built on top of vLLM and outlines. Faster, cheaper and without rate limits. Compare the quality and latency to your current LLM API provider.

Language: Jupyter Notebook - Size: 176 KB - Last synced at: 8 days ago - Pushed at: about 1 year ago - Stars: 27 - Forks: 0

France-Travail/happy_vllm

A REST API for vLLM, production ready

Language: Python - Size: 859 KB - Last synced at: 17 days ago - Pushed at: 25 days ago - Stars: 20 - Forks: 2

zRzRzRzRzRzRzR/lm-fly

Accelerating large-model inference frameworks to make LLMs fly

Language: Python - Size: 7.45 MB - Last synced at: 4 days ago - Pushed at: about 1 year ago - Stars: 18 - Forks: 4

vam876/LocalAPI.AI

LocalAPI.AI is a local AI management tool for Ollama, offering Web UI management and compatibility with vLLM, LM Studio, llama.cpp, Mozilla-Llamafile, Jan AI, Cortex API, Local-LLM, LiteLLM, GPT4All, and more.

Language: HTML - Size: 1.25 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 17 - Forks: 0

YY0649/ICE-PIXIU

ICE-PIXIU: A Cross-Language Financial Megamodeling Framework

Language: Python - Size: 118 MB - Last synced at: 5 days ago - Pushed at: 5 months ago - Stars: 17 - Forks: 0

NEOS-AI/Neosearch

AI-based search engine done right

Language: TypeScript - Size: 99 MB - Last synced at: about 3 hours ago - Pushed at: about 4 hours ago - Stars: 16 - Forks: 0

scitix/arks

Arks is a cloud-native inference framework running on Kubernetes

Language: Go - Size: 1.78 MB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 16 - Forks: 4

sherlockchou86/PyLangPipe

a simple lightweight large language model pipeline framework.

Language: Python - Size: 790 KB - Last synced at: 27 days ago - Pushed at: 28 days ago - Stars: 16 - Forks: 2

Climatik-Project/Climatik-Project

Carbon Limiting Auto Tuning for Kubernetes

Language: Go - Size: 411 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 16 - Forks: 7

automatika-robotics/ros-agents

ROS Agents is a fully-loaded framework for creating interactive embodied agents that can understand, remember, and act upon contextual information from their environment.

Language: Python - Size: 3.64 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 15 - Forks: 0

hcd233/Aris-AI-Model-Server

An OpenAI-compatible API that integrates LLM, Embedding, and Reranker.

Language: Python - Size: 1.05 MB - Last synced at: 8 days ago - Pushed at: 28 days ago - Stars: 14 - Forks: 1

lamalab-org/macbench

Probing the limitations of multimodal language models for chemistry and materials research

Language: Python - Size: 2.18 GB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 14 - Forks: 0

hienhayho/rag-colls

Collection of recent advanced RAG techniques.

Language: Python - Size: 10.1 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 12 - Forks: 4

itelnov/skeernir

UI to deploy agents locally and customise interaction with them

Language: Python - Size: 13.8 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 11 - Forks: 2

iNeil77/vllm-code-harness

Run code inference-only benchmarks quickly using vLLM

Language: Python - Size: 814 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 11 - Forks: 0

neuralmagic/nm-vllm-certs

General Information, model certifications, and benchmarks for nm-vllm enterprise distributions

Size: 877 KB - Last synced at: 5 days ago - Pushed at: 3 months ago - Stars: 11 - Forks: 2

joydeb28/llm-lab

LLM, Fine Tuning, Llama 2, Gemma, Mixtral, vLLM, LangChain, RAG, ChromaDB, FAISS

Language: Jupyter Notebook - Size: 123 KB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 11 - Forks: 6

blib-la/ask-poddy

Ask Poddy: Run Open Source LLMs and Embeddings as OpenAI-Compatible Serverless Endpoints (Tutorial)

Language: TypeScript - Size: 7.18 MB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 10 - Forks: 1

jparkerweb/down-craft

📑 npm package to craft files into Markdown with ease

Language: JavaScript - Size: 17.4 MB - Last synced at: 12 days ago - Pushed at: 4 months ago - Stars: 9 - Forks: 1

kyegomez/SimpleUnet

A simple implementation of Unet because all the implementations I've seen are way too complicated.

Language: Python - Size: 205 KB - Last synced at: 8 days ago - Pushed at: 6 months ago - Stars: 9 - Forks: 1

wangcx18/llm-vscode-inference-server

An endpoint server for efficiently serving quantized open-source LLMs for code.

Language: Python - Size: 85 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 9 - Forks: 1

qizhou000/VisEdit

[AAAI 2025 oral] Attribution Analysis Meets Model Editing: Advancing Knowledge Correction in Vision Language Models with VisEdit

Language: Python - Size: 3.46 MB - Last synced at: 26 days ago - Pushed at: 26 days ago - Stars: 8 - Forks: 0

KevinLee1110/dynamic-batching

The official repo for the paper "Optimizing LLM Inference Throughput via Memory-aware and SLA-constrained Dynamic Batching"

Size: 11.7 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 8 - Forks: 1

FreeIPCC/LLM-ContactCenter-AI-CallCenter

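LLM Call Center, AI Call Center: a large-model call center and customer service system that can connect to mainstream and private models, including OpenAI, LLaMA, Kimi, Tongyi Qianwen, Zhipu AI, iFlytek Spark, Gemini, Xorbits Inference, Amazon Bedrock, Volcano Engine, Tencent Hunyuan, Claude, Bard, DeepSeek, Azure OpenAI, Qianfan, Ollama, qwen, and vLLM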

Language: TypeScript - Size: 23.2 MB - Last synced at: about 11 hours ago - Pushed at: about 12 hours ago - Stars: 7 - Forks: 4

vectara/mirage-bench

Repository for Multilingual Generation, RAG evaluations, and surrogate judge training for Arena RAG leaderboard (NAACL'25)

Language: Python - Size: 2.8 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 7 - Forks: 0

Tangkfan/Awesome-Temporal-Video-Grounding

paper list on Video Moment Retrieval (VMR), or Temporal Video Grounding (TVG), Video Grounding (VG), or Temporal Sentence Grounding in Videos (TSGV)

Size: 59.6 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 6 - Forks: 0

bluechanel/deploy_llm

Rapid Deployment of LLM and Embedding Based on VLLM Using Docker

Language: Python - Size: 375 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 6 - Forks: 1

arc53/doc2md

Convert pdf and image files into markdown

Language: TypeScript - Size: 269 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 6 - Forks: 0

France-Travail/benchmark_llm_serving

A library to benchmark LLMs via their API exposure

Language: Python - Size: 8.04 MB - Last synced at: 17 days ago - Pushed at: 5 months ago - Stars: 6 - Forks: 0

NetEase-Media/grps_vllm

[grps integration with vllm] Implements an LLM service through the vllm LLMEngine API.

Language: Python - Size: 177 KB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 6 - Forks: 0

ivangabriele/docker-llm

Pre-loaded LLMs served as an OpenAI-Compatible API via Docker images.

Language: Dockerfile - Size: 199 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 6 - Forks: 2

itsvaibhav01/Immune

[CVPR2025] Official Repository for IMMUNE: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment

Language: Python - Size: 2.77 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 5 - Forks: 0

ivangabriele/docker-functionary

Ready-to-deploy Docker image for Functionary LLM served as an OpenAI-Compatible API.

Language: Dockerfile - Size: 27.3 KB - Last synced at: 5 days ago - Pushed at: 4 months ago - Stars: 5 - Forks: 1

asprenger/ray_vllm_inference

A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving.

Language: Python - Size: 81.1 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 5 - Forks: 1

NJUxlj/Chinese-MedQA-Qwen2

A medical question-answering system based on Qwen2 + SFT + DPO. The project uses LLaMA-Factory for training, and fastllm and vllm for inference.

Language: Python - Size: 523 KB - Last synced at: about 14 hours ago - Pushed at: about 15 hours ago - Stars: 4 - Forks: 0

stackav-oss/conch

A "standard library" of Triton kernels.

Language: Python - Size: 293 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 4 - Forks: 0

aws-samples/multi-modal-examples-for-amazon-sagemaker

A workshop for collections of multi-modal LLM examples, samples, reference architecture and demos on Amazon SageMaker.

Language: Jupyter Notebook - Size: 33.7 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 4 - Forks: 2

moeru-ai/demodel

🚀🛸 Easily boost the speed of pulling your models and datasets from various inference runtimes. (e.g. 🤗 HuggingFace, 🐫 Ollama, vLLM, and more!)

Language: Rust - Size: 47.9 KB - Last synced at: 4 days ago - Pushed at: 2 months ago - Stars: 4 - Forks: 0

yas-sim/openvino_genai_sample_codes

OpenVINO.genai sample codes with a helper class that supports vLLM-like iterator-based streaming output.

Language: Python - Size: 6.84 KB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 4 - Forks: 0

esmailza/Llama2-vLLM-LangChain-knowledge-graph

Preserving entities through the integration of knowledge graphs, Llama 2, vLLM, and LangChain.

Language: Python - Size: 763 KB - Last synced at: 5 days ago - Pushed at: 12 months ago - Stars: 4 - Forks: 0

gusanmaz/echosight

EchoSight is a tool that helps visually impaired individuals by audibly describing images taken with a Raspberry Pi Camera or inputted via image path or URL across different operating systems.

Language: Python - Size: 213 KB - Last synced at: 17 days ago - Pushed at: over 1 year ago - Stars: 4 - Forks: 0

cosmic-heart/AI-Learning-Platform

AI-Learning-Platform, an LLM-RAG pipeline that acts as a guide and can resolve doubts. Deployed on-premise on IBM ppc64le architecture. Uses vLLM for model inference and Qdrant with Langchain for the RAG pipeline. The server is written in Django, with Postgres and Cassandra as the SQL and NoSQL databases.

Language: Python - Size: 1.71 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 3 - Forks: 0

0-mostafa-rezaee-0/Batch_LLM_Inference_with_Ray_Data_LLM

Batch LLM Inference with Ray Data LLM: From Simple to Advanced

Language: Jupyter Notebook - Size: 1.63 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 3 - Forks: 1

Getty/langertha

Perl Framework for AI - Langertha - the viking of AI

Language: Perl - Size: 326 KB - Last synced at: 23 days ago - Pushed at: about 2 months ago - Stars: 3 - Forks: 0

Joshue2006/LLM-Reasoner

Make any LLM think like OpenAI o1 and DeepSeek R1

Size: 1.95 KB - Last synced at: 22 days ago - Pushed at: about 2 months ago - Stars: 3 - Forks: 0

claw1200/llama-cord

Discord App for Interacting with local Ollama Models. Multiple Agents Supported!

Language: Python - Size: 60.5 KB - Last synced at: 26 days ago - Pushed at: 3 months ago - Stars: 3 - Forks: 0

lucataco/cog-Hermes-2-Pro-Llama-3-8B

Cog wrapper for NousResearch/Hermes-2-Pro-Llama-3-8B

Language: Python - Size: 3.91 KB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 1

TimeSurgeLabs/promptproxy

Call many AIs from a single API.

Language: Go - Size: 284 KB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 0

lklivingstone/sih_2023

A Large Language Model based tool for generating human-like responses to natural language inputs on networks not connected to the internet.

Language: Python - Size: 646 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 0

Liquid4All/on-prem-stack

Scripts to launch Liquid on-prem stack

Language: Shell - Size: 195 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 2 - Forks: 1

umi-AIGC-saas/umi_ai_cms

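A dual-driven intelligent AI system that connects to the mainstream large AI models currently on the market and classifies them algorithmically according to their strengths and weaknesses. By combining the strengths of these models, 无忧AI智脑 can provide more accurate and more reliable information and answers.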

Language: Python - Size: 4.16 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 2 - Forks: 0

xxrjun/local-inference

🐑 Run LLM inference locally for various downstream applications.

Language: Shell - Size: 2.53 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 0