GitHub topics: llm-inference
pallma-ai/pallma-guard
The Security Observability Layer for AI Agents
Language: Python - Size: 208 KB - Last synced at: 16 minutes ago - Pushed at: about 2 hours ago - Stars: 0 - Forks: 0

ckt1031/toupie
“Tube” your LLMs into one single OpenAI-compatible API, easy and fast.
Language: TypeScript - Size: 434 KB - Last synced at: 16 minutes ago - Pushed at: about 2 hours ago - Stars: 1 - Forks: 0

sgl-project/ome
OME is a Kubernetes operator for enterprise-grade management and serving of Large Language Models (LLMs)
Language: Go - Size: 11.3 MB - Last synced at: about 2 hours ago - Pushed at: about 4 hours ago - Stars: 258 - Forks: 45

Jannchie/chat-ui
Jannchie’s Chat UI, for LLM Services
Language: Vue - Size: 9.57 MB - Last synced at: about 3 hours ago - Pushed at: about 5 hours ago - Stars: 12 - Forks: 4

Kyaw-Min-Thant/plux
Plux: AI-powered filetree that lets you grab files with one click and save insights in a built-in notepad. Reduce copy-paste friction, boost productivity. 🐙
Language: TypeScript - Size: 790 KB - Last synced at: about 3 hours ago - Pushed at: about 5 hours ago - Stars: 0 - Forks: 0

Fahadfk/AI_deployment
Comprehensive guide to FastAPI, Pydantic, and SQLAlchemy for AI engineers. Learn API design, validation, and ORM workflows with practical examples and setup 🐙
Size: 29.3 KB - Last synced at: about 3 hours ago - Pushed at: about 5 hours ago - Stars: 0 - Forks: 0

SoumyadipRoy17/Medical-Assitant-RAG
A medical assistant RAG chatbot
Language: Python - Size: 587 KB - Last synced at: about 4 hours ago - Pushed at: about 6 hours ago - Stars: 0 - Forks: 0

HuuVuong0912/rag-llm-based-recommender
Explore a smarter way to shop online with this full-stack project built on the infrastructure of Google Cloud Platform (GCP) for RAG based e-commerce with LLM.
Language: TypeScript - Size: 4.2 MB - Last synced at: about 4 hours ago - Pushed at: about 6 hours ago - Stars: 3 - Forks: 1

lolvr69/LLMs-from-scratch
Chinese version of LLMs-from-scratch: implement a ChatGPT-like large language model (LLM) from scratch with PyTorch
Size: 1.95 KB - Last synced at: about 4 hours ago - Pushed at: about 6 hours ago - Stars: 0 - Forks: 0

xlite-dev/Awesome-LLM-Inference
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
Language: Python - Size: 115 MB - Last synced at: about 5 hours ago - Pushed at: 19 days ago - Stars: 4,465 - Forks: 304

Elhdad123/llms-on-supercomputers
🖥️ Explore Jupyter Notebooks for mastering large language models on supercomputers, with resources from leading experts in the field.
Language: Jupyter Notebook - Size: 23.7 MB - Last synced at: about 6 hours ago - Pushed at: about 8 hours ago - Stars: 0 - Forks: 0

ErosEXE0/latentmemory
Latent Memory is a module for Large Language Models that seeks to integrate a vector-based memory system into the inference process, leveraging embeddings to capture deeper semantic meaning.
Language: Python - Size: 32.2 KB - Last synced at: about 7 hours ago - Pushed at: about 9 hours ago - Stars: 2 - Forks: 1

viniViado/LLMSecOps
LLMSecOps focuses on integrating security practices within the lifecycle of machine learning models. It ensures that models are robust against threats while maintaining compliance and performance standards.
Size: 461 KB - Last synced at: about 7 hours ago - Pushed at: about 9 hours ago - Stars: 0 - Forks: 0

Delxrius/MiniMax-01
MiniMax-01 is a simple implementation of the MiniMax algorithm, a widely used strategy for decision-making in two-player turn-based games like Tic-Tac-Toe. The algorithm aims to minimize the maximum possible loss for the player, making it a popular choice for developing AI opponents in various game scenarios.
Size: 1000 Bytes - Last synced at: about 8 hours ago - Pushed at: about 10 hours ago - Stars: 5 - Forks: 0

Kira94-hkz/PowerServe
High-speed and easy-to-use LLM serving framework for local deployment
Size: 1000 Bytes - Last synced at: about 8 hours ago - Pushed at: about 10 hours ago - Stars: 0 - Forks: 0

REZ0AN/WeatherBot
A simple integration of the Weather API with Gemini to demonstrate how LLMs can automatically perform tasks using external tools. The bot leverages chat history to maintain context, ensuring it doesn’t repeat the same tasks unnecessarily.
Language: JavaScript - Size: 1.95 KB - Last synced at: about 11 hours ago - Pushed at: about 13 hours ago - Stars: 0 - Forks: 0

Lightning-AI/litgpt
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
Language: Python - Size: 5.54 MB - Last synced at: about 17 hours ago - Pushed at: 1 day ago - Stars: 12,721 - Forks: 1,318

yassa9/qwen600
Static single batch CUDA-only qwen3-0.6B mini inference engine
Language: Cuda - Size: 112 KB - Last synced at: about 17 hours ago - Pushed at: about 19 hours ago - Stars: 0 - Forks: 0

mmoha15/lille
🚀 Build and explore the Lille 130M language model, a compact yet powerful tool for deep learning, featuring an open-source framework and efficient training methods.
Language: Python - Size: 1.68 MB - Last synced at: about 18 hours ago - Pushed at: about 21 hours ago - Stars: 1 - Forks: 0

lemonade-sdk/lemonade
Lemonade helps users run local LLMs with the highest performance by configuring state-of-the-art inference engines for their NPUs and GPUs. Join our discord: https://discord.gg/5xXzkMu8Zk
Language: Python - Size: 4.16 MB - Last synced at: about 22 hours ago - Pushed at: 1 day ago - Stars: 1,202 - Forks: 89

sam-k0/ExamGen
Generate exam questions based on slides, notes or other PDFs
Language: Python - Size: 125 KB - Last synced at: about 23 hours ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

FellouAI/eko
Eko (Eko Keeps Operating) - Build Production-ready Agentic Workflow with Natural Language - eko.fellou.ai
Language: TypeScript - Size: 1.01 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 4,466 - Forks: 376

marcosomma/orka-reasoning
Orchestrator Kit for Agentic Reasoning - OrKa is a modular AI orchestration system that transforms Large Language Models (LLMs) into composable agents capable of reasoning, fact-checking, and constructing answers with transparent traceability.
Language: Python - Size: 48 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 23 - Forks: 3

katanemo/archgw
The smart edge and AI gateway for agents. Arch is a high-performance proxy server that handles the low-level work in building agents: like applying guardrails, routing prompts to the right agent, and unifying access to LLMs, etc. Natively designed to process prompts, it's framework-agnostic and helps you build agents faster.
Language: Rust - Size: 23.3 MB - Last synced at: about 23 hours ago - Pushed at: 1 day ago - Stars: 3,597 - Forks: 199

harleyszhang/llm_note
LLM notes, including model inference, transformer model structure, and llm framework code analysis notes.
Language: Python - Size: 185 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 817 - Forks: 88

iPieter/llmq
A Scheduler for Batched LLM Inference
Language: Python - Size: 1.57 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 6 - Forks: 0

lennor-tan/openrouter-free-model
🌐 Explore and manage free models on OpenRouter effortlessly with our web app, featuring browsing, filtering, and multi-language support.
Language: TypeScript - Size: 2.03 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

felladrin/awesome-ai-web-search
List of software that allows searching the web with the assistance of AI: https://hf.co/spaces/felladrin/awesome-ai-web-search
Language: HTML - Size: 67.4 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 1,025 - Forks: 76

AryanKarumuri/Gen-AI-Projects
This repository features a collection of generative AI applications designed to showcase the capabilities of AI across various domains.
Language: Jupyter Notebook - Size: 3.61 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 6 - Forks: 0

cactus-compute/cactus
Run AI locally on phones, wearables and AI-native hardware
Language: C++ - Size: 1.97 GB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 2,968 - Forks: 177

sophgo/LLM-TPU
Run generative AI models in sophgo BM1684X/BM1688
Language: C++ - Size: 299 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 238 - Forks: 40

superduper-io/superduper
Superduper: End-to-end framework for building custom AI applications and agents.
Language: Python - Size: 73.8 MB - Last synced at: 1 day ago - Pushed at: 5 days ago - Stars: 5,173 - Forks: 520

inboxpraveen/LLM-Minutes-of-Meeting
🎤📄 An innovative tool that transforms audio or video files into text transcripts and generates concise meeting minutes. Stay organized and efficient in your meetings, and get ready for Phase 2 where we'll be open for contributions to enable real-time meeting transcription! 🚀
Language: Python - Size: 7.14 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 148 - Forks: 14

kishoretvk/jsonAI
jsonAI is an open-source Python library designed to address the common challenge of forcing Large Language Models (LLMs) to produce reliable, structured data. The library ensures syntactically correct output by intelligently guiding the LLM to generate only content tokens, while programmatically handling the structural elements of the desired format.
Language: Python - Size: 1.1 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

mistralai/mistral-inference
Official inference library for Mistral models
Language: Jupyter Notebook - Size: 550 KB - Last synced at: 1 day ago - Pushed at: 6 months ago - Stars: 10,450 - Forks: 959

NVIDIA/GenerativeAIExamples
Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.
Language: Jupyter Notebook - Size: 95.4 MB - Last synced at: 1 day ago - Pushed at: 10 days ago - Stars: 3,386 - Forks: 827

gemma-facet/cloud-services
Open-source no-code platform-as-a-service to fine tune, evaluate, and ship customized Gemma VLM/SLM.
Language: Python - Size: 1.7 MB - Last synced at: about 23 hours ago - Pushed at: 1 day ago - Stars: 4 - Forks: 1

Mobile-Artificial-Intelligence/llama_sdk
lcpp is a Dart implementation of llama.cpp used by the mobile artificial intelligence distribution (maid)
Language: C++ - Size: 1.78 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 101 - Forks: 23

vitalops/datatune
Perform transformations on your data with natural language using LLMs
Language: Python - Size: 1.87 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 101 - Forks: 8

dalisoft/awesome-hosting
List of awesome hosting sorted by minimal plan price
Size: 198 KB - Last synced at: about 23 hours ago - Pushed at: 7 days ago - Stars: 692 - Forks: 79

run-ai/genv
GPU environment and cluster management with LLM support
Language: Python - Size: 9.41 MB - Last synced at: 2 days ago - Pushed at: over 1 year ago - Stars: 635 - Forks: 38

kserve/kserve
Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes
Language: Python - Size: 430 MB - Last synced at: 2 days ago - Pushed at: 7 days ago - Stars: 4,508 - Forks: 1,242

codelion/optillm
Optimizing inference proxy for LLMs
Language: Python - Size: 2.5 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 2,825 - Forks: 215

jd-opensource/xllm
A high-performance inference engine for LLMs, optimized for diverse AI accelerators.
Language: C++ - Size: 4 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 204 - Forks: 38

ray-project/ray
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
Language: Python - Size: 561 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 38,771 - Forks: 6,759

EfficientMoE/MoE-Infinity
PyTorch library for cost-effective, fast and easy serving of MoE models.
Language: Python - Size: 529 KB - Last synced at: 1 day ago - Pushed at: 2 months ago - Stars: 230 - Forks: 17

Mote-Software/nanocoder
A beautiful local-first coding agent running in your terminal - built by the community for the community ⚒
Language: TypeScript - Size: 1.99 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 233 - Forks: 23

rishikksh20/qwen3-playground
Readable implementation of Qwen3 0.6B model
Language: Python - Size: 21.5 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

webgptorg/promptbook
It's time for a paradigm shift! The future of software is in plain English ✨
Language: TypeScript - Size: 245 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 126 - Forks: 14

vienneraphael/batchling
batchling is the universal Python GenAI Batch API client. Create, manage and run experiments on any batch-compatible provider.
Language: Python - Size: 1.52 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 4 - Forks: 0

tingaicompass/AI-Compass
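AI-Compass guides the community in navigating the ocean of AI technology. Whether you are a beginner or an advanced developer, you can find a path here into every major area of AI. It aims to help developers systematically understand AI's core concepts, mainstream technologies, and emerging trends, and to master the full process from theory to deployment through practice.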
Size: 20.1 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 208 - Forks: 20

felladrin/MiniSearch
Minimalist web-searching platform with an AI assistant that runs directly from your browser. Uses WebLLM, Wllama and SearXNG. Demo: https://felladrin-minisearch.hf.space
Language: TypeScript - Size: 27.9 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 475 - Forks: 51

harleyszhang/lite_llama
A light llama-like llm inference framework based on the triton kernel.
Language: Python - Size: 39.4 MB - Last synced at: 1 day ago - Pushed at: about 1 month ago - Stars: 150 - Forks: 20

neuron-core/deep-research-agent
Deep research agent built with Neuron PHP framework
Language: PHP - Size: 270 KB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 1 - Forks: 0

neuron-core/travel-planner-agent
Travel planner agent built with Neuron PHP framework
Language: PHP - Size: 173 KB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

deepsai8/moe_llama
Configurable moe-llama model training and inference built on PyTorch
Language: Python - Size: 1.94 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

jax-ml/jax-llm-examples
Minimal yet performant LLM examples in pure JAX
Language: Python - Size: 2.56 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 151 - Forks: 20

Write-with-LAIKA/drama-engine
A Framework for Narrative Agents
Language: TypeScript - Size: 865 KB - Last synced at: 3 days ago - Pushed at: 12 months ago - Stars: 35 - Forks: 9

theankitdash/Personal-Chatbot-Deva-AI-Buddy-Companion
Built an AI Chatbot using Mistral-7B and Gradio with LangChain-ready architecture, supporting 1000+ contextual conversations, emotion-aware responses, and designed for long-term personalized user engagement.
Language: Python - Size: 147 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

FasterDecoding/Medusa
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
Language: Jupyter Notebook - Size: 4.76 MB - Last synced at: 4 days ago - Pushed at: about 1 year ago - Stars: 2,607 - Forks: 180

NotPunchnox/rkllama
Ollama alternative for Rockchip NPU: An efficient solution for running AI and deep learning models on Rockchip devices with optimized NPU support (rkllm)
Language: Python - Size: 12.5 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 273 - Forks: 39

qubasehq/LLMBuilder
LLMBuilder is a production-ready framework for training and fine-tuning Large Language Models (LLMs) — not a model itself. Designed for developers, researchers, and AI engineers, LLMBuilder provides a full pipeline to go from raw text data to deployable, optimized LLMs, all running locally on CPUs or GPUs.
Language: Python - Size: 18.7 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

cactus-compute/cactus-react
Cactus React Native package: Run AI locally in your React Native apps
Language: TypeScript - Size: 801 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 0

structuredllm/syncode
Efficient and general syntactical decoding for Large Language Models
Language: Python - Size: 55.4 MB - Last synced at: about 12 hours ago - Pushed at: 5 days ago - Stars: 288 - Forks: 31

AI-Hypercomputer/JetStream
JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome).
Language: Python - Size: 6.32 MB - Last synced at: 1 day ago - Pushed at: 3 months ago - Stars: 374 - Forks: 51

character-ai/prompt-poet
Streamlines and simplifies prompt design for both developers and non-technical users with a low code approach.
Language: Python - Size: 578 KB - Last synced at: 2 days ago - Pushed at: about 2 months ago - Stars: 1,104 - Forks: 93

rejunity/tiny-asic-1_58bit-matrix-mul
Tiny ASIC implementation for "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits" matrix multiplication unit
Language: Verilog - Size: 9.15 MB - Last synced at: 4 days ago - Pushed at: over 1 year ago - Stars: 159 - Forks: 11

Nikityyy/lille
A powerful 130-million-parameter model trained from scratch as part of a truly open-source stack, including a custom tokenizer, dataset, and optimizer.
Language: Python - Size: 405 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 25 - Forks: 0

beam-cloud/beta9
Secure, high-performance AI infrastructure in Python.
Language: Go - Size: 23 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1,271 - Forks: 113

b4rtaz/distributed-llama
Distributed LLM inference. Connect home devices into a powerful cluster to accelerate LLM inference. More devices means faster inference.
Language: C++ - Size: 3.33 MB - Last synced at: 5 days ago - Pushed at: 6 days ago - Stars: 2,360 - Forks: 164

julep-ai/steadytext
Deterministic text generation and embeddings with zero configuration
Language: PLpgSQL - Size: 9.75 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 16 - Forks: 0

sauravpanda/BrowserAI
Run local LLMs like llama, deepseek-distill, kokoro and more inside your browser
Language: TypeScript - Size: 293 MB - Last synced at: 5 days ago - Pushed at: 15 days ago - Stars: 1,207 - Forks: 106

bentoml/OpenLLM
Run any open-source LLMs, such as DeepSeek and Llama, as an OpenAI-compatible API endpoint in the cloud.
Language: Python - Size: 41.1 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 11,738 - Forks: 763

CommanderLake/LMStud
Chat with GGUF LLMs using llama.cpp and a classic Windows Forms interface for minimal GUI bloat.
Language: C# - Size: 1.74 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 4 - Forks: 0

interestingLSY/swiftLLM
A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of vLLM).
Language: Python - Size: 234 KB - Last synced at: 5 days ago - Pushed at: 3 months ago - Stars: 245 - Forks: 28

SJTU-IPADS/PowerInfer
High-speed Large Language Model Serving for Local Deployment
Language: C++ - Size: 21.7 MB - Last synced at: 5 days ago - Pushed at: about 1 month ago - Stars: 8,319 - Forks: 443

feifeibear/long-context-attention
USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference
Language: Python - Size: 4.61 MB - Last synced at: 3 days ago - Pushed at: about 2 months ago - Stars: 556 - Forks: 64

harleyszhang/harleyszhang.github.io Fork of tw93/tw93.github.io
🧗♂️ harleyszhang's personal blog
Language: HTML - Size: 482 MB - Last synced at: 5 days ago - Pushed at: 6 days ago - Stars: 5 - Forks: 0

edge-inference/edgereasoning
Optimizing Reasoning LLM Deployment on Edge GPUs
Language: Python - Size: 365 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

predibase/lorax
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
Language: Python - Size: 6.62 MB - Last synced at: 5 days ago - Pushed at: 4 months ago - Stars: 3,401 - Forks: 262

AnswerDotAI/cold-compress
Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of GPT-Fast, a simple, PyTorch-native generation codebase.
Language: Python - Size: 8.33 MB - Last synced at: 1 day ago - Pushed at: about 1 year ago - Stars: 144 - Forks: 14

LLM-inference-router/vllm-router
vLLM Router
Language: Python - Size: 45.9 KB - Last synced at: 5 days ago - Pushed at: over 1 year ago - Stars: 41 - Forks: 2

bentoml/BentoML
The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
Language: Python - Size: 98.3 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 8,028 - Forks: 871

little51/llm-dev
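Companion resources for the book 《大模型项目实战:多领域智能应用开发》 (Large Model Projects in Practice: Multi-Domain Intelligent Application Development)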
Language: JavaScript - Size: 2.39 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 168 - Forks: 32

bentoml/llm-inference-handbook
Everything you need to know about LLM inference
Language: TypeScript - Size: 10.7 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 223 - Forks: 21

arielfayol37/DeathNote
A multimodal-LLM-powered React-Native mobile app used to take notes.
Language: JavaScript - Size: 117 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

NPC-Worldwide/npc-studio
the IDE for research, built from the ground up with AI integrations
Language: JavaScript - Size: 12.1 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 75 - Forks: 6

taielab/awesome-hacking-lists
A curated collection of top-tier penetration testing tools and productivity utilities across multiple domains. Join us to explore, contribute, and enhance your hacking toolkit!
Size: 7.2 MB - Last synced at: 5 days ago - Pushed at: 16 days ago - Stars: 1,176 - Forks: 235

morpheuslord/HackBot
AI-powered cybersecurity chatbot designed to provide helpful and accurate answers to your cybersecurity-related queries and also do code analysis and scan analysis.
Language: Python - Size: 56.6 KB - Last synced at: 5 days ago - Pushed at: 10 months ago - Stars: 314 - Forks: 54

InternLM/lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Language: Python - Size: 9.01 MB - Last synced at: 6 days ago - Pushed at: 8 days ago - Stars: 6,942 - Forks: 599

andrew264/modelex
Doing devious stuff with AI
Language: Python - Size: 429 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 2 - Forks: 0

alemoraru/exceed-project-validation
EXCEED Project Validation
Language: TypeScript - Size: 464 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

expectedparrot/edsl
Design, conduct and analyze results of AI-powered surveys and experiments. Simulate social science and market research with large numbers of AI agents and LLMs.
Language: Python - Size: 127 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 268 - Forks: 25

Anish-CodeDev/Desktop_AI_Agent
AI-powered desktop assistant for secure file management, document creation, and workflow automation.
Language: Python - Size: 362 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1 - Forks: 0

nomic-ai/gpt4all
GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
Language: C++ - Size: 42.6 MB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 76,581 - Forks: 8,261

flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
Language: Cuda - Size: 6.42 MB - Last synced at: 6 days ago - Pushed at: 7 days ago - Stars: 3,656 - Forks: 470

NikolasEnt/ollama-webui-intel
Ollama with Intel (i)GPU acceleration in Docker, with benchmarks
Language: Python - Size: 1.54 MB - Last synced at: 6 days ago - Pushed at: 9 days ago - Stars: 21 - Forks: 5

inspector-apm/neuron-ai
The PHP Agent Development Kit to build production-ready Agentic applications. Connect components (LLMs, vector DBs, memory) to agents that can interact with your data. With its modular architecture it's best suited for building RAG, question answering, or business process automations.
Language: PHP - Size: 17.5 MB - Last synced at: 6 days ago - Pushed at: 7 days ago - Stars: 965 - Forks: 102

bd4sur/Nano
Electronic Parrot / Toy Language Model
Language: C - Size: 30.5 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 195 - Forks: 11

MCERQUA/LLM-Runner-Router
LLM-Runner-Router is not just another model loader - it's a full-stack agnostic neural orchestration system that adapts to ANY model format, ANY runtime environment, and ANY deployment scenario. Think of it as the Swiss Army knife of AI inference, but cooler and with more quantum entanglement.
Language: HTML - Size: 4.59 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 1
