GitHub topics: llm-inference

Repositories

pallma-ai/pallma-guard

The Security Observability Layer for AI Agents

Language: Python - Size: 208 KB - Last synced at: 16 minutes ago - Pushed at: about 2 hours ago - Stars: 0 - Forks: 0

ckt1031/toupie

“Tube” your LLMs into one single OpenAI-compatible API, easy and fast.

Language: TypeScript - Size: 434 KB - Last synced at: 16 minutes ago - Pushed at: about 2 hours ago - Stars: 1 - Forks: 0

sgl-project/ome

OME is a Kubernetes operator for enterprise-grade management and serving of Large Language Models (LLMs)

Language: Go - Size: 11.3 MB - Last synced at: about 2 hours ago - Pushed at: about 4 hours ago - Stars: 258 - Forks: 45

Jannchie/chat-ui

Jannchie’s Chat UI, for LLM Services

Language: Vue - Size: 9.57 MB - Last synced at: about 3 hours ago - Pushed at: about 5 hours ago - Stars: 12 - Forks: 4

Kyaw-Min-Thant/plux

Plux: AI-powered filetree that lets you grab files with one click and save insights in a built-in notepad. Reduce copy-paste friction, boost productivity. 🐙

Language: TypeScript - Size: 790 KB - Last synced at: about 3 hours ago - Pushed at: about 5 hours ago - Stars: 0 - Forks: 0

Fahadfk/AI_deployment

Comprehensive guide to FastAPI, Pydantic, and SQLAlchemy for AI engineers. Learn API design, validation, and ORM workflows with practical examples and setup 🐙

Size: 29.3 KB - Last synced at: about 3 hours ago - Pushed at: about 5 hours ago - Stars: 0 - Forks: 0

SoumyadipRoy17/Medical-Assitant-RAG

A medical assistant RAG chatbot

Language: Python - Size: 587 KB - Last synced at: about 4 hours ago - Pushed at: about 6 hours ago - Stars: 0 - Forks: 0

HuuVuong0912/rag-llm-based-recommender

Explore a smarter way to shop online with this full-stack project built on the infrastructure of Google Cloud Platform (GCP) for RAG based e-commerce with LLM.

Language: TypeScript - Size: 4.2 MB - Last synced at: about 4 hours ago - Pushed at: about 6 hours ago - Stars: 3 - Forks: 1

lolvr69/LLMs-from-scratch

Chinese edition of LLMs-from-scratch: implement a ChatGPT-like large language model (LLM) from scratch with PyTorch

Size: 1.95 KB - Last synced at: about 4 hours ago - Pushed at: about 6 hours ago - Stars: 0 - Forks: 0

xlite-dev/Awesome-LLM-Inference

📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

Language: Python - Size: 115 MB - Last synced at: about 5 hours ago - Pushed at: 19 days ago - Stars: 4,465 - Forks: 304

Elhdad123/llms-on-supercomputers

🖥️ Explore Jupyter Notebooks for mastering large language models on supercomputers, with resources from leading experts in the field.

Language: Jupyter Notebook - Size: 23.7 MB - Last synced at: about 6 hours ago - Pushed at: about 8 hours ago - Stars: 0 - Forks: 0

ErosEXE0/latentmemory

Latent Memory is a module for Large Language Models that seeks to integrate a vector-based memory system into the inference process, leveraging embeddings to capture deeper semantic meaning.

Language: Python - Size: 32.2 KB - Last synced at: about 7 hours ago - Pushed at: about 9 hours ago - Stars: 2 - Forks: 1

LLMSecOps focuses on integrating security practices within the lifecycle of machine learning models. It ensures that models are robust against threats while maintaining compliance and performance standards.

Size: 461 KB - Last synced at: about 7 hours ago - Pushed at: about 9 hours ago - Stars: 0 - Forks: 0

Delxrius/MiniMax-01

MiniMax-01 is a simple implementation of the MiniMax algorithm, a widely used strategy for decision-making in two-player turn-based games like Tic-Tac-Toe. The algorithm aims to minimize the maximum possible loss for the player, making it a popular choice for developing AI opponents in various game scenarios.

Size: 1000 Bytes - Last synced at: about 8 hours ago - Pushed at: about 10 hours ago - Stars: 5 - Forks: 0

Kira94-hkz/PowerServe

High-speed and easy-to-use LLM serving framework for local deployment

Size: 1000 Bytes - Last synced at: about 8 hours ago - Pushed at: about 10 hours ago - Stars: 0 - Forks: 0

REZ0AN/WeatherBot

A simple integration of the Weather API with Gemini to demonstrate how LLMs can automatically perform tasks using external tools. The bot leverages chat history to maintain context, ensuring it doesn’t repeat the same tasks unnecessarily.

Language: JavaScript - Size: 1.95 KB - Last synced at: about 11 hours ago - Pushed at: about 13 hours ago - Stars: 0 - Forks: 0

Lightning-AI/litgpt

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.

Language: Python - Size: 5.54 MB - Last synced at: about 17 hours ago - Pushed at: 1 day ago - Stars: 12,721 - Forks: 1,318

yassa9/qwen600

Static single batch CUDA-only qwen3-0.6B mini inference engine

Language: Cuda - Size: 112 KB - Last synced at: about 17 hours ago - Pushed at: about 19 hours ago - Stars: 0 - Forks: 0

mmoha15/lille

🚀 Build and explore the Lille 130M language model, a compact yet powerful tool for deep learning, featuring an open-source framework and efficient training methods.

Language: Python - Size: 1.68 MB - Last synced at: about 18 hours ago - Pushed at: about 21 hours ago - Stars: 1 - Forks: 0

lemonade-sdk/lemonade

Lemonade helps users run local LLMs with the highest performance by configuring state-of-the-art inference engines for their NPUs and GPUs. Join our discord: https://discord.gg/5xXzkMu8Zk

Language: Python - Size: 4.16 MB - Last synced at: about 22 hours ago - Pushed at: 1 day ago - Stars: 1,202 - Forks: 89

sam-k0/ExamGen

Generate exam questions based on slides, notes or other PDFs

Language: Python - Size: 125 KB - Last synced at: about 23 hours ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

FellouAI/eko

Eko (Eko Keeps Operating) - Build Production-ready Agentic Workflow with Natural Language - eko.fellou.ai

Language: TypeScript - Size: 1.01 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 4,466 - Forks: 376

marcosomma/orka-reasoning

Orchestrator Kit for Agentic Reasoning - OrKa is a modular AI orchestration system that transforms Large Language Models (LLMs) into composable agents capable of reasoning, fact-checking, and constructing answers with transparent traceability.

Language: Python - Size: 48 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 23 - Forks: 3

katanemo/archgw

The smart edge and AI gateway for agents. Arch is a high-performance proxy server that handles the low-level work in building agents, such as applying guardrails, routing prompts to the right agent, and unifying access to LLMs. Natively designed to process prompts, it's framework-agnostic and helps you build agents faster.

Language: Rust - Size: 23.3 MB - Last synced at: about 23 hours ago - Pushed at: 1 day ago - Stars: 3,597 - Forks: 199

harleyszhang/llm_note

LLM notes, including model inference, transformer model structure, and llm framework code analysis notes.

Language: Python - Size: 185 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 817 - Forks: 88

iPieter/llmq

A Scheduler for Batched LLM Inference

Language: Python - Size: 1.57 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 6 - Forks: 0

lennor-tan/openrouter-free-model

🌐 Explore and manage free models on OpenRouter effortlessly with our web app, featuring browsing, filtering, and multi-language support.

Language: TypeScript - Size: 2.03 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

felladrin/awesome-ai-web-search

List of software that allows searching the web with the assistance of AI: https://hf.co/spaces/felladrin/awesome-ai-web-search

Language: HTML - Size: 67.4 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 1,025 - Forks: 76

AryanKarumuri/Gen-AI-Projects

This repository features a collection of generative AI applications designed to showcase the capabilities of AI across various domains.

Language: Jupyter Notebook - Size: 3.61 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 6 - Forks: 0

cactus-compute/cactus

Run AI locally on phones, wearables and AI-native hardware

Language: C++ - Size: 1.97 GB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 2,968 - Forks: 177

sophgo/LLM-TPU

Run generative AI models on Sophgo BM1684X/BM1688

Language: C++ - Size: 299 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 238 - Forks: 40

superduper-io/superduper

Superduper: End-to-end framework for building custom AI applications and agents.

Language: Python - Size: 73.8 MB - Last synced at: 1 day ago - Pushed at: 5 days ago - Stars: 5,173 - Forks: 520

inboxpraveen/LLM-Minutes-of-Meeting

🎤📄 An innovative tool that transforms audio or video files into text transcripts and generates concise meeting minutes. Stay organized and efficient in your meetings, and get ready for Phase 2 where we'll be open for contributions to enable real-time meeting transcription! 🚀

Language: Python - Size: 7.14 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 148 - Forks: 14

kishoretvk/jsonAI

jsonAI is an open-source Python library designed to address the common challenge of forcing Large Language Models (LLMs) to produce reliable, structured data. The library ensures syntactically correct output by intelligently guiding the LLM to generate only content tokens, while programmatically handling the structural elements of the desired format.

Language: Python - Size: 1.1 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

mistralai/mistral-inference

Official inference library for Mistral models

Language: Jupyter Notebook - Size: 550 KB - Last synced at: 1 day ago - Pushed at: 6 months ago - Stars: 10,450 - Forks: 959

NVIDIA/GenerativeAIExamples

Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.

Language: Jupyter Notebook - Size: 95.4 MB - Last synced at: 1 day ago - Pushed at: 10 days ago - Stars: 3,386 - Forks: 827

gemma-facet/cloud-services

Open-source no-code platform-as-a-service to fine tune, evaluate, and ship customized Gemma VLM/SLM.

Language: Python - Size: 1.7 MB - Last synced at: about 23 hours ago - Pushed at: 1 day ago - Stars: 4 - Forks: 1

Mobile-Artificial-Intelligence/llama_sdk

lcpp is a Dart implementation of llama.cpp used by the Mobile Artificial Intelligence Distribution (Maid)

Language: C++ - Size: 1.78 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 101 - Forks: 23

vitalops/datatune

Perform transformations on your data with natural language using LLMs

Language: Python - Size: 1.87 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 101 - Forks: 8

dalisoft/awesome-hosting

List of awesome hosting sorted by minimal plan price

Size: 198 KB - Last synced at: about 23 hours ago - Pushed at: 7 days ago - Stars: 692 - Forks: 79

run-ai/genv

GPU environment and cluster management with LLM support

Language: Python - Size: 9.41 MB - Last synced at: 2 days ago - Pushed at: over 1 year ago - Stars: 635 - Forks: 38

kserve/kserve

Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes

Language: Python - Size: 430 MB - Last synced at: 2 days ago - Pushed at: 7 days ago - Stars: 4,508 - Forks: 1,242

codelion/optillm

Optimizing inference proxy for LLMs

Language: Python - Size: 2.5 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 2,825 - Forks: 215

jd-opensource/xllm

A high-performance inference engine for LLMs, optimized for diverse AI accelerators.

Language: C++ - Size: 4 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 204 - Forks: 38

ray-project/ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

Language: Python - Size: 561 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 38,771 - Forks: 6,759

EfficientMoE/MoE-Infinity

PyTorch library for cost-effective, fast and easy serving of MoE models.

Language: Python - Size: 529 KB - Last synced at: 1 day ago - Pushed at: 2 months ago - Stars: 230 - Forks: 17

Mote-Software/nanocoder

A beautiful local-first coding agent running in your terminal - built by the community for the community ⚒

Language: TypeScript - Size: 1.99 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 233 - Forks: 23

rishikksh20/qwen3-playground

Readable implementation of Qwen3 0.6B model

Language: Python - Size: 21.5 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

webgptorg/promptbook

It's time for a paradigm shift! The future of software is in plain English ✨

Language: TypeScript - Size: 245 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 126 - Forks: 14

vienneraphael/batchling

batchling is the universal Python GenAI Batch API client. Create, manage and run experiments on any batch-compatible provider.

Language: Python - Size: 1.52 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 4 - Forks: 0

tingaicompass/AI-Compass

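“AI-Compass” guides the community in navigating the ocean of AI technology. Whether you are a beginner or an advanced developer, you can find paths here to every major direction in AI. It aims to help developers systematically understand AI's core concepts, mainstream technologies, and emerging trends, and to master the full process from theory to deployment through practice.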

Size: 20.1 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 208 - Forks: 20

felladrin/MiniSearch

Minimalist web-searching platform with an AI assistant that runs directly from your browser. Uses WebLLM, Wllama and SearXNG. Demo: https://felladrin-minisearch.hf.space

Language: TypeScript - Size: 27.9 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 475 - Forks: 51

harleyszhang/lite_llama

A lightweight llama-like LLM inference framework based on Triton kernels.

Language: Python - Size: 39.4 MB - Last synced at: 1 day ago - Pushed at: about 1 month ago - Stars: 150 - Forks: 20

neuron-core/deep-research-agent

Deep research agent built with Neuron PHP framework

Language: PHP - Size: 270 KB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 1 - Forks: 0

neuron-core/travel-planner-agent

Travel planner agent built with Neuron PHP framework

Language: PHP - Size: 173 KB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

deepsai8/moe_llama

configurable moe-llama model training and inference built on pytorch

Language: Python - Size: 1.94 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

jax-ml/jax-llm-examples

Minimal yet performant LLM examples in pure JAX

Language: Python - Size: 2.56 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 151 - Forks: 20

Write-with-LAIKA/drama-engine

A Framework for Narrative Agents

Language: TypeScript - Size: 865 KB - Last synced at: 3 days ago - Pushed at: 12 months ago - Stars: 35 - Forks: 9

theankitdash/Personal-Chatbot-Deva-AI-Buddy-Companion

Built an AI Chatbot using Mistral-7B and Gradio with LangChain-ready architecture, supporting 1000+ contextual conversations, emotion-aware responses, and designed for long-term personalized user engagement.

Language: Python - Size: 147 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

FasterDecoding/Medusa

Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

Language: Jupyter Notebook - Size: 4.76 MB - Last synced at: 4 days ago - Pushed at: about 1 year ago - Stars: 2,607 - Forks: 180

NotPunchnox/rkllama

Ollama alternative for Rockchip NPU: An efficient solution for running AI and deep learning models on Rockchip devices with optimized NPU support (rkllm)

Language: Python - Size: 12.5 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 273 - Forks: 39

qubasehq/LLMBuilder

LLMBuilder is a production-ready framework for training and fine-tuning Large Language Models (LLMs) — not a model itself. Designed for developers, researchers, and AI engineers, LLMBuilder provides a full pipeline to go from raw text data to deployable, optimized LLMs, all running locally on CPUs or GPUs.

Language: Python - Size: 18.7 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

cactus-compute/cactus-react

Cactus React Native package: Run AI locally in your React Native apps

Language: TypeScript - Size: 801 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 0

structuredllm/syncode

Efficient and general syntactical decoding for Large Language Models

Language: Python - Size: 55.4 MB - Last synced at: about 12 hours ago - Pushed at: 5 days ago - Stars: 288 - Forks: 31

AI-Hypercomputer/JetStream

JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome).

Language: Python - Size: 6.32 MB - Last synced at: 1 day ago - Pushed at: 3 months ago - Stars: 374 - Forks: 51

character-ai/prompt-poet

Streamlines and simplifies prompt design for both developers and non-technical users with a low code approach.

Language: Python - Size: 578 KB - Last synced at: 2 days ago - Pushed at: about 2 months ago - Stars: 1,104 - Forks: 93

rejunity/tiny-asic-1_58bit-matrix-mul

Tiny ASIC implementation of the matrix multiplication unit for "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits"

Language: Verilog - Size: 9.15 MB - Last synced at: 4 days ago - Pushed at: over 1 year ago - Stars: 159 - Forks: 11

Nikityyy/lille

A powerful 130-million-parameter model trained from scratch as part of a truly open-source stack, including a custom tokenizer, dataset, and optimizer.

Language: Python - Size: 405 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 25 - Forks: 0

beam-cloud/beta9

Secure, high-performance AI infrastructure in Python.

Language: Go - Size: 23 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1,271 - Forks: 113

b4rtaz/distributed-llama

Distributed LLM inference. Connect home devices into a powerful cluster to accelerate LLM inference. More devices means faster inference.

Language: C++ - Size: 3.33 MB - Last synced at: 5 days ago - Pushed at: 6 days ago - Stars: 2,360 - Forks: 164

julep-ai/steadytext

Deterministic text generation and embeddings with zero configuration

Language: PLpgSQL - Size: 9.75 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 16 - Forks: 0

sauravpanda/BrowserAI

Run local LLMs like llama, deepseek-distill, kokoro and more inside your browser

Language: TypeScript - Size: 293 MB - Last synced at: 5 days ago - Pushed at: 15 days ago - Stars: 1,207 - Forks: 106

bentoml/OpenLLM

Run any open-source LLM, such as DeepSeek and Llama, as an OpenAI-compatible API endpoint in the cloud.

Language: Python - Size: 41.1 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 11,738 - Forks: 763

CommanderLake/LMStud

Chat with GGUF LLMs using llama.cpp and a classic Windows Forms interface for minimal GUI bloat.

Language: C# - Size: 1.74 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 4 - Forks: 0

interestingLSY/swiftLLM

A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of vLLM).

Language: Python - Size: 234 KB - Last synced at: 5 days ago - Pushed at: 3 months ago - Stars: 245 - Forks: 28

SJTU-IPADS/PowerInfer

High-speed Large Language Model Serving for Local Deployment

Language: C++ - Size: 21.7 MB - Last synced at: 5 days ago - Pushed at: about 1 month ago - Stars: 8,319 - Forks: 443

feifeibear/long-context-attention

USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference

Language: Python - Size: 4.61 MB - Last synced at: 3 days ago - Pushed at: about 2 months ago - Stars: 556 - Forks: 64

harleyszhang/harleyszhang.github.io (fork of tw93/tw93.github.io)

🧗‍♂️ harleyszhang's personal blog

Language: HTML - Size: 482 MB - Last synced at: 5 days ago - Pushed at: 6 days ago - Stars: 5 - Forks: 0

edge-inference/edgereasoning

Optimizing Reasoning LLM Deployment on Edge GPUs

Language: Python - Size: 365 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

predibase/lorax

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

Language: Python - Size: 6.62 MB - Last synced at: 5 days ago - Pushed at: 4 months ago - Stars: 3,401 - Forks: 262

AnswerDotAI/cold-compress

Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of GPT-Fast, a simple, PyTorch-native generation codebase.

Language: Python - Size: 8.33 MB - Last synced at: 1 day ago - Pushed at: about 1 year ago - Stars: 144 - Forks: 14

LLM-inference-router/vllm-router

vLLM Router

Language: Python - Size: 45.9 KB - Last synced at: 5 days ago - Pushed at: over 1 year ago - Stars: 41 - Forks: 2

bentoml/BentoML

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!

Language: Python - Size: 98.3 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 8,028 - Forks: 871

little51/llm-dev

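Companion resources for the book "Large Model Projects in Practice: Multi-Domain Intelligent Application Development" (《大模型项目实战：多领域智能应用开发》)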

Language: JavaScript - Size: 2.39 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 168 - Forks: 32

bentoml/llm-inference-handbook

Everything you need to know about LLM inference

Language: TypeScript - Size: 10.7 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 223 - Forks: 21

arielfayol37/DeathNote

A multimodal-LLM-powered React-Native mobile app used to take notes.

Language: JavaScript - Size: 117 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

NPC-Worldwide/npc-studio

the IDE for research, built from the ground up with AI integrations

Language: JavaScript - Size: 12.1 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 75 - Forks: 6

taielab/awesome-hacking-lists

A curated collection of top-tier penetration testing tools and productivity utilities across multiple domains. Join us to explore, contribute, and enhance your hacking toolkit!

Size: 7.2 MB - Last synced at: 5 days ago - Pushed at: 16 days ago - Stars: 1,176 - Forks: 235

morpheuslord/HackBot

AI-powered cybersecurity chatbot designed to provide helpful and accurate answers to your cybersecurity-related queries and also do code analysis and scan analysis.

Language: Python - Size: 56.6 KB - Last synced at: 5 days ago - Pushed at: 10 months ago - Stars: 314 - Forks: 54

InternLM/lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Language: Python - Size: 9.01 MB - Last synced at: 6 days ago - Pushed at: 8 days ago - Stars: 6,942 - Forks: 599

andrew264/modelex

Doing devious stuff with AI

Language: Python - Size: 429 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 2 - Forks: 0

alemoraru/exceed-project-validation

EXCEED Project Validation

Language: TypeScript - Size: 464 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

expectedparrot/edsl

Design, conduct and analyze results of AI-powered surveys and experiments. Simulate social science and market research with large numbers of AI agents and LLMs.

Language: Python - Size: 127 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 268 - Forks: 25

Anish-CodeDev/Desktop_AI_Agent

AI-powered desktop assistant for secure file management, document creation, and workflow automation.

Language: Python - Size: 362 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1 - Forks: 0

nomic-ai/gpt4all

GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.

Language: C++ - Size: 42.6 MB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 76,581 - Forks: 8,261

flashinfer-ai/flashinfer

FlashInfer: Kernel Library for LLM Serving

Language: Cuda - Size: 6.42 MB - Last synced at: 6 days ago - Pushed at: 7 days ago - Stars: 3,656 - Forks: 470

NikolasEnt/ollama-webui-intel

Ollama with intel (i)GPU acceleration in docker and benchmark

Language: Python - Size: 1.54 MB - Last synced at: 6 days ago - Pushed at: 9 days ago - Stars: 21 - Forks: 5

inspector-apm/neuron-ai

The PHP Agent Development Kit to build production-ready Agentic applications. Connect components (LLMs, vector DBs, memory) to agents that can interact with your data. With its modular architecture it's best suited for building RAG, question answering, or business process automations.

Language: PHP - Size: 17.5 MB - Last synced at: 6 days ago - Pushed at: 7 days ago - Stars: 965 - Forks: 102

bd4sur/Nano

Electronic Parrot / Toy Language Model

Language: C - Size: 30.5 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 195 - Forks: 11

MCERQUA/LLM-Runner-Router

LLM-Runner-Router is not just another model loader - it's a full-stack agnostic neural orchestration system that adapts to ANY model format, ANY runtime environment, and ANY deployment scenario. Think of it as the Swiss Army knife of AI inference, but cooler and with more quantum entanglement.

Language: HTML - Size: 4.59 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 1

Related Keywords

llm-inference (1,018), llm (507), ai (165), llms (121), llm-training (97), python (89), llama (81), large-language-models (81), rag (75), chatbot (61), machine-learning (61), openai (54), generative-ai (53), llm-serving (53), llama2 (52), llmops (52), llamacpp (49), inference (46), langchain (46), llama3 (44), nlp (43), huggingface (43), ollama (43), artificial-intelligence (38), deep-learning (38), gpt (37), streamlit (35), pytorch (35), transformer (33), prompt-engineering (31), transformers (30), mistral (30), llm-framework (29), chatgpt (29), python3 (29), genai (27), huggingface-transformers (27), qwen (26), fine-tuning (26), gemma (25), openai-api (25), retrieval-augmented-generation (24), gpu (24), vllm (24), cuda (21), quantization (21), langchain-python (21), docker (20), vector-database (20), open-source (19), llm-evaluation (19), agents (18), ai-agents (17), language-model (17), llama-cpp (17), natural-language-processing (16), llm-agent (16), large-language-model (16), deepseek (16), fastapi (16), agentic-ai (15), api (15), question-answering (15), agent (15), cpu-inference (14), gguf (14), inference-engine (14), cli (13), model-serving (13), gpt-4 (13), gemini-api (13), mlops (13), benchmark (13), ggml (13), mistral-7b (13), flask (12), gemini (12), kubernetes (12), deepseek-r1 (12), java (11), javascript (11), speculative-decoding (11), rust (11), llm-finetuning (11), embeddings (11), chat-application (10), lora (10), bindings (10), self-hosted (10), multimodal (10), ollama-api (10), anthropic (10), react (10), aws (10), ml (10), groq-api (9), security (9), text-generation (9), typescript (9), nextjs (9)