GitHub topics: inference
actualwitch/experiment
🔬 Experiment is an experiment is an experiment is an experiment is an experiment is an experiment is an experiment is an experiment
Language: TypeScript - Size: 20.3 MB - Last synced at: about 3 hours ago - Pushed at: about 4 hours ago - Stars: 10 - Forks: 1

fastmachinelearning/qonnx
QONNX: Arbitrary-Precision Quantized Neural Networks in ONNX
Language: Python - Size: 5.38 MB - Last synced at: about 3 hours ago - Pushed at: about 3 hours ago - Stars: 148 - Forks: 45

chama-45426/hub-api
Aggregated management of AI model APIs
Language: Go - Size: 31.3 KB - Last synced at: about 23 hours ago - Pushed at: about 23 hours ago - Stars: 0 - Forks: 0

NOOBKABHA/bulk-chain-shell
Shell client 📺 for schema-based reasoning 🧠 over your data via a custom LLM provider 🌌
Language: Python - Size: 20.5 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 1

Thrasher-Intelligence/sigil
A local-first LLM development studio. Build, test, and customize inference workflows with your own models — no cloud, totally local.
Language: JavaScript - Size: 8.14 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 14 - Forks: 2

kibae/pg_onnx
pg_onnx: ONNX Runtime integrated with PostgreSQL. Perform ML inference with data in your database.
Language: C++ - Size: 108 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 49 - Forks: 2

hpcaitech/SwiftInfer
Efficient AI Inference & Serving
Language: Python - Size: 508 KB - Last synced at: about 20 hours ago - Pushed at: over 1 year ago - Stars: 469 - Forks: 28

mcreel/SimulatedNeuralMoments.jl
Julia package for Bayesian and classical estimation and inference based on statistics filtered through a trained neural net
Language: Julia - Size: 9.7 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 22 - Forks: 1

alibaba/rtp-llm
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
Language: C++ - Size: 274 MB - Last synced at: about 20 hours ago - Pushed at: 4 months ago - Stars: 721 - Forks: 59

quic/ai-hub-apps
The Qualcomm® AI Hub apps are a collection of state-of-the-art machine learning models optimized for performance (latency, memory etc.) and ready to deploy on Qualcomm® devices.
Language: Java - Size: 27.9 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 188 - Forks: 43

open-edge-platform/model_api
Run computer vision AI models with a simple C++/Python API on top of the OpenVINO Runtime
Language: Python - Size: 4.41 MB - Last synced at: 1 day ago - Pushed at: 17 days ago - Stars: 49 - Forks: 19

nbigaouette/onnxruntime-rs
Rust wrapper for Microsoft's ONNX Runtime (version 1.8)
Language: Rust - Size: 568 KB - Last synced at: 1 day ago - Pushed at: about 1 year ago - Stars: 292 - Forks: 99

kdkorthauer/dmrseq
R package for inference of differentially methylated regions (DMRs) from bisulfite sequencing data
Language: R - Size: 22.9 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 58 - Forks: 14

kserve/website
User documentation for KServe.
Language: HTML - Size: 118 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 106 - Forks: 135

Kotlin/Kotlin-AI-Examples
A collection of Kotlin-based examples featuring AI frameworks such as Spring AI, LangChain4j, and more — complete with Kotlin notebooks for hands-on learning.
Language: Jupyter Notebook - Size: 40.6 MB - Last synced at: 2 days ago - Pushed at: 27 days ago - Stars: 35 - Forks: 5

HyperMink/inferenceable
Scalable AI Inference Server for CPU and GPU with Node.js | Utilizes llama.cpp and parts of llamafile C/C++ core under the hood.
Language: JavaScript - Size: 4.86 MB - Last synced at: about 20 hours ago - Pushed at: 12 months ago - Stars: 14 - Forks: 0

nicolay-r/bulk-chain
A no-string API framework for deploying schema-based reasoning into third-party apps
Language: Python - Size: 224 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 20 - Forks: 2

vllm-project/vllm-ascend
Community maintained hardware plugin for vLLM on Ascend
Language: Python - Size: 1.28 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 603 - Forks: 130

VectorInstitute/vector-inference
Efficient LLM inference on Slurm clusters using vLLM.
Language: Python - Size: 2.79 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 59 - Forks: 10

Tencent/TNN
TNN: a uniform deep learning inference framework for mobile, desktop, and server, developed by Tencent Youtu Lab and Guangying Lab. TNN is distinguished by several outstanding features, including cross-platform capability, high performance, model compression, and code pruning. Based on ncnn and Rapidnet, TNN further strengthens support and performance optimization for mobile devices, while drawing on the extensibility and high performance of existing open-source efforts. TNN has been deployed in multiple Tencent apps, such as Mobile QQ, Weishi, and Pitu. Contributions are welcome; work with us to make TNN a better framework.
Language: C++ - Size: 56 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 4,505 - Forks: 772

jomtek/LazenLang
An imperative, object-oriented, statically typed programming language with type inference.
Language: C# - Size: 3.47 MB - Last synced at: 2 days ago - Pushed at: about 2 years ago - Stars: 9 - Forks: 1

efeslab/Nanoflow
A throughput-oriented high-performance serving framework for LLMs
Language: Cuda - Size: 32.4 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 805 - Forks: 35

AutoGPTQ/AutoGPTQ 📦
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
Language: Python - Size: 8.01 MB - Last synced at: 2 days ago - Pushed at: about 1 month ago - Stars: 4,837 - Forks: 513
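
A minimal sketch of running inference with a pre-quantized GPTQ checkpoint through AutoGPTQ. The checkpoint name is a hypothetical example, and the call pattern reflects the `from_quantized` loading flow as documented by the project.

```python
# Sketch: inference with a GPTQ-quantized model via AutoGPTQ.
# The checkpoint name below is an illustrative assumption; any compatible
# GPTQ checkpoint on the Hugging Face Hub should work.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_id = "TheBloke/Llama-2-7B-Chat-GPTQ"  # hypothetical example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoGPTQForCausalLM.from_quantized(model_id, device="cuda:0")

inputs = tokenizer("Explain GPTQ quantization in one sentence.", return_tensors="pt").to("cuda:0")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```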

KrasnitzLab/RAIDS
Accurate and robust inference of genetic ancestry from cancer-derived molecular data across genomic platforms
Language: R - Size: 9.44 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 5 - Forks: 4

nvidia-holoscan/holohub
Central repository for Holoscan Reference Applications
Language: C++ - Size: 268 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 133 - Forks: 89

pykeio/ort
Fast ML inference & training for ONNX models in Rust
Language: Rust - Size: 6.47 MB - Last synced at: 3 days ago - Pushed at: 6 days ago - Stars: 1,288 - Forks: 132

RayFernando1337/LLM-Calc
Instantly calculate the maximum size of quantized language models that can fit in your available RAM, helping you optimize your models for inference.
Language: TypeScript - Size: 190 KB - Last synced at: 3 days ago - Pushed at: 22 days ago - Stars: 224 - Forks: 12
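
The estimate behind a tool like LLM-Calc is simple arithmetic: parameter count times bytes per parameter, plus headroom for the KV cache and runtime buffers. A back-of-the-envelope sketch; the 20% overhead factor is an illustrative assumption, not a value taken from the repository.

```python
# Back-of-the-envelope estimate of how much RAM a quantized model needs.
# The overhead factor for KV cache / runtime buffers is an assumption.
def estimated_model_ram_gb(params_billion: float, bits_per_weight: float,
                           overhead_factor: float = 1.2) -> float:
    bytes_for_weights = params_billion * 1e9 * bits_per_weight / 8
    return bytes_for_weights * overhead_factor / 1024**3

# Example: a 7B-parameter model at 4-bit quantization.
print(f"{estimated_model_ram_gb(7, 4):.1f} GiB")  # roughly 3.9 GiB
```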

mtrimolet/hroza
A C++ implementation of MarkovJunior based on StormKit
Language: C++ - Size: 363 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

google-ai-edge/mediapipe
Cross-platform, customizable ML solutions for live and streaming media.
Language: C++ - Size: 576 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 29,601 - Forks: 5,339
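
As one concrete example of MediaPipe's Python surface, a short sketch of single-image hand-landmark detection with the `mp.solutions` API; the image path is a placeholder.

```python
# Sketch: single-image hand landmark detection with MediaPipe's Python
# solutions API. "hand.jpg" is a placeholder path.
import cv2
import mediapipe as mp

image = cv2.imread("hand.jpg")
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

with mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=2) as hands:
    results = hands.process(rgb)

if results.multi_hand_landmarks:
    for hand_landmarks in results.multi_hand_landmarks:
        # Each landmark carries normalized x/y/z coordinates.
        print(hand_landmarks.landmark[0])
```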

NexusGPU/tensor-fusion
Tensor Fusion is a state-of-the-art GPU virtualization and pooling solution designed to optimize GPU cluster utilization to its fullest potential.
Language: Go - Size: 795 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 31 - Forks: 8

szymonmaszke/torchlayers
Shape and dimension inference (Keras-like) for PyTorch layers and neural networks
Language: Python - Size: 3.19 MB - Last synced at: 2 days ago - Pushed at: almost 3 years ago - Stars: 570 - Forks: 44

bytedance/lightseq
LightSeq: A High Performance Library for Sequence Processing and Generation
Language: C++ - Size: 11.9 MB - Last synced at: about 20 hours ago - Pushed at: almost 2 years ago - Stars: 3,271 - Forks: 331

keith2018/TinyGPT
Tiny C++11 GPT-2 inference implementation from scratch
Language: C++ - Size: 648 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 58 - Forks: 11

sevagh/demucs.onnx
C++ ONNX/ORT inference for Demucs
Language: Python - Size: 159 KB - Last synced at: 1 day ago - Pushed at: 2 months ago - Stars: 12 - Forks: 3

BerkeleyLab/fiats
A deep learning library for use in high-performance computing applications in modern Fortran
Language: Fortran - Size: 66.4 MB - Last synced at: 1 day ago - Pushed at: 4 days ago - Stars: 42 - Forks: 11

quic/ai-hub-models
The Qualcomm® AI Hub Models are a collection of state-of-the-art machine learning models optimized for performance (latency, memory etc.) and ready to deploy on Qualcomm® devices.
Language: Python - Size: 255 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 682 - Forks: 107

MeowMeowSE3/language-detection-ai
Detect 18+ languages instantly using machine learning (BERT, LSTM, SVM) and NLP. Includes a Flask web app for real-time predictions, trained models, and detailed notebooks.
Language: Jupyter Notebook - Size: 1.76 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 1

roboflow/inference
Turn any computer or edge device into a command center for your computer vision projects.
Language: Python - Size: 124 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1,658 - Forks: 176

AI-Hypercomputer/JetStream
JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in the future -- PRs welcome).
Language: Python - Size: 6.35 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 322 - Forks: 39

aws/sagemaker-inference-toolkit
Serve machine learning models within a 🐳 Docker container using 🧠 Amazon SageMaker.
Language: Python - Size: 667 KB - Last synced at: 4 days ago - Pushed at: over 1 year ago - Stars: 405 - Forks: 82

pytorch/ao
PyTorch native quantization and sparsity for training and inference
Language: Python - Size: 30.5 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 2,020 - Forks: 257

stas00/ml-engineering
Machine Learning Engineering Open Book
Language: Python - Size: 10.2 MB - Last synced at: 4 days ago - Pushed at: 10 days ago - Stars: 13,626 - Forks: 822

roryclear/clearcam
IP Camera with AI object detection. Currently for iOS only.
Language: Objective-C - Size: 10.9 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

triton-inference-server/onnxruntime_backend
The Triton backend for the ONNX Runtime.
Language: C++ - Size: 296 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 144 - Forks: 64

mpes-kit/fuller
Probabilistic machine learning for reconstruction and parametrization of electronic band structure from photoemission spectroscopy data
Language: Jupyter Notebook - Size: 25.9 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 13 - Forks: 2

gvergnaud/ts-pattern
🎨 The exhaustive Pattern Matching library for TypeScript, with smart type inference.
Language: TypeScript - Size: 2.7 MB - Last synced at: 4 days ago - Pushed at: 10 days ago - Stars: 13,487 - Forks: 147

superduper-io/superduper
Superduper: End-to-end framework for building custom AI applications and agents.
Language: Python - Size: 73.6 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 5,053 - Forks: 493

jy-yuan/KIVI
[ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
Language: Python - Size: 16.7 MB - Last synced at: 4 days ago - Pushed at: 4 months ago - Stars: 295 - Forks: 30

ronniross/symbiotic-core-library
Toolkits, instructions, prompts, bibliographies, and research support designed to enhance/test LLM metacognitive/contextual awareness, address deficiencies, and unlock emergent properties/human-AI symbiosis.
Size: 8 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 6 - Forks: 0

rebellions-sw/optimum-rbln
⚡ A seamless integration of HuggingFace Transformers & Diffusers with RBLN SDK for efficient inference on RBLN NPUs.
Language: Python - Size: 1.12 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 9 - Forks: 1

interestingLSY/swiftLLM
A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of vLLM).
Language: Python - Size: 224 KB - Last synced at: about 19 hours ago - Pushed at: 12 days ago - Stars: 171 - Forks: 17

huggingface/optimum
🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization tools
Language: Python - Size: 5.64 MB - Last synced at: 4 days ago - Pushed at: 11 days ago - Stars: 2,878 - Forks: 532
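
For Optimum, a brief sketch of exporting a Transformers checkpoint to ONNX and running it through ONNX Runtime; the checkpoint name is a common example, and `export=True` reflects the documented on-the-fly export path.

```python
# Sketch: run a Transformers model through ONNX Runtime via Optimum.
# The checkpoint name is an example; export=True converts it to ONNX on load.
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

model_id = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("Optimum makes ONNX export painless."))
```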

triton-inference-server/server
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
Language: Python - Size: 35.2 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 9,170 - Forks: 1,569
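
A hedged sketch of querying a model already deployed on a Triton server with the HTTP client from `tritonclient`; the model name, tensor names, and shape are placeholders and must match the deployed model's config.pbtxt.

```python
# Sketch: query a model served by Triton over HTTP. "my_model", "INPUT__0",
# "OUTPUT__0", and the tensor shape are placeholders matching a hypothetical
# server-side model configuration.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

data = np.random.rand(1, 3, 224, 224).astype(np.float32)
inp = httpclient.InferInput("INPUT__0", list(data.shape), "FP32")
inp.set_data_from_numpy(data)

result = client.infer(model_name="my_model", inputs=[inp])
print(result.as_numpy("OUTPUT__0").shape)
```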

gcanti/io-ts
Runtime type system for IO decoding/encoding
Language: TypeScript - Size: 2.9 MB - Last synced at: 4 days ago - Pushed at: 5 months ago - Stars: 6,773 - Forks: 328

Torsion-Audio/nn-inference-template
Neural network inference template for real-time critical audio environments - presented at ADC23
Language: C++ - Size: 18.6 MB - Last synced at: 4 days ago - Pushed at: 8 months ago - Stars: 109 - Forks: 5

intel/xFasterTransformer
Language: C++ - Size: 52.2 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 422 - Forks: 67

bigai-nlco/TokenSwift
From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation
Language: Python - Size: 61.6 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 89 - Forks: 8

OpenNMT/CTranslate2
Fast inference engine for Transformer models
Language: C++ - Size: 14.5 MB - Last synced at: 4 days ago - Pushed at: about 1 month ago - Stars: 3,785 - Forks: 354
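
For CTranslate2, a minimal sketch of translating a pre-tokenized batch with a converted model; the model directory and tokens are placeholders, and models must first be converted with the project's converter tools.

```python
# Sketch: batch translation with CTranslate2. "ende_ct2" is a placeholder
# directory produced by one of the ct2 converters; inputs are pre-tokenized
# (CTranslate2 operates on tokens, not raw strings).
import ctranslate2

translator = ctranslate2.Translator("ende_ct2", device="cpu")
results = translator.translate_batch([["▁Hello", "▁world", "!"]])
print(results[0].hypotheses[0])  # best hypothesis as a list of target tokens
```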

google/XNNPACK
High-efficiency floating-point neural network inference operators for mobile, server, and Web
Language: C - Size: 166 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 2,008 - Forks: 411

lofcz/LlmTornado
The .NET library to consume 100+ APIs: OpenAI, Anthropic, Google, DeepSeek, Cohere, Mistral, Azure, xAI, Perplexity, Groq, Ollama, vLLM, and many more!
Language: C# - Size: 15.6 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 150 - Forks: 22

Trusted-AI/adversarial-robustness-toolbox
Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
Language: Python - Size: 610 MB - Last synced at: 5 days ago - Pushed at: 7 days ago - Stars: 5,234 - Forks: 1,210

llmariner/llmariner
Extensible generative AI platform on Kubernetes with OpenAI-compatible APIs.
Language: Go - Size: 7.87 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 67 - Forks: 5

NVIDIA/kvpress
LLM KV cache compression made easy
Language: Python - Size: 5.55 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 472 - Forks: 36

DanielHermosilla/ecological-inference-elections Fork of pabloubilla/ecological-inference-elections
R library for the work by Thraves, C. and Ubilla, P.: 'Fast Ecological Inference Algorithm for the R×C Case'
Language: HTML - Size: 442 MB - Last synced at: about 22 hours ago - Pushed at: about 22 hours ago - Stars: 0 - Forks: 0

pipeless-ai/pipeless
An open-source computer vision framework to build and deploy apps in minutes
Language: Rust - Size: 142 MB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 752 - Forks: 38

decs/typeschema
🛵 Universal adapter for TypeScript schema validation.
Language: TypeScript - Size: 1.67 MB - Last synced at: 5 days ago - Pushed at: 7 months ago - Stars: 443 - Forks: 14

tairov/llama2.py Fork of karpathy/llama2.c
Inference Llama 2 in one file of pure Python
Language: Python - Size: 6.29 MB - Last synced at: about 20 hours ago - Pushed at: 7 months ago - Stars: 415 - Forks: 28

aws/studio-lab-examples
Example notebooks for working with SageMaker Studio Lab. Sign up for an account at the link below!
Language: Jupyter Notebook - Size: 33.9 MB - Last synced at: 4 days ago - Pushed at: 9 months ago - Stars: 713 - Forks: 207

inferx-net/inferx
InferX is an Inference Function-as-a-Service platform
Language: Rust - Size: 1.2 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 61 - Forks: 3

NextGenContributions/django2pydantic
Django2pydantic is the most complete library for converting Django ORM models to Pydantic models
Language: Python - Size: 540 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 3 - Forks: 1

ostis-apps/ostis-discrete-math
Intelligent help system for Discrete Math
Language: Shell - Size: 3.43 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 6 - Forks: 7

OneInterface/realtime-bakllava
llama.cpp with the BakLLaVA model, describing what it sees
Language: Python - Size: 2.84 MB - Last synced at: about 19 hours ago - Pushed at: over 1 year ago - Stars: 383 - Forks: 42

vectorch-ai/ScaleLLM
A high-performance inference system for large language models, designed for production environments.
Language: C++ - Size: 19 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 436 - Forks: 35

aws/amazon-sagemaker-examples
Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.
Language: Jupyter Notebook - Size: 634 MB - Last synced at: 5 days ago - Pushed at: about 2 months ago - Stars: 10,483 - Forks: 6,884

itlab-vision/dl-benchmark
Deep Learning Inference benchmark. Supports OpenVINO™ toolkit, TensorFlow, TensorFlow Lite, ONNX Runtime, OpenCV DNN, MXNet, PyTorch, Apache TVM, ncnn, PaddlePaddle, etc.
Language: HTML - Size: 141 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 32 - Forks: 38

typedb/typeql
TypeQL: the power of programming, in your database
Language: Java - Size: 6.27 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 227 - Forks: 46

BasLinders/happyhorizon_statstoolkit
An ongoing project for an online toolkit to analyze online controlled experiments. Its mission: To make inferential statistics accessible for everyone.
Language: Python - Size: 382 KB - Last synced at: 5 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

open-edge-platform/geti
Build computer vision models in a fraction of the time and with less data.
Language: TypeScript - Size: 65.7 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 160 - Forks: 12

Twixie5/OpenVINO_Asynchronous_API_Performance_Demo
This project demonstrates the high performance of the OpenVINO asynchronous inference API.
Language: Python - Size: 28.5 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

amiot99/gpu-inference-microservice
FastAPI-based microservice for GPU-accelerated image classification using ResNet18 and Docker
Language: Python - Size: 9.77 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

OpenCSGs/csghub
CSGHub is a brand-new open-source platform for managing LLMs, developed by the OpenCSG team. It offers both open-source and on-premise/SaaS solutions, with features comparable to Hugging Face. Gain full control over the lifecycle of LLMs, datasets, and agents, with a Python SDK compatible with Hugging Face. Join us! ⭐️
Language: Vue - Size: 49.9 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 2,710 - Forks: 388

typedb/typedb
TypeDB: the power of programming, in your database
Language: Rust - Size: 104 MB - Last synced at: 5 days ago - Pushed at: 9 days ago - Stars: 3,973 - Forks: 344

rupeshtr78/rag-agent-rust
CLI-based LanceDB vector embedding with LLM integration for RAG workflows
Language: Rust - Size: 1.17 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1 - Forks: 0

openvinotoolkit/openvino
OpenVINO™ is an open source toolkit for optimizing and deploying AI inference
Language: C++ - Size: 845 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 8,241 - Forks: 2,590
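
A short sketch of OpenVINO's Python inference flow: read a model, compile it for a device, and run a single inference. The model path and input shape are placeholders.

```python
# Sketch: basic OpenVINO inference. "model.xml" and the input shape are
# placeholders; compile_model also accepts ONNX files directly.
import numpy as np
import openvino as ov

core = ov.Core()
model = core.read_model("model.xml")
compiled = core.compile_model(model, "CPU")

dummy_input = np.random.rand(1, 3, 224, 224).astype(np.float32)
result = compiled([dummy_input])          # results keyed by output port
print(result[compiled.output(0)].shape)
```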

intel/ai-reference-models
Intel® AI Reference Models: contains Intel optimizations for running deep learning workloads on Intel® Xeon® Scalable processors and Intel® Data Center GPUs
Language: Python - Size: 621 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 711 - Forks: 225

xorbitsai/inference
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
Language: Python - Size: 44.9 MB - Last synced at: 6 days ago - Pushed at: 7 days ago - Stars: 7,722 - Forks: 657
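
Xinference's "change a single line of code" claim refers to its OpenAI-compatible endpoint. A hedged sketch using the standard `openai` client; the port and model name depend on how the local Xinference server and model were launched and are assumptions here.

```python
# Sketch: point the standard OpenAI client at a locally running Xinference
# server. The base_url port and model name are assumptions that depend on
# how the server and model were launched.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:9997/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="qwen2.5-instruct",  # placeholder: whichever model was launched
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```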

sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
Language: Python - Size: 17.9 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 13,942 - Forks: 1,659

SYSTRAN/faster-whisper
Faster Whisper transcription with CTranslate2
Language: Python - Size: 36.6 MB - Last synced at: 6 days ago - Pushed at: 12 days ago - Stars: 15,802 - Forks: 1,316
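
For faster-whisper, a minimal transcription sketch following the documented `WhisperModel` API; the audio path and compute type are placeholders.

```python
# Sketch: transcribe an audio file with faster-whisper. "audio.wav" is a
# placeholder; int8 compute keeps the example CPU-friendly.
from faster_whisper import WhisperModel

model = WhisperModel("base", device="cpu", compute_type="int8")
segments, info = model.transcribe("audio.wav")

print(f"Detected language: {info.language}")
for segment in segments:  # segments is a generator, decoded lazily
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```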

zml/zml
Any model. Any hardware. Zero compromise. Built with @ziglang / @openxla / MLIR / @bazelbuild
Language: Zig - Size: 2.14 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 2,243 - Forks: 80

hpcaitech/ColossalAI
Making large AI models cheaper, faster and more accessible
Language: Python - Size: 62.8 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 40,845 - Forks: 4,499

ggml-org/whisper.cpp
Port of OpenAI's Whisper model in C/C++
Language: C++ - Size: 20.9 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 39,736 - Forks: 4,184

vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Language: Python - Size: 47.6 MB - Last synced at: 6 days ago - Pushed at: 7 days ago - Stars: 46,555 - Forks: 7,236
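
For vLLM's offline batch mode, a brief sketch using the documented `LLM` and `SamplingParams` entry points; the model name is a small placeholder checkpoint.

```python
# Sketch: offline batched generation with vLLM. The model name is a small
# placeholder checkpoint; any supported Hugging Face causal LM will do.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")
sampling = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

outputs = llm.generate(["The key idea behind PagedAttention is"], sampling)
for output in outputs:
    print(output.outputs[0].text)
```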

ntua-unit-of-control-and-informatics/jaqpot-frontend
The Jaqpot project's frontend app serves as the interactive gateway for users to engage with our predictive modeling platform.
Language: TypeScript - Size: 5.17 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1 - Forks: 0

NVIDIA/TensorRT
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
Language: C++ - Size: 130 MB - Last synced at: 6 days ago - Pushed at: 9 days ago - Stars: 11,545 - Forks: 2,185

ntua-unit-of-control-and-informatics/jaqpotpy
Open-source Python client for deploying machine learning models and obtaining predictions via the Jaqpot API.
Language: Python - Size: 362 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 17 - Forks: 1

openvinotoolkit/openvino_notebooks
📚 Jupyter notebook tutorials for OpenVINO™
Language: Jupyter Notebook - Size: 2.49 GB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 2,768 - Forks: 884

huggingface/text-generation-inference
Large Language Model Text Generation Inference
Language: Python - Size: 13.1 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 10,081 - Forks: 1,189
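
Once a Text Generation Inference server is running (typically via the project's Docker image), it can be queried from Python with `huggingface_hub.InferenceClient`; the local URL below is an assumption about the deployment.

```python
# Sketch: query a locally running text-generation-inference server.
# The URL assumes the server listens on port 8080; adjust to your deployment.
from huggingface_hub import InferenceClient

client = InferenceClient("http://localhost:8080")
text = client.text_generation(
    "Write one sentence about speculative decoding.",
    max_new_tokens=64,
)
print(text)
```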

Tencent/ncnn
ncnn is a high-performance neural network inference framework optimized for the mobile platform
Language: C++ - Size: 26.4 MB - Last synced at: 6 days ago - Pushed at: 8 days ago - Stars: 21,405 - Forks: 4,246

dlstreamer/dlstreamer
This repository is home to the Intel® Deep Learning Streamer (Intel® DL Streamer) Pipeline Framework, a streaming media analytics framework based on the GStreamer* multimedia framework for creating complex media analytics pipelines.
Language: C++ - Size: 9.39 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 550 - Forks: 173

FocoosAI/focoos
Focoos SDK
Language: Python - Size: 11 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 89 - Forks: 0
