GitHub topics: inference
actualwitch/experiment
🔬 Experiment is a professional-grade chat interface for Large Language Models (LLMs) designed for developers, prompt engineers, and AI researchers. It provides a streamlined environment for working with Anthropic, OpenAI, and Mistral models, with powerful debugging tools for prompt engineering and tool integration.
Language: TypeScript - Size: 18.8 MB - Last synced at: about 4 hours ago - Pushed at: about 5 hours ago - Stars: 10 - Forks: 1

pgmpy/pgmpy
Python library for structure and parameter learning, probabilistic and causal inference, and simulation in Bayesian networks.
Language: Python - Size: 12.7 MB - Last synced at: about 10 hours ago - Pushed at: about 10 hours ago - Stars: 2,868 - Forks: 731
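A minimal sketch of pgmpy's discrete-inference API, using a hypothetical two-node rain/wet-grass network (the structure and probabilities here are illustrative, not from the library's docs):

```python
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

# Hypothetical two-node network: Rain -> WetGrass
model = BayesianNetwork([("Rain", "WetGrass")])
cpd_rain = TabularCPD("Rain", 2, [[0.8], [0.2]])
cpd_wet = TabularCPD(
    "WetGrass", 2,
    [[0.9, 0.1],   # P(WetGrass=0 | Rain=0), P(WetGrass=0 | Rain=1)
     [0.1, 0.9]],  # P(WetGrass=1 | Rain=0), P(WetGrass=1 | Rain=1)
    evidence=["Rain"], evidence_card=[2],
)
model.add_cpds(cpd_rain, cpd_wet)
model.check_model()

# Exact inference by variable elimination
infer = VariableElimination(model)
print(infer.query(["Rain"], evidence={"WetGrass": 1}))
```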

aws/amazon-sagemaker-examples
Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.
Language: Jupyter Notebook - Size: 634 MB - Last synced at: about 16 hours ago - Pushed at: about 1 month ago - Stars: 10,449 - Forks: 6,868

pytorch/ao
PyTorch native quantization and sparsity for training and inference
Language: Python - Size: 29.3 MB - Last synced at: about 21 hours ago - Pushed at: about 22 hours ago - Stars: 1,973 - Forks: 245

rupeshtr78/rag-agent-rust
CLI for generating LanceDB vector embeddings and integrating LLMs into Retrieval-Augmented Generation (RAG) workflows for data storage, retrieval, and AI-driven chat.
Language: Rust - Size: 688 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

RayFernando1337/LLM-Calc
Instantly calculate the maximum size of quantized language models that can fit in your available RAM, helping you optimize your models for inference.
Language: TypeScript - Size: 190 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 217 - Forks: 12
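The underlying arithmetic is simple: a weight quantized to b bits costs b/8 bytes. Below is a rough back-of-envelope sketch of that calculation in Python (not the repo's TypeScript code; the 1 GB overhead allowance is an assumption):

```python
def max_params_billion(available_ram_gb: float, bits_per_weight: float,
                       overhead_gb: float = 1.0) -> float:
    """Rough upper bound on model size (in billions of parameters) that fits in RAM."""
    usable_bytes = (available_ram_gb - overhead_gb) * 1024**3
    bytes_per_weight = bits_per_weight / 8
    return usable_bytes / bytes_per_weight / 1e9

# Example: 16 GB of RAM with 4-bit quantization leaves room for roughly a 32B-parameter model.
print(f"{max_params_billion(16, 4):.1f}B parameters")
```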

google-ai-edge/mediapipe
Cross-platform, customizable ML solutions for live and streaming media.
Language: C++ - Size: 575 MB - Last synced at: 1 day ago - Pushed at: 3 days ago - Stars: 29,429 - Forks: 5,319

13Mai13/llm-server
An implementation of an LLM server.
Language: Python - Size: 154 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

NOOBKABHA/bulk-chain-shell
Shell client 📺 for schema-based reasoning 🧠 over your data via a custom LLM provider 🌌
Language: Python - Size: 20.5 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

aws/sagemaker-xgboost-container
A Docker container based on the open-source XGBoost framework (https://xgboost.readthedocs.io/en/latest/) that allows customers to use their own XGBoost scripts in SageMaker.
Language: Python - Size: 830 KB - Last synced at: about 16 hours ago - Pushed at: about 1 month ago - Stars: 136 - Forks: 84

huggingface/optimum
🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM, and Sentence Transformers with easy-to-use hardware optimization tools
Language: Python - Size: 5.32 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 2,855 - Forks: 521
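A minimal sketch of the ONNX Runtime backend, assuming a standard sentiment-classification checkpoint (the model name is just an example):

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

model_id = "distilbert-base-uncased-finetuned-sst-2-english"
# export=True converts the PyTorch checkpoint to ONNX on the fly
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("Optimized inference feels fast."))
```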

argmaxinc/WhisperKit
On-device Speech Recognition for Apple Silicon
Language: Swift - Size: 2.57 MB - Last synced at: 1 day ago - Pushed at: 5 days ago - Stars: 4,519 - Forks: 381

ggml-org/whisper.cpp
Port of OpenAI's Whisper model in C/C++
Language: C++ - Size: 20 MB - Last synced at: 1 day ago - Pushed at: 4 days ago - Stars: 39,337 - Forks: 4,125

lofcz/LlmTornado
The .NET library to consume 100+ APIs: OpenAI, Anthropic, Google, DeepSeek, Cohere, Mistral, Azure, xAI, Perplexity, Groq, Ollama, LocalAi, and many more!
Language: C# - Size: 15.4 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 130 - Forks: 21

microsoft/vidur
A large-scale simulation framework for LLM inference
Language: Python - Size: 156 MB - Last synced at: about 16 hours ago - Pushed at: 5 months ago - Stars: 364 - Forks: 65

Fuzzy-Search/realtime-bakllava
llama.cpp with the BakLLaVA model, describing what it sees.
Language: Python - Size: 2.84 MB - Last synced at: 1 day ago - Pushed at: over 1 year ago - Stars: 383 - Forks: 43

huggingface/huggingface.js
Utilities to use the Hugging Face Hub API
Language: TypeScript - Size: 15 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 1,587 - Forks: 349

triton-inference-server/server
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
Language: Python - Size: 37.3 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 9,087 - Forks: 1,558

superduper-io/superduper
Superduper: End-to-end framework for building custom AI applications and agents.
Language: Python - Size: 73.2 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 5,032 - Forks: 492

huggingface/text-generation-inference
Large Language Model Text Generation Inference
Language: Python - Size: 12.9 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 10,028 - Forks: 1,185
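A minimal client-side sketch, assuming a TGI server is already running locally on port 8080 (the prompt and endpoint are illustrative):

```python
from huggingface_hub import InferenceClient

client = InferenceClient("http://localhost:8080")
# Stream tokens from the server as they are generated
for token in client.text_generation(
    "Explain KV caching in one sentence:", max_new_tokens=64, stream=True
):
    print(token, end="", flush=True)
```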

willjgh/M5R
Extension of M4R work by application of methods to perturbed datasets
Language: Jupyter Notebook - Size: 61.3 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 2 - Forks: 0

prodialabs/prodia-js
Official TypeScript library for Prodia's AI inference API.
Language: TypeScript - Size: 213 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 23 - Forks: 12

DepressionCenter/Data-and-Design-Core
Code developed by the EFDC Data and Design Core team to support mental health research.
Language: Stata - Size: 639 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 2 - Forks: 0

DanielHermosilla/ecological-inference-elections Fork of pabloubilla/ecological-inference-elections
R library for the work by Thraves, C. and Ubilla, P.: 'Fast Ecological Inference Algorithm for the R×C Case'
Language: C - Size: 439 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

openvinotoolkit/openvino_notebooks
📚 Jupyter notebook tutorials for OpenVINO™
Language: Jupyter Notebook - Size: 2.4 GB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 2,728 - Forks: 879

microsoft/aici
AICI: Prompts as (Wasm) Programs
Language: Rust - Size: 9.71 MB - Last synced at: 1 day ago - Pushed at: 3 months ago - Stars: 2,017 - Forks: 83

ronniross/symbioticcorelibrary
The Symbiotic Core Library repository provides instructions, prompts, bibliographies, and research support designed to enhance/test LLM metacognitive/contextual awareness, address deficiencies, and unlock emergent properties/human-AI symbiosis.
Size: 5.98 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 2 - Forks: 0

microsoft/nn-Meter
A DNN inference latency prediction toolkit for accurately modeling and predicting the latency on diverse edge devices.
Language: Python - Size: 58.3 MB - Last synced at: about 16 hours ago - Pushed at: 9 months ago - Stars: 351 - Forks: 62

qubvel/transformers-notebooks
Inference and fine-tuning examples for vision models from 🤗 Transformers
Language: Jupyter Notebook - Size: 33.1 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 76 - Forks: 13

mirecl/catboost-cgo
Golang bindings (via Cgo) for blazing-fast inference with CatBoost models 🚀 CatBoost is a fast, scalable, high-performance library for gradient boosting on decision trees.
Language: C - Size: 793 KB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 13 - Forks: 1

hpcaitech/ColossalAI
Making large AI models cheaper, faster and more accessible
Language: Python - Size: 62.7 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 40,788 - Forks: 4,495

openvinotoolkit/openvino
OpenVINO™ is an open source toolkit for optimizing and deploying AI inference
Language: C++ - Size: 844 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 8,132 - Forks: 2,573
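A minimal sketch of the OpenVINO runtime API (2023+ Python package), assuming an already-converted IR model on disk; the file path and input shape are hypothetical:

```python
import numpy as np
import openvino as ov

core = ov.Core()
model = core.read_model("model.xml")        # hypothetical IR model path
compiled = core.compile_model(model, "CPU")

# Single synchronous inference on random data with a hypothetical NCHW input shape
input_tensor = np.random.rand(1, 3, 224, 224).astype(np.float32)
result = compiled([input_tensor])[compiled.output(0)]
print(result.shape)
```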

sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
Language: Python - Size: 14.2 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 13,316 - Forks: 1,538

dstackai/dstack
dstack is an open-source alternative to Kubernetes and Slurm, designed to simplify GPU allocation and AI workload orchestration for ML teams across top clouds, on-prem clusters, and accelerators.
Language: Python - Size: 114 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 1,756 - Forks: 169

lovelynewlife/oceanbase Fork of oceanbase/oceanbase
Supports machine learning prediction queries by leveraging Python UDFs in OceanBase.
Language: C++ - Size: 431 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 5 - Forks: 4

theodo-group/GenossGPT
One API for all LLMs, private or public (Anthropic, Llama V2, GPT 3.5/4, Vertex, GPT4ALL, HuggingFace, ...) 🌈🐂 Replace OpenAI GPT with any LLM in your app with one line.
Language: Python - Size: 2.96 MB - Last synced at: 1 day ago - Pushed at: over 1 year ago - Stars: 754 - Forks: 62

huggingface/optimum-intel
🤗 Optimum Intel: Accelerate inference with Intel optimization tools
Language: Jupyter Notebook - Size: 17 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 459 - Forks: 128

rupeshtr78/ray-gpu-kind-cluster
GPU-Passthrough Enabled Apache Ray Clusters on Kubernetes (Kind) for POC and Testing
Language: Shell - Size: 175 KB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

friendliai/friendli-client
Friendli: the fastest serving engine for generative AI
Language: Python - Size: 4.88 MB - Last synced at: 1 day ago - Pushed at: 3 months ago - Stars: 44 - Forks: 8

Trusted-AI/adversarial-robustness-toolbox
Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
Language: Python - Size: 610 MB - Last synced at: 3 days ago - Pushed at: 18 days ago - Stars: 5,198 - Forks: 1,209

deepjavalibrary/djl-serving
A universal scalable machine learning model deployment solution
Language: Java - Size: 9.96 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 213 - Forks: 71

nvidia-holoscan/holohub
Central repository for Holoscan Reference Applications
Language: C++ - Size: 256 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 132 - Forks: 86

kserve/website
User documentation for KServe.
Language: HTML - Size: 119 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 106 - Forks: 135

cintia-shinoda/univesp
Data Science Undergrad Notes, Code, and Homeworks
Language: Jupyter Notebook - Size: 57 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

aime-team/aime-api-worker-interface
AIME API Worker Interface - Interface to connect compute workers to the AIME API Server
Language: Python - Size: 5.25 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 2 - Forks: 0

guoriyue/warp-llama3-scratch
Language: Python - Size: 19.5 KB - Last synced at: 1 day ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0

intel/ai-reference-models
Intel® AI Reference Models: contains Intel optimizations for running deep learning workloads on Intel® Xeon® Scalable processors and Intel® Data Center GPUs
Language: Python - Size: 621 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 706 - Forks: 224

alibaba/rtp-llm
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
Language: C++ - Size: 274 MB - Last synced at: 1 day ago - Pushed at: 3 months ago - Stars: 702 - Forks: 56

BasLinders/happyhorizon_statstoolkit
An ongoing project for an online toolkit to analyze online controlled experiments. Its mission: to make inferential statistics accessible to everyone.
Language: Python - Size: 373 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

awslabs/multi-model-server
Multi Model Server is a tool for serving neural net models for inference
Language: Java - Size: 36.9 MB - Last synced at: 3 days ago - Pushed at: 11 months ago - Stars: 1,009 - Forks: 232

SYSTRAN/faster-whisper
Faster Whisper transcription with CTranslate2
Language: Python - Size: 36.6 MB - Last synced at: 3 days ago - Pushed at: about 1 month ago - Stars: 15,467 - Forks: 1,299
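A minimal sketch of the transcription API, assuming a local audio file and the small multilingual model running on CPU with int8 compute:

```python
from faster_whisper import WhisperModel

model = WhisperModel("small", device="cpu", compute_type="int8")
segments, info = model.transcribe("audio.mp3", beam_size=5)

print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```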

hitz-zentroa/GoLLIE
Guideline following Large Language Model for Information Extraction
Language: Python - Size: 10.8 MB - Last synced at: 1 day ago - Pushed at: 6 months ago - Stars: 365 - Forks: 26

zeitlings/alfred-ollama
Dehydrated Ollama CLI Interface
Language: Swift - Size: 2.82 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 83 - Forks: 4

vllm-project/vllm-ascend
Community maintained hardware plugin for vLLM on Ascend
Language: Python - Size: 911 KB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 477 - Forks: 87

OpenCSGs/csghub
CSGHub is a brand-new open-source platform for managing LLMs, developed by the OpenCSG team. It offers both open-source and on-premise/SaaS solutions, with features comparable to Hugging Face. Gain full control over the lifecycle of LLMs, datasets, and agents, with a Python SDK compatible with Hugging Face. Join us! ⭐️
Language: Vue - Size: 49.8 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 2,659 - Forks: 388

aws-samples/aws-do-eks
Create, List, Update, Delete Amazon EKS clusters. Deploy and manage software on EKS. Run distributed model training and inference examples.
Language: Shell - Size: 26.3 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 55 - Forks: 32

EmbeddedLLM/vllm Fork of vllm-project/vllm
vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs
Language: Python - Size: 44.3 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 86 - Forks: 5

ntua-unit-of-control-and-informatics/jaqpot-frontend
The Jaqpot project's frontend app serves as the interactive gateway for users to engage with our predictive modeling platform.
Language: TypeScript - Size: 5.15 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 1 - Forks: 0

typedb/typedb
TypeDB: the power of programming, in your database
Language: Rust - Size: 103 MB - Last synced at: 3 days ago - Pushed at: 5 days ago - Stars: 3,968 - Forks: 343

OpenIntroStat/ims
📚 Introduction to Modern Statistics - A college-level open-source textbook with a modern approach highlighting multivariable relationships and simulation-based inference. For v1, see https://openintro-ims.netlify.app.
Language: JavaScript - Size: 535 MB - Last synced at: 3 days ago - Pushed at: 3 months ago - Stars: 889 - Forks: 180

AI-Hypercomputer/JetStream
JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome).
Language: Python - Size: 6.24 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 315 - Forks: 38

aws/studio-lab-examples
Example notebooks for working with SageMaker Studio Lab. Sign up for an account at the link below!
Language: Jupyter Notebook - Size: 33.9 MB - Last synced at: about 16 hours ago - Pushed at: 8 months ago - Stars: 708 - Forks: 204

krishnaura45/LMBattle
Predicting Human Preferences in the Wild
Language: Jupyter Notebook - Size: 29 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

MeowMeowSE3/language-detection-ai
Detect 18+ languages instantly using machine learning (BERT, LSTM, SVM) and NLP. Includes a Flask web app for real-time predictions, trained models, and detailed notebooks.
Language: Jupyter Notebook - Size: 1.76 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 1

ntua-unit-of-control-and-informatics/jaqpot-api
Open-source RESTful API for managing and deploying models, providing predictions via API, built with Spring Boot and Kotlin.
Language: Kotlin - Size: 641 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1 - Forks: 0

fastmachinelearning/qonnx
QONNX: Arbitrary-Precision Quantized Neural Networks in ONNX
Language: Python - Size: 5.31 MB - Last synced at: about 7 hours ago - Pushed at: about 8 hours ago - Stars: 145 - Forks: 44

ntua-unit-of-control-and-informatics/jaqpotpy
Open-source Python client for deploying models and obtaining predictions via the Jaqpot API.
Language: Python - Size: 362 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 17 - Forks: 1

pykeio/ort
Fast ML inference & training for ONNX models in Rust
Language: Rust - Size: 4.86 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 1,254 - Forks: 124

scitix/arks
Arks is a cloud-native inference framework running on Kubernetes
Language: Go - Size: 353 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 4 - Forks: 2

mtrimolet/hroza
A (modern) C++ implementation of MarkovJunior based on StormKit
Language: C++ - Size: 323 KB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

chengzeyi/ParaAttention
Context-parallel attention that accelerates DiT model inference with dynamic caching (https://wavespeed.ai/)
Language: Python - Size: 13.4 MB - Last synced at: 5 days ago - Pushed at: 19 days ago - Stars: 241 - Forks: 24

inlab-geo/espresso
Earth Science PRoblems for the Evaluation of Strategies, Solvers and Optimizers
Language: Jupyter Notebook - Size: 165 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 16 - Forks: 9

NexusGPU/tensor-fusion
Tensor Fusion is a state-of-the-art GPU virtualization and pooling solution designed to optimize GPU cluster utilization to its fullest potential.
Language: Go - Size: 829 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 27 - Forks: 7

jy-yuan/KIVI
[ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
Language: Python - Size: 16.7 MB - Last synced at: 1 day ago - Pushed at: 3 months ago - Stars: 288 - Forks: 29

xorbitsai/inference
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
Language: Python - Size: 44.6 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 7,500 - Forks: 636
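A minimal sketch of the "change one line" idea: point an OpenAI-compatible client at a locally running Xinference endpoint. The port, model name, and API key here are assumptions for illustration:

```python
from openai import OpenAI

# Xinference exposes an OpenAI-compatible API; only the base_url changes.
client = OpenAI(base_url="http://localhost:9997/v1", api_key="not-needed")
response = client.chat.completions.create(
    model="qwen2.5-instruct",  # hypothetical locally launched model
    messages=[{"role": "user", "content": "Say hello from a locally served model."}],
)
print(response.choices[0].message.content)
```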

AnswerDotAI/cold-compress
Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of GPT-Fast, a simple, PyTorch-native generation codebase.
Language: Python - Size: 8.33 MB - Last synced at: 20 minutes ago - Pushed at: 8 months ago - Stars: 128 - Forks: 11

interestingLSY/swiftLLM
A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of vLLM).
Language: Python - Size: 298 KB - Last synced at: 1 day ago - Pushed at: 10 months ago - Stars: 160 - Forks: 14

orcasound/aifororcas-livesystem
Real-time AI-assisted killer whale notification system (model and moderator portal) :star:
Language: C# - Size: 189 MB - Last synced at: 3 days ago - Pushed at: 20 days ago - Stars: 40 - Forks: 25

superagent-ai/super-rag
Super performant RAG pipelines for AI apps. Summarization, Retrieve/Rerank and Code Interpreters in one simple API.
Language: Python - Size: 637 KB - Last synced at: 3 days ago - Pushed at: 12 months ago - Stars: 368 - Forks: 56

google/XNNPACK
High-efficiency floating-point neural network inference operators for mobile, server, and Web
Language: C - Size: 162 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 2,000 - Forks: 409

dobriban/Principles-of-AI-LLMs
Materials for the course Principles of AI: LLMs at UPenn (Stat 9911, Spring 2025). LLM architectures, training paradigms (pre- and post-training, alignment), test-time computation, reasoning, safety and robustness (jailbreaking, oversight, uncertainty), representations, interpretability (circuits), etc.
Size: 157 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 30 - Forks: 0

Lyrcaxis/KokoroSharp
Fast local TTS inference engine in C# with ONNX runtime. Multi-speaker, multi-platform and multilingual. Integrate on your .NET projects using a plug-and-play NuGet package, complete with all voices.
Language: C# - Size: 159 KB - Last synced at: 5 days ago - Pushed at: 28 days ago - Stars: 105 - Forks: 4

NVIDIA/TensorRT
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
Language: C++ - Size: 130 MB - Last synced at: 5 days ago - Pushed at: about 1 month ago - Stars: 11,459 - Forks: 2,181

openvinotoolkit/model_server
A scalable inference server for models optimized with OpenVINO™
Language: C++ - Size: 53.5 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 722 - Forks: 216

vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Language: Python - Size: 43.8 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 44,885 - Forks: 6,865
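A minimal offline-batch sketch, assuming a GPU is available and using a small example model and prompt:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # small example model
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Batched generation; each RequestOutput carries the completions for one prompt
outputs = llm.generate(["The key idea behind efficient LLM serving is"], params)
for out in outputs:
    print(out.outputs[0].text)
```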

deepspeedai/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Language: Python - Size: 216 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 37,927 - Forks: 4,336

dlstreamer/dlstreamer
This repository is home to the Intel® Deep Learning Streamer (Intel® DL Streamer) Pipeline Framework, a streaming media analytics framework based on the GStreamer* multimedia framework for creating complex media analytics pipelines.
Language: C++ - Size: 9.37 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 548 - Forks: 174

FocoosAI/focoos
Focoos SDK
Language: Python - Size: 10.5 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 88 - Forks: 0

Tencent/ncnn
ncnn is a high-performance neural network inference framework optimized for the mobile platform
Language: C++ - Size: 26.3 MB - Last synced at: 5 days ago - Pushed at: 6 days ago - Stars: 21,298 - Forks: 4,234

aws/sagemaker-inference-toolkit
Serve machine learning models within a 🐳 Docker container using 🧠 Amazon SageMaker.
Language: Python - Size: 667 KB - Last synced at: about 16 hours ago - Pushed at: over 1 year ago - Stars: 403 - Forks: 82

rebellions-sw/optimum-rbln
⚡ A seamless integration of HuggingFace Transformers & Diffusers with RBLN SDK for efficient inference on RBLN NPUs.
Language: Python - Size: 1.14 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 8 - Forks: 1

Telosnex/fonnx
ONNX runtime for Flutter.
Language: Dart - Size: 176 MB - Last synced at: 3 days ago - Pushed at: 2 months ago - Stars: 264 - Forks: 18

Twixie5/OpenVINO_Asynchronous_API_Performance_Demo
This project demonstrates the high performance of the OpenVINO asynchronous inference API.
Language: Python - Size: 28.5 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

deepspeedai/DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
Language: Python - Size: 6.52 MB - Last synced at: 1 day ago - Pushed at: 26 days ago - Stars: 2,002 - Forks: 181

hidet-org/hidet
An open-source, efficient deep learning framework/compiler, written in Python.
Language: Python - Size: 4.6 MB - Last synced at: 3 days ago - Pushed at: about 2 months ago - Stars: 698 - Forks: 59

ibaiGorordo/Sapiens-Pytorch-Inference
Minimal code and examples for running inference with Sapiens foundation human models in PyTorch
Language: Jupyter Notebook - Size: 47.6 MB - Last synced at: 5 days ago - Pushed at: 6 months ago - Stars: 137 - Forks: 14

inlab-geo/cofi
Common Framework for Inference
Language: Python - Size: 164 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 17 - Forks: 5

eoap/machine-learning-process
Machine Learning Process using the EO Application Package
Language: Jupyter Notebook - Size: 124 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 1

larq/compute-engine
Highly optimized inference engine for Binarized Neural Networks
Language: C++ - Size: 4.96 MB - Last synced at: 6 days ago - Pushed at: 7 days ago - Stars: 249 - Forks: 36

torchpipe/torchpipe
Serving inside PyTorch
Language: C++ - Size: 41.4 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 160 - Forks: 13
