An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: inference

actualwitch/experiment

🔬 Experiment is an experiment is an experiment is an experiment is an experiment…

Language: TypeScript - Size: 20.3 MB - Last synced at: about 3 hours ago - Pushed at: about 4 hours ago - Stars: 10 - Forks: 1

fastmachinelearning/qonnx

QONNX: Arbitrary-Precision Quantized Neural Networks in ONNX

Language: Python - Size: 5.38 MB - Last synced at: about 3 hours ago - Pushed at: about 3 hours ago - Stars: 148 - Forks: 45

chama-45426/hub-api

Consolidated management of AI model APIs

Language: Go - Size: 31.3 KB - Last synced at: about 23 hours ago - Pushed at: about 23 hours ago - Stars: 0 - Forks: 0

NOOBKABHA/bulk-chain-shell

Shell client 📺 for schema-based reasoning 🧠 over your data via a custom LLM provider 🌌

Language: Python - Size: 20.5 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 1

Thrasher-Intelligence/sigil

A local-first LLM development studio. Build, test, and customize inference workflows with your own models — no cloud, totally local.

Language: JavaScript - Size: 8.14 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 14 - Forks: 2

kibae/pg_onnx

pg_onnx: ONNX Runtime integrated with PostgreSQL. Perform ML inference with data in your database.

Language: C++ - Size: 108 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 49 - Forks: 2

hpcaitech/SwiftInfer

Efficient AI Inference & Serving

Language: Python - Size: 508 KB - Last synced at: about 20 hours ago - Pushed at: over 1 year ago - Stars: 469 - Forks: 28

mcreel/SimulatedNeuralMoments.jl

Package for Bayesian and classical estimation and inference based on statistics filtered through a trained neural net

Language: Julia - Size: 9.7 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 22 - Forks: 1

alibaba/rtp-llm

RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.

Language: C++ - Size: 274 MB - Last synced at: about 20 hours ago - Pushed at: 4 months ago - Stars: 721 - Forks: 59

quic/ai-hub-apps

The Qualcomm® AI Hub apps are a collection of state-of-the-art machine learning models optimized for performance (latency, memory etc.) and ready to deploy on Qualcomm® devices.

Language: Java - Size: 27.9 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 188 - Forks: 43

open-edge-platform/model_api

Run Computer Vision AI models with simple C++/Python API and using OpenVINO Runtime

Language: Python - Size: 4.41 MB - Last synced at: 1 day ago - Pushed at: 17 days ago - Stars: 49 - Forks: 19

nbigaouette/onnxruntime-rs

Rust wrapper for Microsoft's ONNX Runtime (version 1.8)

Language: Rust - Size: 568 KB - Last synced at: 1 day ago - Pushed at: about 1 year ago - Stars: 292 - Forks: 99

kdkorthauer/dmrseq

R package for inference of differentially methylated regions (DMRs) from bisulfite sequencing

Language: R - Size: 22.9 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 58 - Forks: 14

kserve/website

User documentation for KServe.

Language: HTML - Size: 118 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 106 - Forks: 135

Kotlin/Kotlin-AI-Examples

A collection of Kotlin-based examples featuring AI frameworks such as Spring AI, LangChain4j, and more — complete with Kotlin notebooks for hands-on learning.

Language: Jupyter Notebook - Size: 40.6 MB - Last synced at: 2 days ago - Pushed at: 27 days ago - Stars: 35 - Forks: 5

HyperMink/inferenceable

Scalable AI Inference Server for CPU and GPU with Node.js | Utilizes llama.cpp and parts of llamafile C/C++ core under the hood.

Language: JavaScript - Size: 4.86 MB - Last synced at: about 20 hours ago - Pushed at: 12 months ago - Stars: 14 - Forks: 0

nicolay-r/bulk-chain

A no-string API framework for deploying schema-based reasoning into third-party apps

Language: Python - Size: 224 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 20 - Forks: 2

vllm-project/vllm-ascend

Community maintained hardware plugin for vLLM on Ascend

Language: Python - Size: 1.28 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 603 - Forks: 130

VectorInstitute/vector-inference

Efficient LLM inference on Slurm clusters using vLLM.

Language: Python - Size: 2.79 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 59 - Forks: 10

Tencent/TNN

TNN: a uniform deep learning inference framework for mobile, desktop, and server, developed by Tencent Youtu Lab and Guangying Lab. TNN is distinguished by several outstanding features, including cross-platform capability, high performance, model compression, and code pruning. Based on ncnn and Rapidnet, TNN further strengthens support and performance optimization for mobile devices, and draws on the extensibility and high performance of existing open source efforts. TNN has been deployed in multiple Tencent apps, such as Mobile QQ, Weishi, and Pitu. Contributions are welcome; work with us to make TNN a better framework.

Language: C++ - Size: 56 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 4,505 - Forks: 772

jomtek/LazenLang

An imperative, object-oriented, statically typed programming language with type inference.

Language: C# - Size: 3.47 MB - Last synced at: 2 days ago - Pushed at: about 2 years ago - Stars: 9 - Forks: 1

efeslab/Nanoflow

A throughput-oriented high-performance serving framework for LLMs

Language: Cuda - Size: 32.4 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 805 - Forks: 35

AutoGPTQ/AutoGPTQ 📦

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

Language: Python - Size: 8.01 MB - Last synced at: 2 days ago - Pushed at: about 1 month ago - Stars: 4,837 - Forks: 513

KrasnitzLab/RAIDS

Accurate and robust inference of genetic ancestry from cancer-derived molecular data across genomic platforms

Language: R - Size: 9.44 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 5 - Forks: 4

nvidia-holoscan/holohub

Central repository for Holoscan Reference Applications

Language: C++ - Size: 268 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 133 - Forks: 89

pykeio/ort

Fast ML inference & training for ONNX models in Rust

Language: Rust - Size: 6.47 MB - Last synced at: 3 days ago - Pushed at: 6 days ago - Stars: 1,288 - Forks: 132

RayFernando1337/LLM-Calc

Instantly calculate the maximum size of quantized language models that can fit in your available RAM, helping you optimize your models for inference.

Language: TypeScript - Size: 190 KB - Last synced at: 3 days ago - Pushed at: 22 days ago - Stars: 224 - Forks: 12
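The arithmetic behind such a calculator is straightforward: usable bytes divided by bytes per quantized weight. A minimal Python sketch (the function name and the fixed-overhead assumption are ours, not the repo's):

```python
def max_params_billions(ram_gb: float, bits_per_weight: float, overhead_gb: float = 1.0) -> float:
    """Estimate the largest quantized model (in billions of parameters)
    that fits in the given RAM. Assumes weights dominate memory use and
    reserves a rough fixed overhead for activations / KV cache."""
    usable_bytes = (ram_gb - overhead_gb) * 1024**3
    bytes_per_weight = bits_per_weight / 8
    return usable_bytes / bytes_per_weight / 1e9

# e.g. 16 GB of RAM with 4-bit quantization
print(round(max_params_billions(16, 4), 1))  # → 32.2
```

Halving the bit width roughly doubles the model size that fits, which is why 4-bit quantization is the common choice for local inference.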

mtrimolet/hroza

A C++ implementation of MarkovJunior based on StormKit

Language: C++ - Size: 363 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

google-ai-edge/mediapipe

Cross-platform, customizable ML solutions for live and streaming media.

Language: C++ - Size: 576 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 29,601 - Forks: 5,339

NexusGPU/tensor-fusion

Tensor Fusion is a state-of-the-art GPU virtualization and pooling solution designed to optimize GPU cluster utilization to its fullest potential.

Language: Go - Size: 795 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 31 - Forks: 8

szymonmaszke/torchlayers

Shape and dimension inference (Keras-like) for PyTorch layers and neural networks

Language: Python - Size: 3.19 MB - Last synced at: 2 days ago - Pushed at: almost 3 years ago - Stars: 570 - Forks: 44
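The idea behind Keras-style shape inference is to defer parameter creation until the first input is seen. A toy pure-Python sketch (class name hypothetical; torchlayers itself wraps real PyTorch modules):

```python
class LazyLinear:
    """Toy Keras-style shape inference: in_features is left unset until
    the first input arrives, then weights are materialized to match it."""
    def __init__(self, out_features: int):
        self.out_features = out_features
        self.weights = None  # created lazily on first call

    def __call__(self, x):  # x: list of floats (one sample)
        if self.weights is None:
            in_features = len(x)  # inferred from the first input
            self.weights = [[0.1] * in_features for _ in range(self.out_features)]
        return [sum(w * v for w, v in zip(row, x)) for row in self.weights]

layer = LazyLinear(out_features=2)
out = layer([1.0, 2.0, 3.0])  # in_features inferred as 3 here
print(len(out))  # → 2
```

The user never specifies input dimensions; the layer records them on first use, which is exactly the ergonomic win over writing both `in_features` and `out_features` by hand.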

bytedance/lightseq

LightSeq: A High Performance Library for Sequence Processing and Generation

Language: C++ - Size: 11.9 MB - Last synced at: about 20 hours ago - Pushed at: almost 2 years ago - Stars: 3,271 - Forks: 331

keith2018/TinyGPT

Tiny C++11 GPT-2 inference implementation from scratch

Language: C++ - Size: 648 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 58 - Forks: 11

sevagh/demucs.onnx

C++ ONNX/ORT inference for Demucs

Language: Python - Size: 159 KB - Last synced at: 1 day ago - Pushed at: 2 months ago - Stars: 12 - Forks: 3

BerkeleyLab/fiats

A deep learning library for use in high-performance computing applications in modern Fortran

Language: Fortran - Size: 66.4 MB - Last synced at: 1 day ago - Pushed at: 4 days ago - Stars: 42 - Forks: 11

quic/ai-hub-models

The Qualcomm® AI Hub Models are a collection of state-of-the-art machine learning models optimized for performance (latency, memory etc.) and ready to deploy on Qualcomm® devices.

Language: Python - Size: 255 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 682 - Forks: 107

MeowMeowSE3/language-detection-ai

Detect 18+ languages instantly using machine learning (BERT, LSTM, SVM) and NLP. Includes a Flask web app for real-time predictions, trained models, and detailed notebooks.

Language: Jupyter Notebook - Size: 1.76 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 1

roboflow/inference

Turn any computer or edge device into a command center for your computer vision projects.

Language: Python - Size: 124 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1,658 - Forks: 176

AI-Hypercomputer/JetStream

JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome).

Language: Python - Size: 6.35 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 322 - Forks: 39

aws/sagemaker-inference-toolkit

Serve machine learning models within a 🐳 Docker container using 🧠 Amazon SageMaker.

Language: Python - Size: 667 KB - Last synced at: 4 days ago - Pushed at: over 1 year ago - Stars: 405 - Forks: 82

pytorch/ao

PyTorch native quantization and sparsity for training and inference

Language: Python - Size: 30.5 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 2,020 - Forks: 257

stas00/ml-engineering

Machine Learning Engineering Open Book

Language: Python - Size: 10.2 MB - Last synced at: 4 days ago - Pushed at: 10 days ago - Stars: 13,626 - Forks: 822

roryclear/clearcam

IP Camera with AI object detection. Currently for iOS only.

Language: Objective-C - Size: 10.9 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

triton-inference-server/onnxruntime_backend

The Triton backend for the ONNX Runtime.

Language: C++ - Size: 296 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 144 - Forks: 64

mpes-kit/fuller

Probabilistic machine learning for reconstruction and parametrization of electronic band structure from photoemission spectroscopy data

Language: Jupyter Notebook - Size: 25.9 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 13 - Forks: 2

gvergnaud/ts-pattern

🎨 The exhaustive Pattern Matching library for TypeScript, with smart type inference.

Language: TypeScript - Size: 2.7 MB - Last synced at: 4 days ago - Pushed at: 10 days ago - Stars: 13,487 - Forks: 147

superduper-io/superduper

Superduper: End-to-end framework for building custom AI applications and agents.

Language: Python - Size: 73.6 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 5,053 - Forks: 493

jy-yuan/KIVI

[ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache

Language: Python - Size: 16.7 MB - Last synced at: 4 days ago - Pushed at: 4 months ago - Stars: 295 - Forks: 30
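KIVI's actual scheme quantizes the key cache per-channel and the value cache per-token; the asymmetric low-bit quantization it builds on can be sketched in plain Python (illustrative only, not KIVI's implementation):

```python
def quantize_asym(xs, bits=2):
    """Asymmetric quantization: map floats onto integer codes in
    [0, 2**bits - 1] using a scale and zero-point from the group's min/max."""
    lo, hi = min(xs), max(xs)
    levels = 2**bits - 1
    scale = (hi - lo) / levels or 1.0  # avoid zero scale for constant groups
    codes = [min(levels, max(0, round((x - lo) / scale))) for x in xs]
    return codes, scale, lo

def dequantize(codes, scale, lo):
    return [c * scale + lo for c in codes]

xs = [-1.0, -0.2, 0.4, 1.0]
codes, scale, zero = quantize_asym(xs)
print(codes)  # → [0, 1, 2, 3]
```

"Asymmetric" means the zero-point floats with the data's minimum rather than being pinned at 0, which matters at 2 bits where only four levels are available.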

ronniross/symbiotic-core-library

Toolkits, instructions, prompts, bibliographies, and research support designed to enhance/test LLM metacognitive/contextual awareness, address deficiencies, and unlock emergent properties/human-AI symbiosis.

Size: 8 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 6 - Forks: 0

rebellions-sw/optimum-rbln

⚡ A seamless integration of HuggingFace Transformers & Diffusers with RBLN SDK for efficient inference on RBLN NPUs.

Language: Python - Size: 1.12 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 9 - Forks: 1

interestingLSY/swiftLLM

A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance in only 2k lines of code (2% of vLLM).

Language: Python - Size: 224 KB - Last synced at: about 19 hours ago - Pushed at: 12 days ago - Stars: 171 - Forks: 17

huggingface/optimum

🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy-to-use hardware optimization tools

Language: Python - Size: 5.64 MB - Last synced at: 4 days ago - Pushed at: 11 days ago - Stars: 2,878 - Forks: 532

triton-inference-server/server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.

Language: Python - Size: 35.2 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 9,170 - Forks: 1,569

gcanti/io-ts

Runtime type system for IO decoding/encoding

Language: TypeScript - Size: 2.9 MB - Last synced at: 4 days ago - Pushed at: 5 months ago - Stars: 6,773 - Forks: 328

Torsion-Audio/nn-inference-template

Neural network inference template for real-time critical audio environments - presented at ADC23

Language: C++ - Size: 18.6 MB - Last synced at: 4 days ago - Pushed at: 8 months ago - Stars: 109 - Forks: 5

intel/xFasterTransformer

Language: C++ - Size: 52.2 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 422 - Forks: 67

bigai-nlco/TokenSwift

From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation

Language: Python - Size: 61.6 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 89 - Forks: 8

OpenNMT/CTranslate2

Fast inference engine for Transformer models

Language: C++ - Size: 14.5 MB - Last synced at: 4 days ago - Pushed at: about 1 month ago - Stars: 3,785 - Forks: 354

google/XNNPACK

High-efficiency floating-point neural network inference operators for mobile, server, and Web

Language: C - Size: 166 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 2,008 - Forks: 411

lofcz/LlmTornado

The .NET library to consume 100+ APIs: OpenAI, Anthropic, Google, DeepSeek, Cohere, Mistral, Azure, xAI, Perplexity, Groq, Ollama, vLLM, and many more!

Language: C# - Size: 15.6 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 150 - Forks: 22

Trusted-AI/adversarial-robustness-toolbox

Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams

Language: Python - Size: 610 MB - Last synced at: 5 days ago - Pushed at: 7 days ago - Stars: 5,234 - Forks: 1,210

llmariner/llmariner

Extensible generative AI platform on Kubernetes with OpenAI-compatible APIs.

Language: Go - Size: 7.87 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 67 - Forks: 5

NVIDIA/kvpress

LLM KV cache compression made easy

Language: Python - Size: 5.55 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 472 - Forks: 36

DanielHermosilla/ecological-inference-elections Fork of pabloubilla/ecological-inference-elections

R library for the work by Thraves, C. and Ubilla, P.: 'Fast Ecological Inference Algorithm for the R×C Case'

Language: HTML - Size: 442 MB - Last synced at: about 22 hours ago - Pushed at: about 22 hours ago - Stars: 0 - Forks: 0

pipeless-ai/pipeless

An open-source computer vision framework to build and deploy apps in minutes

Language: Rust - Size: 142 MB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 752 - Forks: 38

decs/typeschema

🛵 Universal adapter for TypeScript schema validation.

Language: TypeScript - Size: 1.67 MB - Last synced at: 5 days ago - Pushed at: 7 months ago - Stars: 443 - Forks: 14

tairov/llama2.py Fork of karpathy/llama2.c

Inference Llama 2 in one file of pure Python

Language: Python - Size: 6.29 MB - Last synced at: about 20 hours ago - Pushed at: 7 months ago - Stars: 415 - Forks: 28

aws/studio-lab-examples

Example notebooks for working with SageMaker Studio Lab. Sign up for an account at the link below!

Language: Jupyter Notebook - Size: 33.9 MB - Last synced at: 4 days ago - Pushed at: 9 months ago - Stars: 713 - Forks: 207

inferx-net/inferx

InferX is an Inference Function-as-a-Service platform

Language: Rust - Size: 1.2 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 61 - Forks: 3

NextGenContributions/django2pydantic

Django2pydantic is the most complete library for converting Django ORM models to Pydantic models

Language: Python - Size: 540 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 3 - Forks: 1

ostis-apps/ostis-discrete-math

Intelligent help system for Discrete Math

Language: Shell - Size: 3.43 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 6 - Forks: 7

OneInterface/realtime-bakllava

llama.cpp with the BakLLaVA model, describing what it sees

Language: Python - Size: 2.84 MB - Last synced at: about 19 hours ago - Pushed at: over 1 year ago - Stars: 383 - Forks: 42

vectorch-ai/ScaleLLM

A high-performance inference system for large language models, designed for production environments.

Language: C++ - Size: 19 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 436 - Forks: 35

aws/amazon-sagemaker-examples

Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.

Language: Jupyter Notebook - Size: 634 MB - Last synced at: 5 days ago - Pushed at: about 2 months ago - Stars: 10,483 - Forks: 6,884

itlab-vision/dl-benchmark

Deep Learning Inference benchmark. Supports OpenVINO™ toolkit, TensorFlow, TensorFlow Lite, ONNX Runtime, OpenCV DNN, MXNet, PyTorch, Apache TVM, ncnn, PaddlePaddle, etc.

Language: HTML - Size: 141 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 32 - Forks: 38

typedb/typeql

TypeQL: the power of programming, in your database

Language: Java - Size: 6.27 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 227 - Forks: 46

BasLinders/happyhorizon_statstoolkit

An ongoing project for an online toolkit to analyze online controlled experiments. Its mission: To make inferential statistics accessible for everyone.

Language: Python - Size: 382 KB - Last synced at: 5 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

open-edge-platform/geti

Build computer vision models in a fraction of the time and with less data.

Language: TypeScript - Size: 65.7 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 160 - Forks: 12

Twixie5/OpenVINO_Asynchronous_API_Performance_Demo

This project demonstrates the high performance of OpenVINO asynchronous inference API

Language: Python - Size: 28.5 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

amiot99/gpu-inference-microservice

FastAPI-based microservice for GPU-accelerated image classification using ResNet18 and Docker

Language: Python - Size: 9.77 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

OpenCSGs/csghub

CSGHub is a brand-new open-source platform for managing LLMs, developed by the OpenCSG team. It offers both open-source and on-premise/SaaS solutions, with features comparable to Hugging Face. Gain full control over the lifecycle of LLMs, datasets, and agents, with Python SDK compatibility with Hugging Face. Join us! ⭐️

Language: Vue - Size: 49.9 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 2,710 - Forks: 388

typedb/typedb

TypeDB: the power of programming, in your database

Language: Rust - Size: 104 MB - Last synced at: 5 days ago - Pushed at: 9 days ago - Stars: 3,973 - Forks: 344

rupeshtr78/rag-agent-rust

CLI-based LanceDB vector embedding with LLM integration for RAG workflows

Language: Rust - Size: 1.17 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1 - Forks: 0

openvinotoolkit/openvino

OpenVINO™ is an open source toolkit for optimizing and deploying AI inference

Language: C++ - Size: 845 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 8,241 - Forks: 2,590

intel/ai-reference-models

Intel® AI Reference Models: contains Intel optimizations for running deep learning workloads on Intel® Xeon® Scalable processors and Intel® Data Center GPUs

Language: Python - Size: 621 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 711 - Forks: 225

xorbitsai/inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.

Language: Python - Size: 44.9 MB - Last synced at: 6 days ago - Pushed at: 7 days ago - Stars: 7,722 - Forks: 657

sgl-project/sglang

SGLang is a fast serving framework for large language models and vision language models.

Language: Python - Size: 17.9 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 13,942 - Forks: 1,659

SYSTRAN/faster-whisper

Faster Whisper transcription with CTranslate2

Language: Python - Size: 36.6 MB - Last synced at: 6 days ago - Pushed at: 12 days ago - Stars: 15,802 - Forks: 1,316

zml/zml

Any model. Any hardware. Zero compromise. Built with @ziglang / @openxla / MLIR / @bazelbuild

Language: Zig - Size: 2.14 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 2,243 - Forks: 80

hpcaitech/ColossalAI

Making large AI models cheaper, faster and more accessible

Language: Python - Size: 62.8 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 40,845 - Forks: 4,499

ggml-org/whisper.cpp

Port of OpenAI's Whisper model in C/C++

Language: C++ - Size: 20.9 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 39,736 - Forks: 4,184

vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python - Size: 47.6 MB - Last synced at: 6 days ago - Pushed at: 7 days ago - Stars: 46,555 - Forks: 7,236

ntua-unit-of-control-and-informatics/jaqpot-frontend

The Jaqpot project's frontend app serves as the interactive gateway for users to engage with our predictive modeling platform.

Language: TypeScript - Size: 5.17 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1 - Forks: 0

NVIDIA/TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

Language: C++ - Size: 130 MB - Last synced at: 6 days ago - Pushed at: 9 days ago - Stars: 11,545 - Forks: 2,185

ntua-unit-of-control-and-informatics/jaqpotpy

Open-source Python client for deploying machine learning models and obtaining predictions via the Jaqpot API.

Language: Python - Size: 362 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 17 - Forks: 1

openvinotoolkit/openvino_notebooks

📚 Jupyter notebook tutorials for OpenVINO™

Language: Jupyter Notebook - Size: 2.49 GB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 2,768 - Forks: 884

huggingface/text-generation-inference

Large Language Model Text Generation Inference

Language: Python - Size: 13.1 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 10,081 - Forks: 1,189

Tencent/ncnn

ncnn is a high-performance neural network inference framework optimized for the mobile platform

Language: C++ - Size: 26.4 MB - Last synced at: 6 days ago - Pushed at: 8 days ago - Stars: 21,405 - Forks: 4,246

dlstreamer/dlstreamer

This repository is home to the Intel® Deep Learning Streamer (Intel® DL Streamer) Pipeline Framework, a streaming media analytics framework based on the GStreamer* multimedia framework, for creating complex media analytics pipelines.

Language: C++ - Size: 9.39 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 550 - Forks: 173

FocoosAI/focoos

Focoos SDK

Language: Python - Size: 11 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 89 - Forks: 0