GitHub topics: inference
actualwitch/experiment
🔬 Experiment is a professional-grade chat interface for Large Language Models (LLMs) designed for developers, prompt engineers, and AI researchers. It provides a streamlined environment for working with Anthropic, OpenAI, and Mistral models, with powerful debugging tools for prompt engineering and tool integration.
Language: TypeScript - Size: 18.8 MB - Last synced at: about 4 hours ago - Pushed at: about 5 hours ago - Stars: 10 - Forks: 1

pgmpy/pgmpy
Python library for structure and parameter learning, probabilistic and causal inference, and simulation in Bayesian networks.
Language: Python - Size: 12.7 MB - Last synced at: about 10 hours ago - Pushed at: about 10 hours ago - Stars: 2,868 - Forks: 731
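A minimal sketch of pgmpy's discrete-inference API, using a hypothetical two-node rain/wet-grass network (the structure and probabilities here are illustrative, not from the library's docs):

```python
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

# Hypothetical two-node network: Rain -> WetGrass
model = BayesianNetwork([("Rain", "WetGrass")])
cpd_rain = TabularCPD("Rain", 2, [[0.8], [0.2]])
cpd_wet = TabularCPD(
    "WetGrass", 2,
    [[0.9, 0.1],   # P(WetGrass=0 | Rain=0), P(WetGrass=0 | Rain=1)
     [0.1, 0.9]],  # P(WetGrass=1 | Rain=0), P(WetGrass=1 | Rain=1)
    evidence=["Rain"], evidence_card=[2],
)
model.add_cpds(cpd_rain, cpd_wet)
model.check_model()

# Exact inference by variable elimination
infer = VariableElimination(model)
print(infer.query(["Rain"], evidence={"WetGrass": 1}))
```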

aws/amazon-sagemaker-examples
Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.
Language: Jupyter Notebook - Size: 634 MB - Last synced at: about 16 hours ago - Pushed at: about 1 month ago - Stars: 10,449 - Forks: 6,868

pytorch/ao
PyTorch native quantization and sparsity for training and inference
Language: Python - Size: 29.3 MB - Last synced at: about 21 hours ago - Pushed at: about 22 hours ago - Stars: 1,973 - Forks: 245

rupeshtr78/rag-agent-rust
CLI for generating LanceDB vector embeddings and integrating LLMs into Retrieval-Augmented Generation (RAG) workflows for data storage, retrieval, and AI-driven chat.
Language: Rust - Size: 688 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

RayFernando1337/LLM-Calc
Instantly calculate the maximum size of quantized language models that can fit in your available RAM, helping you optimize your models for inference.
Language: TypeScript - Size: 190 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 217 - Forks: 12
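The underlying arithmetic is simple: a weight quantized to b bits costs b/8 bytes. Below is a rough back-of-envelope sketch of that calculation in Python (not the repo's TypeScript code; the 1 GB overhead allowance is an assumption):

```python
def max_params_billion(available_ram_gb: float, bits_per_weight: float,
                       overhead_gb: float = 1.0) -> float:
    """Rough upper bound on model size (in billions of parameters) that fits in RAM."""
    usable_bytes = (available_ram_gb - overhead_gb) * 1024**3
    bytes_per_weight = bits_per_weight / 8
    return usable_bytes / bytes_per_weight / 1e9

# Example: 16 GB of RAM with 4-bit quantization leaves room for roughly a 32B-parameter model.
print(f"{max_params_billion(16, 4):.1f}B parameters")
```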

google-ai-edge/mediapipe
Cross-platform, customizable ML solutions for live and streaming media.
Language: C++ - Size: 575 MB - Last synced at: 1 day ago - Pushed at: 3 days ago - Stars: 29,429 - Forks: 5,319

13Mai13/llm-server
An implementation of an LLM server.
Language: Python - Size: 154 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

NOOBKABHA/bulk-chain-shell
Shell client 📺 for schema-based reasoning 🧠 over your data via a custom LLM provider 🌌
Language: Python - Size: 20.5 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

aws/sagemaker-xgboost-container
A Docker container based on the open-source XGBoost framework (https://xgboost.readthedocs.io/en/latest/) that allows customers to use their own XGBoost scripts in SageMaker.
Language: Python - Size: 830 KB - Last synced at: about 16 hours ago - Pushed at: about 1 month ago - Stars: 136 - Forks: 84

huggingface/optimum
🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM, and Sentence Transformers with easy-to-use hardware optimization tools
Language: Python - Size: 5.32 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 2,855 - Forks: 521
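A minimal sketch of the ONNX Runtime backend, assuming a standard sentiment-classification checkpoint (the model name is just an example):

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

model_id = "distilbert-base-uncased-finetuned-sst-2-english"
# export=True converts the PyTorch checkpoint to ONNX on the fly
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("Optimized inference feels fast."))
```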

argmaxinc/WhisperKit
On-device Speech Recognition for Apple Silicon
Language: Swift - Size: 2.57 MB - Last synced at: 1 day ago - Pushed at: 5 days ago - Stars: 4,519 - Forks: 381

ggml-org/whisper.cpp
Port of OpenAI's Whisper model in C/C++
Language: C++ - Size: 20 MB - Last synced at: 1 day ago - Pushed at: 4 days ago - Stars: 39,337 - Forks: 4,125

lofcz/LlmTornado
The .NET library to consume 100+ APIs: OpenAI, Anthropic, Google, DeepSeek, Cohere, Mistral, Azure, xAI, Perplexity, Groq, Ollama, LocalAi, and many more!
Language: C# - Size: 15.4 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 130 - Forks: 21

microsoft/vidur
A large-scale simulation framework for LLM inference
Language: Python - Size: 156 MB - Last synced at: about 16 hours ago - Pushed at: 5 months ago - Stars: 364 - Forks: 65

Fuzzy-Search/realtime-bakllava
llama.cpp with the BakLLaVA model, describing what it sees.
Language: Python - Size: 2.84 MB - Last synced at: 1 day ago - Pushed at: over 1 year ago - Stars: 383 - Forks: 43

huggingface/huggingface.js
Utilities to use the Hugging Face Hub API
Language: TypeScript - Size: 15 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 1,587 - Forks: 349

triton-inference-server/server
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
Language: Python - Size: 37.3 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 9,087 - Forks: 1,558

superduper-io/superduper
Superduper: End-to-end framework for building custom AI applications and agents.
Language: Python - Size: 73.2 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 5,032 - Forks: 492

huggingface/text-generation-inference
Large Language Model Text Generation Inference
Language: Python - Size: 12.9 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 10,028 - Forks: 1,185
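A minimal client-side sketch, assuming a TGI server is already running locally on port 8080 (the prompt and endpoint are illustrative):

```python
from huggingface_hub import InferenceClient

client = InferenceClient("http://localhost:8080")
# Stream tokens from the server as they are generated
for token in client.text_generation(
    "Explain KV caching in one sentence:", max_new_tokens=64, stream=True
):
    print(token, end="", flush=True)
```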

willjgh/M5R
Extension of M4R work by application of methods to perturbed datasets
Language: Jupyter Notebook - Size: 61.3 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 2 - Forks: 0

prodialabs/prodia-js
Official TypeScript library for Prodia's AI inference API.
Language: TypeScript - Size: 213 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 23 - Forks: 12

DepressionCenter/Data-and-Design-Core
Code developed by the EFDC Data and Design Core team to support mental health research.
Language: Stata - Size: 639 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 2 - Forks: 0

DanielHermosilla/ecological-inference-elections Fork of pabloubilla/ecological-inference-elections
R library for the work by Thraves, C. and Ubilla, P.: 'Fast Ecological Inference Algorithm for the R×C Case'
Language: C - Size: 439 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

openvinotoolkit/openvino_notebooks
📚 Jupyter notebook tutorials for OpenVINO™
Language: Jupyter Notebook - Size: 2.4 GB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 2,728 - Forks: 879

microsoft/aici
AICI: Prompts as (Wasm) Programs
Language: Rust - Size: 9.71 MB - Last synced at: 1 day ago - Pushed at: 3 months ago - Stars: 2,017 - Forks: 83

ronniross/symbioticcorelibrary
The Symbiotic Core Library repository provides instructions, prompts, bibliographies, and research support designed to enhance/test LLM metacognitive/contextual awareness, address deficiencies, and unlock emergent properties/human-AI symbiosis.
Size: 5.98 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 2 - Forks: 0

microsoft/nn-Meter
A DNN inference latency prediction toolkit for accurately modeling and predicting the latency on diverse edge devices.
Language: Python - Size: 58.3 MB - Last synced at: about 16 hours ago - Pushed at: 9 months ago - Stars: 351 - Forks: 62

qubvel/transformers-notebooks
Inference and fine-tuning examples for vision models from 🤗 Transformers
Language: Jupyter Notebook - Size: 33.1 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 76 - Forks: 13

mirecl/catboost-cgo
Golang bindings (via Cgo) for blazing-fast inference with CatBoost models 🚀 CatBoost is a fast, scalable, high-performance library for gradient boosting on decision trees.
Language: C - Size: 793 KB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 13 - Forks: 1

hpcaitech/ColossalAI
Making large AI models cheaper, faster and more accessible
Language: Python - Size: 62.7 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 40,788 - Forks: 4,495

openvinotoolkit/openvino
OpenVINO™ is an open source toolkit for optimizing and deploying AI inference
Language: C++ - Size: 844 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 8,132 - Forks: 2,573
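A minimal sketch of the OpenVINO runtime API (2023+ Python package), assuming an already-converted IR model on disk; the file path and input shape are hypothetical:

```python
import numpy as np
import openvino as ov

core = ov.Core()
model = core.read_model("model.xml")        # hypothetical IR model path
compiled = core.compile_model(model, "CPU")

# Single synchronous inference on random data with a hypothetical NCHW input shape
input_tensor = np.random.rand(1, 3, 224, 224).astype(np.float32)
result = compiled([input_tensor])[compiled.output(0)]
print(result.shape)
```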

sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
Language: Python - Size: 14.2 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 13,316 - Forks: 1,538

dstackai/dstack
dstack is an open-source alternative to Kubernetes and Slurm, designed to simplify GPU allocation and AI workload orchestration for ML teams across top clouds, on-prem clusters, and accelerators.
Language: Python - Size: 114 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 1,756 - Forks: 169

lovelynewlife/oceanbase Fork of oceanbase/oceanbase
Supports machine learning prediction queries by leveraging Python UDFs in OceanBase.
Language: C++ - Size: 431 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 5 - Forks: 4

theodo-group/GenossGPT
One API for all LLMs, private or public (Anthropic, Llama V2, GPT 3.5/4, Vertex, GPT4ALL, HuggingFace, ...) 🌈🐂 Replace OpenAI GPT with any LLM in your app with one line.
Language: Python - Size: 2.96 MB - Last synced at: 1 day ago - Pushed at: over 1 year ago - Stars: 754 - Forks: 62

huggingface/optimum-intel
🤗 Optimum Intel: Accelerate inference with Intel optimization tools
Language: Jupyter Notebook - Size: 17 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 459 - Forks: 128

rupeshtr78/ray-gpu-kind-cluster
GPU-Passthrough Enabled Apache Ray Clusters on Kubernetes (Kind) for POC and Testing
Language: Shell - Size: 175 KB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

friendliai/friendli-client
Friendli: the fastest serving engine for generative AI
Language: Python - Size: 4.88 MB - Last synced at: 1 day ago - Pushed at: 3 months ago - Stars: 44 - Forks: 8

Trusted-AI/adversarial-robustness-toolbox
Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
Language: Python - Size: 610 MB - Last synced at: 3 days ago - Pushed at: 18 days ago - Stars: 5,198 - Forks: 1,209

deepjavalibrary/djl-serving
A universal scalable machine learning model deployment solution
Language: Java - Size: 9.96 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 213 - Forks: 71

nvidia-holoscan/holohub
Central repository for Holoscan Reference Applications
Language: C++ - Size: 256 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 132 - Forks: 86

kserve/website
User documentation for KServe.
Language: HTML - Size: 119 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 106 - Forks: 135

cintia-shinoda/univesp
Data Science Undergrad Notes, Code, and Homeworks
Language: Jupyter Notebook - Size: 57 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

aime-team/aime-api-worker-interface
AIME API Worker Interface - Interface to connect compute workers to the AIME API Server
Language: Python - Size: 5.25 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 2 - Forks: 0

guoriyue/warp-llama3-scratch
Language: Python - Size: 19.5 KB - Last synced at: 1 day ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0

intel/ai-reference-models
Intel® AI Reference Models: contains Intel optimizations for running deep learning workloads on Intel® Xeon® Scalable processors and Intel® Data Center GPUs
Language: Python - Size: 621 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 706 - Forks: 224

alibaba/rtp-llm
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
Language: C++ - Size: 274 MB - Last synced at: 1 day ago - Pushed at: 3 months ago - Stars: 702 - Forks: 56

BasLinders/happyhorizon_statstoolkit
An ongoing project for an online toolkit to analyze online controlled experiments. Its mission: to make inferential statistics accessible to everyone.
Language: Python - Size: 373 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

awslabs/multi-model-server
Multi Model Server is a tool for serving neural net models for inference
Language: Java - Size: 36.9 MB - Last synced at: 3 days ago - Pushed at: 11 months ago - Stars: 1,009 - Forks: 232

SYSTRAN/faster-whisper
Faster Whisper transcription with CTranslate2
Language: Python - Size: 36.6 MB - Last synced at: 3 days ago - Pushed at: about 1 month ago - Stars: 15,467 - Forks: 1,299
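A minimal sketch of the transcription API, assuming a local audio file and the small multilingual model running on CPU with int8 compute:

```python
from faster_whisper import WhisperModel

model = WhisperModel("small", device="cpu", compute_type="int8")
segments, info = model.transcribe("audio.mp3", beam_size=5)

print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```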

hitz-zentroa/GoLLIE
Guideline following Large Language Model for Information Extraction
Language: Python - Size: 10.8 MB - Last synced at: 1 day ago - Pushed at: 6 months ago - Stars: 365 - Forks: 26

zeitlings/alfred-ollama
Dehydrated Ollama CLI Interface
Language: Swift - Size: 2.82 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 83 - Forks: 4

vllm-project/vllm-ascend
Community maintained hardware plugin for vLLM on Ascend
Language: Python - Size: 911 KB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 477 - Forks: 87

OpenCSGs/csghub
CSGHub is a brand-new open-source platform for managing LLMs, developed by the OpenCSG team. It offers both open-source and on-premise/SaaS solutions, with features comparable to Hugging Face. Gain full control over the lifecycle of LLMs, datasets, and agents, with a Python SDK compatible with Hugging Face. Join us! ⭐️
Language: Vue - Size: 49.8 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 2,659 - Forks: 388

aws-samples/aws-do-eks
Create, List, Update, Delete Amazon EKS clusters. Deploy and manage software on EKS. Run distributed model training and inference examples.
Language: Shell - Size: 26.3 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 55 - Forks: 32

EmbeddedLLM/vllm Fork of vllm-project/vllm
vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs
Language: Python - Size: 44.3 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 86 - Forks: 5

ntua-unit-of-control-and-informatics/jaqpot-frontend
The Jaqpot project's frontend app serves as the interactive gateway for users to engage with our predictive modeling platform.
Language: TypeScript - Size: 5.15 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 1 - Forks: 0

typedb/typedb
TypeDB: the power of programming, in your database
Language: Rust - Size: 103 MB - Last synced at: 3 days ago - Pushed at: 5 days ago - Stars: 3,968 - Forks: 343

OpenIntroStat/ims
📚 Introduction to Modern Statistics - A college-level open-source textbook with a modern approach highlighting multivariable relationships and simulation-based inference. For v1, see https://openintro-ims.netlify.app.
Language: JavaScript - Size: 535 MB - Last synced at: 3 days ago - Pushed at: 3 months ago - Stars: 889 - Forks: 180

AI-Hypercomputer/JetStream
JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome).
Language: Python - Size: 6.24 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 315 - Forks: 38

aws/studio-lab-examples
Example notebooks for working with SageMaker Studio Lab. Sign up for an account at the link below!
Language: Jupyter Notebook - Size: 33.9 MB - Last synced at: about 16 hours ago - Pushed at: 8 months ago - Stars: 708 - Forks: 204

krishnaura45/LMBattle
Predicting Human Preferences in the Wild
Language: Jupyter Notebook - Size: 29 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

MeowMeowSE3/language-detection-ai
Detect 18+ languages instantly using machine learning (BERT, LSTM, SVM) and NLP. Includes a Flask web app for real-time predictions, trained models, and detailed notebooks.
Language: Jupyter Notebook - Size: 1.76 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 1

ntua-unit-of-control-and-informatics/jaqpot-api
Open-source RESTful API for managing and deploying models, providing predictions via API, built with Spring Boot and Kotlin.
Language: Kotlin - Size: 641 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1 - Forks: 0

fastmachinelearning/qonnx
QONNX: Arbitrary-Precision Quantized Neural Networks in ONNX
Language: Python - Size: 5.31 MB - Last synced at: about 7 hours ago - Pushed at: about 8 hours ago - Stars: 145 - Forks: 44

ntua-unit-of-control-and-informatics/jaqpotpy
Open-source Python client for deploying models and obtaining predictions via the Jaqpot API.
Language: Python - Size: 362 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 17 - Forks: 1

pykeio/ort
Fast ML inference & training for ONNX models in Rust
Language: Rust - Size: 4.86 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 1,254 - Forks: 124

scitix/arks
Arks is a cloud-native inference framework running on Kubernetes
Language: Go - Size: 353 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 4 - Forks: 2

mtrimolet/hroza
A (modern) C++ implementation of MarkovJunior based on StormKit
Language: C++ - Size: 323 KB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

chengzeyi/ParaAttention
Context-parallel attention that accelerates DiT model inference with dynamic caching (https://wavespeed.ai/)
Language: Python - Size: 13.4 MB - Last synced at: 5 days ago - Pushed at: 19 days ago - Stars: 241 - Forks: 24

inlab-geo/espresso
Earth Science PRoblems for the Evaluation of Strategies, Solvers and Optimizers
Language: Jupyter Notebook - Size: 165 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 16 - Forks: 9

NexusGPU/tensor-fusion
Tensor Fusion is a state-of-the-art GPU virtualization and pooling solution designed to optimize GPU cluster utilization to its fullest potential.
Language: Go - Size: 829 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 27 - Forks: 7

jy-yuan/KIVI
[ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
Language: Python - Size: 16.7 MB - Last synced at: 1 day ago - Pushed at: 3 months ago - Stars: 288 - Forks: 29

xorbitsai/inference
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
Language: Python - Size: 44.6 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 7,500 - Forks: 636
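A minimal sketch of the "change one line" idea: point an OpenAI-compatible client at a locally running Xinference endpoint. The port, model name, and API key here are assumptions for illustration:

```python
from openai import OpenAI

# Xinference exposes an OpenAI-compatible API; only the base_url changes.
client = OpenAI(base_url="http://localhost:9997/v1", api_key="not-needed")
response = client.chat.completions.create(
    model="qwen2.5-instruct",  # hypothetical locally launched model
    messages=[{"role": "user", "content": "Say hello from a locally served model."}],
)
print(response.choices[0].message.content)
```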

AnswerDotAI/cold-compress
Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of GPT-Fast, a simple, PyTorch-native generation codebase.
Language: Python - Size: 8.33 MB - Last synced at: 20 minutes ago - Pushed at: 8 months ago - Stars: 128 - Forks: 11

interestingLSY/swiftLLM
A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of vLLM).
Language: Python - Size: 298 KB - Last synced at: 1 day ago - Pushed at: 10 months ago - Stars: 160 - Forks: 14

orcasound/aifororcas-livesystem
Real-time AI-assisted killer whale notification system (model and moderator portal) :star:
Language: C# - Size: 189 MB - Last synced at: 3 days ago - Pushed at: 20 days ago - Stars: 40 - Forks: 25

superagent-ai/super-rag
Super performant RAG pipelines for AI apps. Summarization, Retrieve/Rerank and Code Interpreters in one simple API.
Language: Python - Size: 637 KB - Last synced at: 3 days ago - Pushed at: 12 months ago - Stars: 368 - Forks: 56

google/XNNPACK
High-efficiency floating-point neural network inference operators for mobile, server, and Web
Language: C - Size: 162 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 2,000 - Forks: 409

dobriban/Principles-of-AI-LLMs
Materials for the course Principles of AI: LLMs at UPenn (Stat 9911, Spring 2025). LLM architectures, training paradigms (pre- and post-training, alignment), test-time computation, reasoning, safety and robustness (jailbreaking, oversight, uncertainty), representations, interpretability (circuits), etc.
Size: 157 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 30 - Forks: 0

Lyrcaxis/KokoroSharp
Fast local TTS inference engine in C# with ONNX runtime. Multi-speaker, multi-platform and multilingual. Integrate on your .NET projects using a plug-and-play NuGet package, complete with all voices.
Language: C# - Size: 159 KB - Last synced at: 5 days ago - Pushed at: 28 days ago - Stars: 105 - Forks: 4

NVIDIA/TensorRT
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
Language: C++ - Size: 130 MB - Last synced at: 5 days ago - Pushed at: about 1 month ago - Stars: 11,459 - Forks: 2,181

openvinotoolkit/model_server
A scalable inference server for models optimized with OpenVINO™
Language: C++ - Size: 53.5 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 722 - Forks: 216

vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Language: Python - Size: 43.8 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 44,885 - Forks: 6,865
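A minimal offline-batch sketch, assuming a GPU is available and using a small example model and prompt:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # small example model
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Batched generation; each RequestOutput carries the completions for one prompt
outputs = llm.generate(["The key idea behind efficient LLM serving is"], params)
for out in outputs:
    print(out.outputs[0].text)
```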

deepspeedai/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Language: Python - Size: 216 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 37,927 - Forks: 4,336

dlstreamer/dlstreamer
This repository is home to the Intel® Deep Learning Streamer (Intel® DL Streamer) Pipeline Framework, a streaming media analytics framework based on the GStreamer* multimedia framework for creating complex media analytics pipelines.
Language: C++ - Size: 9.37 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 548 - Forks: 174

FocoosAI/focoos
Focoos SDK
Language: Python - Size: 10.5 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 88 - Forks: 0

Tencent/ncnn
ncnn is a high-performance neural network inference framework optimized for the mobile platform
Language: C++ - Size: 26.3 MB - Last synced at: 5 days ago - Pushed at: 6 days ago - Stars: 21,298 - Forks: 4,234

aws/sagemaker-inference-toolkit
Serve machine learning models within a 🐳 Docker container using 🧠 Amazon SageMaker.
Language: Python - Size: 667 KB - Last synced at: about 16 hours ago - Pushed at: over 1 year ago - Stars: 403 - Forks: 82

rebellions-sw/optimum-rbln
⚡ A seamless integration of HuggingFace Transformers & Diffusers with RBLN SDK for efficient inference on RBLN NPUs.
Language: Python - Size: 1.14 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 8 - Forks: 1

Telosnex/fonnx
ONNX runtime for Flutter.
Language: Dart - Size: 176 MB - Last synced at: 3 days ago - Pushed at: 2 months ago - Stars: 264 - Forks: 18

Twixie5/OpenVINO_Asynchronous_API_Performance_Demo
This project demonstrates the high performance of the OpenVINO asynchronous inference API.
Language: Python - Size: 28.5 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

deepspeedai/DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
Language: Python - Size: 6.52 MB - Last synced at: 1 day ago - Pushed at: 26 days ago - Stars: 2,002 - Forks: 181

hidet-org/hidet
An open-source, efficient deep learning framework/compiler, written in Python.
Language: Python - Size: 4.6 MB - Last synced at: 3 days ago - Pushed at: about 2 months ago - Stars: 698 - Forks: 59

ibaiGorordo/Sapiens-Pytorch-Inference
Minimal code and examples for running inference with Sapiens foundation human models in PyTorch
Language: Jupyter Notebook - Size: 47.6 MB - Last synced at: 5 days ago - Pushed at: 6 months ago - Stars: 137 - Forks: 14

inlab-geo/cofi
Common Framework for Inference
Language: Python - Size: 164 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 17 - Forks: 5

eoap/machine-learning-process
Machine Learning Process using the EO Application Package
Language: Jupyter Notebook - Size: 124 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 1

larq/compute-engine
Highly optimized inference engine for Binarized Neural Networks
Language: C++ - Size: 4.96 MB - Last synced at: 6 days ago - Pushed at: 7 days ago - Stars: 249 - Forks: 36

torchpipe/torchpipe
Serving inside PyTorch
Language: C++ - Size: 41.4 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 160 - Forks: 13
