An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: inference

actualwitch/experiment

🔬 Experiment is a professional-grade chat interface for Large Language Models (LLMs) designed for developers, prompt engineers, and AI researchers. It provides a streamlined environment for working with Anthropic, OpenAI, and Mistral models, with powerful debugging tools for prompt engineering and tool integration.

Language: TypeScript - Size: 18.8 MB - Last synced at: about 4 hours ago - Pushed at: about 5 hours ago - Stars: 10 - Forks: 1

pgmpy/pgmpy

Python Library for learning (Structure and Parameter), inference (Probabilistic and Causal), and simulations in Bayesian Networks.

Language: Python - Size: 12.7 MB - Last synced at: about 10 hours ago - Pushed at: about 10 hours ago - Stars: 2,868 - Forks: 731
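
As a quick illustration of the kind of probabilistic inference pgmpy supports, here is a minimal sketch of exact inference in a toy sprinkler-style network via variable elimination. The structure and CPD values are made up for the example, and class names can differ slightly across pgmpy versions (older releases use `BayesianModel`).

```python
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

# Toy network: Rain and Sprinkler both influence WetGrass (values are illustrative).
model = BayesianNetwork([("Rain", "WetGrass"), ("Sprinkler", "WetGrass")])
model.add_cpds(
    TabularCPD("Rain", 2, [[0.8], [0.2]]),
    TabularCPD("Sprinkler", 2, [[0.6], [0.4]]),
    TabularCPD(
        "WetGrass", 2,
        [[1.00, 0.10, 0.20, 0.01],   # P(WetGrass=0 | Rain, Sprinkler)
         [0.00, 0.90, 0.80, 0.99]],  # P(WetGrass=1 | Rain, Sprinkler)
        evidence=["Rain", "Sprinkler"], evidence_card=[2, 2],
    ),
)
assert model.check_model()

# Exact posterior P(Rain | WetGrass=1) by variable elimination.
infer = VariableElimination(model)
print(infer.query(["Rain"], evidence={"WetGrass": 1}))
```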

aws/amazon-sagemaker-examples

Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.

Language: Jupyter Notebook - Size: 634 MB - Last synced at: about 16 hours ago - Pushed at: about 1 month ago - Stars: 10,449 - Forks: 6,868

pytorch/ao

PyTorch native quantization and sparsity for training and inference

Language: Python - Size: 29.3 MB - Last synced at: about 21 hours ago - Pushed at: about 22 hours ago - Stars: 1,973 - Forks: 245
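
A minimal sketch of how post-training weight-only quantization might look with torchao, assuming the `quantize_` / `int8_weight_only` API from recent releases; the toy model is hypothetical, and supported dtypes/backends depend on your hardware.

```python
import torch
from torchao.quantization import quantize_, int8_weight_only

# Hypothetical toy model standing in for a real network.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024),
    torch.nn.ReLU(),
    torch.nn.Linear(1024, 1024),
).eval()

# Swap Linear weights to int8 (weight-only) in place for lighter inference.
quantize_(model, int8_weight_only())

with torch.no_grad():
    out = model(torch.randn(1, 1024))
print(out.shape)
```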

rupeshtr78/rag-agent-rust

LanceDB vector embeddings via a CLI, integrating LLM models into Retrieval-Augmented Generation (RAG) workflows for data storage, retrieval, and AI-driven chat.

Language: Rust - Size: 688 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

RayFernando1337/LLM-Calc

Instantly calculate the maximum size of quantized language models that can fit in your available RAM, helping you optimize your models for inference.

Language: TypeScript - Size: 190 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 217 - Forks: 12
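
The underlying arithmetic is simple enough to sketch. This back-of-the-envelope estimate (not necessarily LLM-Calc's exact formula) divides usable RAM by the bytes each quantized parameter occupies, reserving some headroom for the runtime and KV cache.

```python
def max_params_billions(ram_gb: float, bits_per_param: float = 4.0, headroom: float = 0.8) -> float:
    """Rough upper bound on parameter count (in billions) that fits in RAM.

    headroom: fraction of RAM assumed usable for weights (rest: KV cache, runtime, OS).
    """
    usable_bytes = ram_gb * 1e9 * headroom
    bytes_per_param = bits_per_param / 8
    return usable_bytes / bytes_per_param / 1e9

# e.g. 16 GB of free RAM at 4-bit quantization -> roughly 25B parameters
print(f"{max_params_billions(16, bits_per_param=4):.1f}B")
```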

google-ai-edge/mediapipe

Cross-platform, customizable ML solutions for live and streaming media.

Language: C++ - Size: 575 MB - Last synced at: 1 day ago - Pushed at: 3 days ago - Stars: 29,429 - Forks: 5,319

13Mai13/llm-server

This is an implementation of an LLM server

Language: Python - Size: 154 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

NOOBKABHA/bulk-chain-shell

Shell client 📺 for schema-based reasoning 🧠 over your data via a custom LLM provider 🌌

Language: Python - Size: 20.5 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

aws/sagemaker-xgboost-container

Docker container based on the open-source XGBoost framework (https://xgboost.readthedocs.io/en/latest/), allowing customers to use their own XGBoost scripts in SageMaker.

Language: Python - Size: 830 KB - Last synced at: about 16 hours ago - Pushed at: about 1 month ago - Stars: 136 - Forks: 84

huggingface/optimum

🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy-to-use hardware optimization tools

Language: Python - Size: 5.32 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 2,855 - Forks: 521
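
For example, exporting a Transformers model to ONNX and running it through ONNX Runtime is roughly a one-line swap of the model class. A minimal sketch, assuming the `optimum.onnxruntime` API and a public sentiment checkpoint:

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

model_id = "distilbert-base-uncased-finetuned-sst-2-english"
# export=True converts the PyTorch checkpoint to ONNX on the fly.
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("Optimum makes ONNX Runtime inference easy to try."))
```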

argmaxinc/WhisperKit

On-device Speech Recognition for Apple Silicon

Language: Swift - Size: 2.57 MB - Last synced at: 1 day ago - Pushed at: 5 days ago - Stars: 4,519 - Forks: 381

ggml-org/whisper.cpp

Port of OpenAI's Whisper model in C/C++

Language: C++ - Size: 20 MB - Last synced at: 1 day ago - Pushed at: 4 days ago - Stars: 39,337 - Forks: 4,125

lofcz/LlmTornado

The .NET library to consume 100+ APIs: OpenAI, Anthropic, Google, DeepSeek, Cohere, Mistral, Azure, xAI, Perplexity, Groq, Ollama, LocalAi, and many more!

Language: C# - Size: 15.4 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 130 - Forks: 21

microsoft/vidur

A large-scale simulation framework for LLM inference

Language: Python - Size: 156 MB - Last synced at: about 16 hours ago - Pushed at: 5 months ago - Stars: 364 - Forks: 65

Fuzzy-Search/realtime-bakllava

llama.cpp with the BakLLaVA model, describing what it sees

Language: Python - Size: 2.84 MB - Last synced at: 1 day ago - Pushed at: over 1 year ago - Stars: 383 - Forks: 43

huggingface/huggingface.js

Utilities to use the Hugging Face Hub API

Language: TypeScript - Size: 15 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 1,587 - Forks: 349

triton-inference-server/server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.

Language: Python - Size: 37.3 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 9,087 - Forks: 1,558
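
Clients typically talk to Triton over HTTP or gRPC. Here is a minimal sketch using the `tritonclient` Python package; the model name and tensor names (`my_model`, `INPUT0`, `OUTPUT0`) are placeholders for whatever your model repository defines.

```python
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")  # default HTTP port

# Placeholder tensor/model names -- match them to your model's config.pbtxt.
inp = httpclient.InferInput("INPUT0", [1, 4], "FP32")
inp.set_data_from_numpy(np.random.rand(1, 4).astype(np.float32))
out = httpclient.InferRequestedOutput("OUTPUT0")

result = client.infer(model_name="my_model", inputs=[inp], outputs=[out])
print(result.as_numpy("OUTPUT0"))
```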

superduper-io/superduper

Superduper: End-to-end framework for building custom AI applications and agents.

Language: Python - Size: 73.2 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 5,032 - Forks: 492

huggingface/text-generation-inference

Large Language Model Text Generation Inference

Language: Python - Size: 12.9 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 10,028 - Forks: 1,185
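
A running TGI server exposes a simple REST interface; a hedged sketch of calling the `/generate` endpoint with `requests` (assuming a server already listening on localhost:8080) looks like this.

```python
import requests

# Assumes a text-generation-inference server is already running on this port.
resp = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "What is Deep Learning?",
        "parameters": {"max_new_tokens": 64, "temperature": 0.7},
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["generated_text"])
```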

willjgh/M5R

Extension of M4R work by application of methods to perturbed datasets

Language: Jupyter Notebook - Size: 61.3 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 2 - Forks: 0

prodialabs/prodia-js

Official TypeScript library for Prodia's AI inference API.

Language: TypeScript - Size: 213 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 23 - Forks: 12

DepressionCenter/Data-and-Design-Core

Code developed by the EFDC Data and Design Core team to support mental health research.

Language: Stata - Size: 639 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 2 - Forks: 0

DanielHermosilla/ecological-inference-elections Fork of pabloubilla/ecological-inference-elections

R library for the work by Thraves, C. and Ubilla, P.: 'Fast Ecological Inference Algorithm for the R×C Case'

Language: C - Size: 439 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

openvinotoolkit/openvino_notebooks

📚 Jupyter notebook tutorials for OpenVINO™

Language: Jupyter Notebook - Size: 2.4 GB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 2,728 - Forks: 879

microsoft/aici

AICI: Prompts as (Wasm) Programs

Language: Rust - Size: 9.71 MB - Last synced at: 1 day ago - Pushed at: 3 months ago - Stars: 2,017 - Forks: 83

ronniross/symbioticcorelibrary

The Symbiotic Core Library repository provides instructions, prompts, bibliographies, and research support designed to enhance/test LLM metacognitive/contextual awareness, address deficiencies, and unlock emergent properties/human-AI symbiosis.

Size: 5.98 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 2 - Forks: 0

microsoft/nn-Meter

A DNN inference latency prediction toolkit for accurately modeling and predicting the latency on diverse edge devices.

Language: Python - Size: 58.3 MB - Last synced at: about 16 hours ago - Pushed at: 9 months ago - Stars: 351 - Forks: 62

qubvel/transformers-notebooks

Inference and fine-tuning examples for vision models from 🤗 Transformers

Language: Jupyter Notebook - Size: 33.1 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 76 - Forks: 13

mirecl/catboost-cgo

CatBoost is a fast, scalable, high-performance library for gradient boosting on decision trees. Golang bindings using Cgo for blazing-fast inference with CatBoost models 🚀

Language: C - Size: 793 KB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 13 - Forks: 1

hpcaitech/ColossalAI

Making large AI models cheaper, faster and more accessible

Language: Python - Size: 62.7 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 40,788 - Forks: 4,495

openvinotoolkit/openvino

OpenVINO™ is an open source toolkit for optimizing and deploying AI inference

Language: C++ - Size: 844 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 8,132 - Forks: 2,573
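
The core workflow is read, compile, infer. A minimal sketch with the Python API (2023.x-style `openvino.Core`), using a placeholder IR path and input shape:

```python
import numpy as np
import openvino as ov

core = ov.Core()
model = core.read_model("model.xml")          # placeholder path to an OpenVINO IR model
compiled = core.compile_model(model, "CPU")   # or "GPU", "AUTO", ...

# Placeholder input shape -- match it to your model.
dummy = np.zeros((1, 3, 224, 224), dtype=np.float32)
result = compiled(dummy)
print(result[compiled.output(0)].shape)
```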

sgl-project/sglang

SGLang is a fast serving framework for large language models and vision language models.

Language: Python - Size: 14.2 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 13,316 - Forks: 1,538

dstackai/dstack

dstack is an open-source alternative to Kubernetes and Slurm, designed to simplify GPU allocation and AI workload orchestration for ML teams across top clouds, on-prem clusters, and accelerators.

Language: Python - Size: 114 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 1,756 - Forks: 169

lovelynewlife/oceanbase Fork of oceanbase/oceanbase

Supports machine learning prediction queries by leveraging Python UDFs in OceanBase.

Language: C++ - Size: 431 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 5 - Forks: 4

theodo-group/GenossGPT

One API for all LLMs, either private or public (Anthropic, Llama V2, GPT 3.5/4, Vertex, GPT4ALL, HuggingFace ...) 🌈🐂 Replace OpenAI GPT with any LLM in your app with one line.

Language: Python - Size: 2.96 MB - Last synced at: 1 day ago - Pushed at: over 1 year ago - Stars: 754 - Forks: 62

huggingface/optimum-intel

🤗 Optimum Intel: Accelerate inference with Intel optimization tools

Language: Jupyter Notebook - Size: 17 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 459 - Forks: 128

rupeshtr78/ray-gpu-kind-cluster

GPU-Passthrough Enabled Apache Ray Clusters on Kubernetes (Kind) for POC and Testing

Language: Shell - Size: 175 KB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

friendliai/friendli-client

Friendli: the fastest serving engine for generative AI

Language: Python - Size: 4.88 MB - Last synced at: 1 day ago - Pushed at: 3 months ago - Stars: 44 - Forks: 8

Trusted-AI/adversarial-robustness-toolbox

Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams

Language: Python - Size: 610 MB - Last synced at: 3 days ago - Pushed at: 18 days ago - Stars: 5,198 - Forks: 1,209
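
As a flavor of the evasion side, here is a hedged sketch of crafting FGSM adversarial examples against a scikit-learn classifier, assuming ART's gradient support for logistic regression; the dataset and epsilon are chosen purely for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from art.estimators.classification import SklearnClassifier
from art.attacks.evasion import FastGradientMethod

X, y = load_iris(return_X_y=True)
X = X.astype(np.float32)

clf = LogisticRegression(max_iter=1000).fit(X, y)
art_clf = SklearnClassifier(model=clf)

# Craft adversarial examples with FGSM and measure the accuracy drop.
attack = FastGradientMethod(estimator=art_clf, eps=0.5)
X_adv = attack.generate(x=X)
print("clean accuracy:      ", clf.score(X, y))
print("adversarial accuracy:", clf.score(X_adv, y))
```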

deepjavalibrary/djl-serving

A universal scalable machine learning model deployment solution

Language: Java - Size: 9.96 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 213 - Forks: 71

nvidia-holoscan/holohub

Central repository for Holoscan Reference Applications

Language: C++ - Size: 256 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 132 - Forks: 86

kserve/website

User documentation for KServe.

Language: HTML - Size: 119 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 106 - Forks: 135

cintia-shinoda/univesp

Data Science Undergrad Notes, Code, and Homeworks

Language: Jupyter Notebook - Size: 57 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

aime-team/aime-api-worker-interface

AIME API Worker Interface - Interface to connect compute workers to the AIME API Server

Language: Python - Size: 5.25 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 2 - Forks: 0

guoriyue/warp-llama3-scratch

Language: Python - Size: 19.5 KB - Last synced at: 1 day ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0

intel/ai-reference-models

Intel® AI Reference Models: contains Intel optimizations for running deep learning workloads on Intel® Xeon® Scalable processors and Intel® Data Center GPUs

Language: Python - Size: 621 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 706 - Forks: 224

alibaba/rtp-llm

RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.

Language: C++ - Size: 274 MB - Last synced at: 1 day ago - Pushed at: 3 months ago - Stars: 702 - Forks: 56

BasLinders/happyhorizon_statstoolkit

An ongoing project for an online toolkit to analyze online controlled experiments. Its mission: To make inferential statistics accessible for everyone.

Language: Python - Size: 373 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

awslabs/multi-model-server

Multi Model Server is a tool for serving neural net models for inference

Language: Java - Size: 36.9 MB - Last synced at: 3 days ago - Pushed at: 11 months ago - Stars: 1,009 - Forks: 232

SYSTRAN/faster-whisper

Faster Whisper transcription with CTranslate2

Language: Python - Size: 36.6 MB - Last synced at: 3 days ago - Pushed at: about 1 month ago - Stars: 15,467 - Forks: 1,299
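
Typical usage is only a few lines; a minimal sketch (the audio path and model size are placeholders) follows.

```python
from faster_whisper import WhisperModel

# "small" model on CPU with int8 quantization; adjust device/compute_type for your hardware.
model = WhisperModel("small", device="cpu", compute_type="int8")

segments, info = model.transcribe("audio.mp3")   # placeholder audio file
print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
for seg in segments:
    print(f"[{seg.start:6.2f}s -> {seg.end:6.2f}s] {seg.text}")
```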

hitz-zentroa/GoLLIE

Guideline-following Large Language Model for Information Extraction

Language: Python - Size: 10.8 MB - Last synced at: 1 day ago - Pushed at: 6 months ago - Stars: 365 - Forks: 26

zeitlings/alfred-ollama

Dehydrated Ollama CLI Interface

Language: Swift - Size: 2.82 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 83 - Forks: 4

vllm-project/vllm-ascend

Community maintained hardware plugin for vLLM on Ascend

Language: Python - Size: 911 KB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 477 - Forks: 87

OpenCSGs/csghub

CSGHub is a brand-new open-source platform for managing LLMs, developed by the OpenCSG team. It offers both open-source and on-premise/SaaS solutions, with features comparable to Hugging Face. Gain full control over the lifecycle of LLMs, datasets, and agents, with Python SDK compatibility with Hugging Face. Join us! ⭐️

Language: Vue - Size: 49.8 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 2,659 - Forks: 388

aws-samples/aws-do-eks

Create, List, Update, Delete Amazon EKS clusters. Deploy and manage software on EKS. Run distributed model training and inference examples.

Language: Shell - Size: 26.3 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 55 - Forks: 32

EmbeddedLLM/vllm Fork of vllm-project/vllm

vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python - Size: 44.3 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 86 - Forks: 5

ntua-unit-of-control-and-informatics/jaqpot-frontend

The Jaqpot project's frontend app serves as the interactive gateway for users to engage with our predictive modeling platform.

Language: TypeScript - Size: 5.15 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 1 - Forks: 0

typedb/typedb

TypeDB: the power of programming, in your database

Language: Rust - Size: 103 MB - Last synced at: 3 days ago - Pushed at: 5 days ago - Stars: 3,968 - Forks: 343

OpenIntroStat/ims

📚 Introduction to Modern Statistics - A college-level open-source textbook with a modern approach highlighting multivariable relationships and simulation-based inference. For v1, see https://openintro-ims.netlify.app.

Language: JavaScript - Size: 535 MB - Last synced at: 3 days ago - Pushed at: 3 months ago - Stars: 889 - Forks: 180

AI-Hypercomputer/JetStream

JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome).

Language: Python - Size: 6.24 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 315 - Forks: 38

aws/studio-lab-examples

Example notebooks for working with SageMaker Studio Lab. Sign up for an account at the link below!

Language: Jupyter Notebook - Size: 33.9 MB - Last synced at: about 16 hours ago - Pushed at: 8 months ago - Stars: 708 - Forks: 204

krishnaura45/LMBattle

Predicting Human Preferences in the Wild

Language: Jupyter Notebook - Size: 29 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

MeowMeowSE3/language-detection-ai

Detect 18+ languages instantly using machine learning (BERT, LSTM, SVM) and NLP. Includes a Flask web app for real-time predictions, trained models, and detailed notebooks.

Language: Jupyter Notebook - Size: 1.76 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 1

ntua-unit-of-control-and-informatics/jaqpot-api

Open-source RESTful API for managing and deploying models, providing predictions via API, built with Spring Boot and Kotlin.

Language: Kotlin - Size: 641 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1 - Forks: 0

fastmachinelearning/qonnx

QONNX: Arbitrary-Precision Quantized Neural Networks in ONNX

Language: Python - Size: 5.31 MB - Last synced at: about 7 hours ago - Pushed at: about 8 hours ago - Stars: 145 - Forks: 44

ntua-unit-of-control-and-informatics/jaqpotpy

Open-source Python client for deploying models and obtaining predictions via the Jaqpot API.

Language: Python - Size: 362 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 17 - Forks: 1

pykeio/ort

Fast ML inference & training for ONNX models in Rust

Language: Rust - Size: 4.86 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 1,254 - Forks: 124

scitix/arks

Arks is a cloud-native inference framework running on Kubernetes

Language: Go - Size: 353 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 4 - Forks: 2

mtrimolet/hroza

A (modern) C++ implementation of MarkovJunior based on StormKit

Language: C++ - Size: 323 KB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

chengzeyi/ParaAttention

https://wavespeed.ai/ Context parallel attention that accelerates DiT model inference with dynamic caching

Language: Python - Size: 13.4 MB - Last synced at: 5 days ago - Pushed at: 19 days ago - Stars: 241 - Forks: 24

inlab-geo/espresso

Earth Science PRoblems for the Evaluation of Strategies, Solvers and Optimizers

Language: Jupyter Notebook - Size: 165 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 16 - Forks: 9

NexusGPU/tensor-fusion

Tensor Fusion is a state-of-the-art GPU virtualization and pooling solution designed to optimize GPU cluster utilization to its fullest potential.

Language: Go - Size: 829 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 27 - Forks: 7

jy-yuan/KIVI

[ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache

Language: Python - Size: 16.7 MB - Last synced at: 1 day ago - Pushed at: 3 months ago - Stars: 288 - Forks: 29

xorbitsai/inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.

Language: Python - Size: 44.6 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 7,500 - Forks: 636
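
Because Xinference exposes an OpenAI-compatible endpoint, the "single line" is usually just the client's base URL. A hedged sketch (the port and model name depend on how you launched the server):

```python
from openai import OpenAI

# Point the standard OpenAI client at a locally running Xinference server.
# Port 9997 is a common default; the model name must match a model you launched.
client = OpenAI(base_url="http://localhost:9997/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="my-local-model",   # placeholder model UID
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(resp.choices[0].message.content)
```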

AnswerDotAI/cold-compress

Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of GPT-Fast, a simple, PyTorch-native generation codebase.

Language: Python - Size: 8.33 MB - Last synced at: 20 minutes ago - Pushed at: 8 months ago - Stars: 128 - Forks: 11

interestingLSY/swiftLLM

A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of vLLM).

Language: Python - Size: 298 KB - Last synced at: 1 day ago - Pushed at: 10 months ago - Stars: 160 - Forks: 14

orcasound/aifororcas-livesystem

Real-time AI-assisted killer whale notification system (model and moderator portal) :star:

Language: C# - Size: 189 MB - Last synced at: 3 days ago - Pushed at: 20 days ago - Stars: 40 - Forks: 25

superagent-ai/super-rag

Super performant RAG pipelines for AI apps. Summarization, Retrieve/Rerank and Code Interpreters in one simple API.

Language: Python - Size: 637 KB - Last synced at: 3 days ago - Pushed at: 12 months ago - Stars: 368 - Forks: 56

google/XNNPACK

High-efficiency floating-point neural network inference operators for mobile, server, and Web

Language: C - Size: 162 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 2,000 - Forks: 409

dobriban/Principles-of-AI-LLMs

Materials for the course Principles of AI: LLMs at UPenn (Stat 9911, Spring 2025). LLM architectures, training paradigms (pre- and post-training, alignment), test-time computation, reasoning, safety and robustness (jailbreaking, oversight, uncertainty), representations, interpretability (circuits), etc.

Size: 157 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 30 - Forks: 0

Lyrcaxis/KokoroSharp

Fast local TTS inference engine in C# with ONNX runtime. Multi-speaker, multi-platform and multilingual. Integrate on your .NET projects using a plug-and-play NuGet package, complete with all voices.

Language: C# - Size: 159 KB - Last synced at: 5 days ago - Pushed at: 28 days ago - Stars: 105 - Forks: 4

NVIDIA/TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

Language: C++ - Size: 130 MB - Last synced at: 5 days ago - Pushed at: about 1 month ago - Stars: 11,459 - Forks: 2,181

openvinotoolkit/model_server

A scalable inference server for models optimized with OpenVINO™

Language: C++ - Size: 53.5 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 722 - Forks: 216

vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python - Size: 43.8 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 44,885 - Forks: 6,865
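
Offline batch inference with the Python API is only a few lines; a minimal sketch (with a small placeholder model) follows.

```python
from vllm import LLM, SamplingParams

# Small model chosen only to keep the example light; swap in any supported checkpoint.
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

outputs = llm.generate(["The capital of France is", "Inference engines are"], params)
for out in outputs:
    print(out.prompt, "->", out.outputs[0].text)
```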

deepspeedai/DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Language: Python - Size: 216 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 37,927 - Forks: 4,336
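
Training code is wrapped by `deepspeed.initialize`, which returns an engine that handles the forward/backward/step loop. A minimal sketch with a toy model and config (real runs usually go through the `deepspeed` launcher):

```python
import torch
import deepspeed

# Toy model and config purely for illustration.
model = torch.nn.Linear(512, 10)
ds_config = {
    "train_batch_size": 8,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
    "zero_optimization": {"stage": 1},
}

engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)

x = torch.randn(8, 512).to(engine.device)
loss = engine(x).pow(2).mean()
engine.backward(loss)   # engine handles loss scaling, accumulation, ZeRO partitioning
engine.step()
```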

dlstreamer/dlstreamer

This repository is home to the Intel® Deep Learning Streamer (Intel® DL Streamer) Pipeline Framework, a streaming media analytics framework based on the GStreamer* multimedia framework for creating complex media analytics pipelines.

Language: C++ - Size: 9.37 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 548 - Forks: 174

FocoosAI/focoos

Focoos SDK

Language: Python - Size: 10.5 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 88 - Forks: 0

Tencent/ncnn

ncnn is a high-performance neural network inference framework optimized for the mobile platform

Language: C++ - Size: 26.3 MB - Last synced at: 5 days ago - Pushed at: 6 days ago - Stars: 21,298 - Forks: 4,234

aws/sagemaker-inference-toolkit

Serve machine learning models within a 🐳 Docker container using 🧠 Amazon SageMaker.

Language: Python - Size: 667 KB - Last synced at: about 16 hours ago - Pushed at: over 1 year ago - Stars: 403 - Forks: 82

rebellions-sw/optimum-rbln

⚡ A seamless integration of HuggingFace Transformers & Diffusers with RBLN SDK for efficient inference on RBLN NPUs.

Language: Python - Size: 1.14 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 8 - Forks: 1

Telosnex/fonnx

ONNX runtime for Flutter.

Language: Dart - Size: 176 MB - Last synced at: 3 days ago - Pushed at: 2 months ago - Stars: 264 - Forks: 18

Twixie5/OpenVINO_Asynchronous_API_Performance_Demo

This project demonstrates the high performance of OpenVINO asynchronous inference API

Language: Python - Size: 28.5 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

deepspeedai/DeepSpeed-MII

MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.

Language: Python - Size: 6.52 MB - Last synced at: 1 day ago - Pushed at: 26 days ago - Stars: 2,002 - Forks: 181

hidet-org/hidet

An open-source efficient deep learning framework/compiler, written in Python.

Language: Python - Size: 4.6 MB - Last synced at: 3 days ago - Pushed at: about 2 months ago - Stars: 698 - Forks: 59

ibaiGorordo/Sapiens-Pytorch-Inference

Minimal code and examples for running inference with the Sapiens foundation human models in PyTorch

Language: Jupyter Notebook - Size: 47.6 MB - Last synced at: 5 days ago - Pushed at: 6 months ago - Stars: 137 - Forks: 14

inlab-geo/cofi

Common Framework for Inference

Language: Python - Size: 164 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 17 - Forks: 5

eoap/machine-learning-process

Machine Learning Process using the EO Application Package

Language: Jupyter Notebook - Size: 124 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 1

larq/compute-engine

Highly optimized inference engine for Binarized Neural Networks

Language: C++ - Size: 4.96 MB - Last synced at: 6 days ago - Pushed at: 7 days ago - Stars: 249 - Forks: 36

torchpipe/torchpipe

Serving inside PyTorch

Language: C++ - Size: 41.4 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 160 - Forks: 13