GitHub topics: inference-engine

Repositories

insight-platform/Savant

Python Computer Vision & Video Analytics Framework With Batteries Included

Language: Python - Size: 45.2 MB - Last synced at: about 11 hours ago - Pushed at: about 12 hours ago - Stars: 649 - Forks: 58

agoSantiago97/gemma-2-2b-it.cs

gemma-2-2b-it.cs: This project implements int8 CPU inference in pure C#. It ports a Rust repository using Gemini 2.5 Pro Preview, and you can easily build and run it with the provided batch files. 🐙💻

Language: C# - Size: 15.6 KB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

llama-3.2-1b.vb: This project provides a simple way to run llama 3.2 1b fp16 CPU inference using VB.NET. Follow the setup instructions to ensure all necessary files are in place for smooth operation. 🐱💻✨

Language: Visual Basic .NET - Size: 24.4 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

lofcz/LlmTornado

The .NET library to consume 100+ APIs: OpenAI, Anthropic, Google, DeepSeek, Cohere, Mistral, Azure, xAI, Perplexity, Groq, Voyage, DeepInfra, Ollama, vLLM, and many more!

Language: C# - Size: 32.7 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 176 - Forks: 23

fritzo/pomagma

An inference engine for extensional untyped λ-calculus

Language: C++ - Size: 8.98 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 3 - Forks: 0

JohnClaw/gemma-2-2b-it.cs

gemma-2-2b-it int8 cpu inference in one file of pure C#

Language: C# - Size: 16.6 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1 - Forks: 0

JohnClaw/llama-3.2-1b.vb

one-file llama 3.2 1b fp16 cpu inference in pure vb.net

Language: Visual Basic .NET - Size: 24.4 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1 - Forks: 0

FedML-AI/FedML

FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI jobs on any GPU cloud or on-premise cluster. Built on this library, TensorOpera AI (https://TensorOpera.ai) is your generative AI platform at scale.

Language: Python - Size: 892 MB - Last synced at: 2 days ago - Pushed at: about 1 month ago - Stars: 3,874 - Forks: 750

aphrodite-engine/aphrodite-engine

Large-scale LLM inference engine

Language: C++ - Size: 36.6 MB - Last synced at: 3 days ago - Pushed at: 7 days ago - Stars: 1,446 - Forks: 157

ReactiveBayes/RxInfer.jl

Julia package for automated Bayesian inference on a factor graph with reactive message passing

Language: Jupyter Notebook - Size: 438 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 338 - Forks: 29

ROCm/MIVisionX

MIVisionX toolkit is a set of comprehensive computer vision and machine intelligence libraries, utilities, and applications bundled into a single toolkit. AMD MIVisionX also delivers a highly optimized open-source implementation of the Khronos OpenVX™ and OpenVX™ Extensions.

Language: C++ - Size: 153 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 195 - Forks: 77

sinterwong/ai-workflow-sdk

This is a cross-platform inference SDK for AI. It supports ONNX Runtime and NCNN.

Language: C++ - Size: 625 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 3 - Forks: 0

EfficientMoE/MoE-Infinity

PyTorch library for cost-effective, fast and easy serving of MoE models.

Language: Python - Size: 592 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 197 - Forks: 17

SearchSavior/OpenArc

Lightweight Inference server for OpenVINO

Language: Python - Size: 2.33 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 186 - Forks: 5

quic/ai-hub-models

The Qualcomm® AI Hub Models are a collection of state-of-the-art machine learning models optimized for performance (latency, memory etc.) and ready to deploy on Qualcomm® devices.

Language: Python - Size: 255 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 709 - Forks: 115

sp-muramutsa/pagerank

This project simulates Google’s original web ranking algorithm by using a Markov chain-based model to assign importance scores to nodes in a network.

Language: Python - Size: 12.7 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

hyperjumptech/grule-rule-engine

Rule engine implementation in Golang

Language: Go - Size: 10.6 MB - Last synced at: 7 days ago - Pushed at: 11 months ago - Stars: 2,339 - Forks: 360

aygp-dr/genealogical-inference-engine

Bayesian genealogical relationship inference engine written in Guile Scheme for analyzing family structures from fragmentary evidence

Language: Scheme - Size: 16.6 KB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0

interestingLSY/swiftLLM

A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of vLLM).

Language: Python - Size: 226 KB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 209 - Forks: 24

dirkjbosman/ml-inference-benchmarks

Compare inference performance of ML models across C++ vs Python

Language: C++ - Size: 14.6 KB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0

chengzeyi/ParaAttention

https://wavespeed.ai/ Context parallel attention that accelerates DiT model inference with dynamic caching

Language: Python - Size: 13.4 MB - Last synced at: 8 days ago - Pushed at: about 1 month ago - Stars: 296 - Forks: 29

friendliai/friendli-client

Friendli: the fastest serving engine for generative AI

Language: Python - Size: 4.88 MB - Last synced at: 5 days ago - Pushed at: 5 months ago - Stars: 47 - Forks: 7

CoderLSF/fast-llama

Runs LLaMA at extremely high speed

Language: C++ - Size: 252 KB - Last synced at: 5 days ago - Pushed at: over 1 year ago - Stars: 90 - Forks: 10

Tencent/Forward

A library for high performance deep learning inference on NVIDIA GPUs.

Language: C++ - Size: 81.1 MB - Last synced at: 9 days ago - Pushed at: over 3 years ago - Stars: 553 - Forks: 67

andrewkchan/yalm

Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O

Language: C++ - Size: 347 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 367 - Forks: 35

siliconflow/onediff

OneDiff: An out-of-the-box acceleration library for diffusion models.

Language: Jupyter Notebook - Size: 114 MB - Last synced at: 8 days ago - Pushed at: about 1 month ago - Stars: 1,895 - Forks: 124

quic/ai-hub-apps

The Qualcomm® AI Hub apps are a collection of state-of-the-art machine learning models optimized for performance (latency, memory etc.) and ready to deploy on Qualcomm® devices.

Language: Java - Size: 27.9 MB - Last synced at: 2 days ago - Pushed at: 12 days ago - Stars: 211 - Forks: 50

pylint-dev/astroid

A common base representation of python source code for pylint and other projects

Language: Python - Size: 16.6 MB - Last synced at: 10 days ago - Pushed at: 11 days ago - Stars: 546 - Forks: 294

viam-modules/viam-mlmodelservice-triton

MLModelService wrapping Nvidia's Triton Server

Language: C++ - Size: 168 KB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 5 - Forks: 6

curtisgray/wingman

Wingman is the fastest and easiest way to run Llama models on your PC or Mac.

Language: TypeScript - Size: 188 MB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 41 - Forks: 2

kyegomez/Exa

Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and minimal learning curve.

Language: Python - Size: 2.44 MB - Last synced at: 5 days ago - Pushed at: 7 months ago - Stars: 26 - Forks: 4

DarkStarStrix/Lambda_Inference

An inference application to serve scientific models

Language: Python - Size: 38.4 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 7 - Forks: 0

sorohere/Hand-Pose-Detection

This project offers a versatile platform for hand-related tasks, including dataset generation and custom hand gesture detection using Google's MediaPipe library and accelerated real-time sign language translation with LLMs on edge devices.

Language: Python - Size: 821 MB - Last synced at: 5 days ago - Pushed at: 12 months ago - Stars: 14 - Forks: 2

sp-muramutsa/minesweeper

A classic Minesweeper game in Python with Pygame, featuring an AI agent that uses knowledge representation and first-order logic to make strategic, informed moves.

Language: Python - Size: 146 KB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 0 - Forks: 0

TheBlackPlague/MantaRay

Lightspeed C++ Neural Network (UE) Inference Library for Chess

Language: C++ - Size: 4.13 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 7 - Forks: 0

zjhellofss/KuiperInfer

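A good project for campus recruiting (autumn and spring hiring) and internships! Implement a high-performance deep learning inference library from scratch, step by step, with support for inference on models such as llama2, Unet, Yolov5, and Resnet.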

Language: C++ - Size: 310 MB - Last synced at: 21 days ago - Pushed at: 8 months ago - Stars: 2,945 - Forks: 333

zhihu/ZhiLight

A highly optimized LLM inference acceleration engine for Llama and its variants.

Language: C++ - Size: 996 KB - Last synced at: 20 days ago - Pushed at: about 1 month ago - Stars: 890 - Forks: 104

matteocarnelos/microflow-rs

A robust and efficient TinyML inference engine.

Language: Rust - Size: 1.29 MB - Last synced at: 17 days ago - Pushed at: 5 months ago - Stars: 130 - Forks: 14

PaddlePaddle/Paddle.js

Paddle.js is a web project for Baidu PaddlePaddle, which is an open source deep learning framework running in the browser. Paddle.js can either load a pre-trained model or transform a model from paddle-hub with the model transforming tools provided by Paddle.js. It can run in every browser that supports WebGL, WebGPU, or WebAssembly. It can also run in Baidu Smartprogram and WX miniprogram.

Language: JavaScript - Size: 90.7 MB - Last synced at: 20 days ago - Pushed at: about 1 year ago - Stars: 1,037 - Forks: 144

Torsion-Audio/nn-inference-template

Neural network inference template for real-time critical audio environments - presented at ADC23

Language: C++ - Size: 18.6 MB - Last synced at: 22 days ago - Pushed at: 9 months ago - Stars: 110 - Forks: 5

zjhellofss/KuiperLLama

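A good project for campus recruiting (autumn and spring hiring) and internships: build, hands-on and from scratch, an LLM inference framework that supports LLama2/3 and Qwen2.5.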

Language: C++ - Size: 2.27 MB - Last synced at: 26 days ago - Pushed at: 3 months ago - Stars: 346 - Forks: 88

Tencent/FeatherCNN

FeatherCNN is a high performance inference engine for convolutional neural networks.

Language: C++ - Size: 40.8 MB - Last synced at: 25 days ago - Pushed at: over 5 years ago - Stars: 1,219 - Forks: 282

Adlik/Adlik

Adlik: Toolkit for Accelerating Deep Learning Inference

Language: C++ - Size: 52.9 MB - Last synced at: 21 days ago - Pushed at: over 1 year ago - Stars: 799 - Forks: 82

openvinotoolkit/openvino_contrib

Repository for OpenVINO's extra modules

Language: C++ - Size: 44 MB - Last synced at: 28 days ago - Pushed at: 28 days ago - Stars: 121 - Forks: 155

opencog/ure

[NO LONGER MAINTAINED, SUPERSEDED BY https://github.com/trueagi-io/chaining]. Unified Rule Engine. Graph rewriting system for the AtomSpace. Used as reasoning engine for OpenCog.

Language: C++ - Size: 113 MB - Last synced at: 23 days ago - Pushed at: 3 months ago - Stars: 55 - Forks: 31

nilp0inter/experta (fork of buguroo/pyknow)

Expert Systems for Python

Language: Python - Size: 1.77 MB - Last synced at: 8 days ago - Pushed at: 4 months ago - Stars: 167 - Forks: 44

priyam-hub/LLM-Fine-Tuning-Pipeline

A comprehensive pipeline covering different fine-tuning methods for large language models, with optimized performance and resource efficiency. This pipeline handles the entire workflow from data preparation to model evaluation, making advanced LLM customization accessible and efficient.

Language: Python - Size: 17.2 MB - Last synced at: 3 days ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

ulfurinn/wongi-engine-elixir

A rule engine written in Elixir.

Language: Elixir - Size: 146 KB - Last synced at: 13 days ago - Pushed at: 22 days ago - Stars: 23 - Forks: 2

ulfurinn/wongi-engine

A rule engine written in Ruby.

Language: Ruby - Size: 997 KB - Last synced at: 26 days ago - Pushed at: about 1 year ago - Stars: 485 - Forks: 39

InfiniTensor/RefactorGraph

A layered, decoupled deep learning inference engine

Language: C++ - Size: 2.2 MB - Last synced at: 29 days ago - Pushed at: 4 months ago - Stars: 73 - Forks: 14

XUANTIE-RV/csi-nn2

An optimized neural network operator library for chips based on Xuantie CPUs.

Language: C - Size: 16.1 MB - Last synced at: 2 days ago - Pushed at: 12 months ago - Stars: 89 - Forks: 40

sdcondon/SCFirstOrderLogic

Simple first-order logic implementation for .NET.

Language: C# - Size: 5.22 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 5 - Forks: 0

acrion/zelph

A sophisticated semantic network system capable of encoding inference rules within the network itself. Built for efficient memory usage and powerful logical reasoning, zelph can process the entire Wikidata knowledge graph (1.4TB) to detect contradictions and make logical deductions.

Language: C - Size: 463 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 2 - Forks: 0

dieharders/obrew-studio-server

Obrew Studio - Server: A self-hostable machine learning engine. Build agents and schedule workflows private to you.

Language: Python - Size: 138 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 9 - Forks: 1

Send37/NFAI

.NET native and Vulkan inference engine

Language: C# - Size: 81.1 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

AbstractionsLab/satrap-dl

SATRAP-DL (Semi-Automated Threat Reconnaissance and Analysis Powered by Description Logics) aims at the development of a platform for interactive computer-aided analysis of cyber threat intelligence driven by logic-based automated reasoning and inference.

Language: Python - Size: 5.01 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

m0dulo/InferSpore

🌱 A fully independent Large Language Model (LLM) inference engine, built leveraging cuBLAS and cub. 🧩

Language: Cuda - Size: 110 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 17 - Forks: 1

nrl-ai/daisykit

DaisyKit is an easy AI toolkit with face mask detection, pose detection, background matting, barcode detection, face recognition and more. - with NCNN, OpenCV, Python wrappers

Language: C++ - Size: 252 MB - Last synced at: 23 days ago - Pushed at: over 1 year ago - Stars: 106 - Forks: 20

RubixML/Server

A standalone inference server for trained Rubix ML estimators.

Language: PHP - Size: 18.3 MB - Last synced at: 24 days ago - Pushed at: 3 months ago - Stars: 62 - Forks: 11

HoloClean/holoclean

A Machine Learning System for Data Enrichment.

Language: Python - Size: 8.09 MB - Last synced at: about 2 months ago - Pushed at: almost 2 years ago - Stars: 520 - Forks: 128

datagram-db/knobab

Fast LTLf Log-SAT Solver with Data Payload!

Language: C++ - Size: 204 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

yas-sim/interactive-image-inpainting

Deep Learning Based Interactive Image Inpainting Demo

Language: Python - Size: 874 KB - Last synced at: 18 days ago - Pushed at: about 4 years ago - Stars: 33 - Forks: 5

nk-kotsomitis/Ingenuity

Ingenuity is an optimized inference engine and benchmarking tool for TinyML models on embedded IoT devices.

Language: Python - Size: 2.87 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

Kernel-Dirichlet/fastinference

Fast ML inference & cross-platform Rust library

Language: Rust - Size: 39.1 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 2 - Forks: 1

gottingen/kumo-search

docs for search system and ai infra

Size: 177 MB - Last synced at: 2 months ago - Pushed at: 11 months ago - Stars: 218 - Forks: 22

cschen1205/java-expert-system-shell

Expert System Shell implemented in Java

Language: Java - Size: 141 KB - Last synced at: 11 days ago - Pushed at: about 8 years ago - Stars: 17 - Forks: 5

tinyllm/tinylm

Browser based ML Inference | OpenAI compliant | Run models like DeepSeek, Llama 3.2, NomicEmbed, KokoroTTS, and more

Language: TypeScript - Size: 4.2 MB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 33 - Forks: 2

metrumresearchgroup/Torsten

library of C++ functions that support applications of Stan in Pharmacometrics

Language: C++ - Size: 716 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 56 - Forks: 12

danyvarghese/PyGol

A novel Inductive Logic Programming (ILP) system based on Meta Inverse Entailment in Python.

Language: C - Size: 6.9 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 13 - Forks: 3

yas-sim/person-detect-reidentification

Person or face detection and matching from multiple image inputs using Intel OpenVINO toolkit

Language: Python - Size: 33.3 MB - Last synced at: about 2 months ago - Pushed at: about 5 years ago - Stars: 36 - Forks: 8

EugenHotaj/zig_gpt2

GPT-2 inference engine written in Zig

Language: Zig - Size: 17.2 MB - Last synced at: about 2 months ago - Pushed at: almost 2 years ago - Stars: 37 - Forks: 5

msnh2012/Msnhnet

🔥 (yolov3 yolov4 yolov5 unet ...) A mini PyTorch inference framework inspired by darknet.

Language: C++ - Size: 14.7 MB - Last synced at: about 2 months ago - Pushed at: about 2 years ago - Stars: 744 - Forks: 148

jerinphilip/slimt

Inference slice of marian for bergamot's tiny11 models. Faster to compile and wield. Fewer model-archs than bergamot-translator.

Language: C++ - Size: 387 KB - Last synced at: 3 days ago - Pushed at: 8 months ago - Stars: 11 - Forks: 2

zpye/SimpleInfer

A simple neural network inference framework

Language: C++ - Size: 2.15 MB - Last synced at: 3 months ago - Pushed at: almost 2 years ago - Stars: 24 - Forks: 2

BMW-InnovationLab/BMW-TensorFlow-Inference-API-CPU

This is a repository for an object detection inference API using the Tensorflow framework.

Language: Python - Size: 9.96 MB - Last synced at: 2 months ago - Pushed at: almost 3 years ago - Stars: 183 - Forks: 47

yas-sim/gaze-estimation-with-laser-sparking

Deep learning based gaze estimation demo with a fun feature :-)

Language: Python - Size: 4.4 MB - Last synced at: about 2 months ago - Pushed at: about 4 years ago - Stars: 40 - Forks: 8

jithin8mathew/yolomosaic

A Python library for visualizing YOLO detections and segmented instances on large orthomosaic images, with the ability to generate shapefiles for GIS integration

Language: Python - Size: 45.4 MB - Last synced at: 17 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0