fast-inference | Topic | Ecosyste.ms: Repos

Topic: "fast-inference"

foolwood/pytorch-slimming

Learning Efficient Convolutional Networks through Network Slimming (ICCV 2017).

Language: Python - Size: 12.7 KB - Last synced at: 24 days ago - Pushed at: about 6 years ago - Stars: 573 - Forks: 96

aredden/flux-fp8-api

Flux diffusion model implementation that uses quantized fp8 matmul, while the remaining layers use faster half-precision accumulation; roughly 2x faster on consumer devices.

Language: Python - Size: 157 KB - Last synced at: 5 months ago - Pushed at: 8 months ago - Stars: 227 - Forks: 28

kssteven418/BigLittleDecoder

[NeurIPS'23] Speculative Decoding with Big Little Decoder

Language: Python - Size: 100 MB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 90 - Forks: 10

dvlab-research/Q-LLM

This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models"

Language: Python - Size: 6.84 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 29 - Forks: 0

Academich/translation-transformer

An implementation of the encoder-decoder transformer for SMILES-to-SMILES translation tasks with inference accelerated by speculative decoding

Language: Python - Size: 1.58 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 5 - Forks: 0

szemenyeim/RoboDNN

Fast Forward-Only Deep Neural Network Library for the Nao Robots

Language: C++ - Size: 538 KB - Last synced at: 12 days ago - Pushed at: about 6 years ago - Stars: 5 - Forks: 1

MeoPBK/Fast_Inference_Classifiers

Multilabel fast-inference classifiers (Ridge Regression and MLP) for NLP, with a sentence embedder, K-fold, bootstrap, and boosting. NOTE: the MLP (fully connected NN) classifier was too heavy to be loaded, so you can compile it with the script instead.

Language: Python - Size: 79.8 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

u-hyszk/japanese-speculative-decoding

Verification of the effect of speculative decoding in Japanese.

Language: Python - Size: 245 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

lim142857/Sparsifiner

Official Codebase for CVPR2023 paper "Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers"

Language: Python - Size: 46.9 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

PopoDev/CSE481N_Project

Reproducibility Project for [NeurIPS'23] Speculative Decoding with Big Little Decoder

Language: Python - Size: 6.33 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0
