Topic: "fast-inference"
foolwood/pytorch-slimming
Learning Efficient Convolutional Networks through Network Slimming (ICCV 2017).
Language: Python - Size: 12.7 KB - Last synced at: 24 days ago - Pushed at: about 6 years ago - Stars: 573 - Forks: 96

aredden/flux-fp8-api
Flux diffusion model implementation that uses quantized fp8 matmul, with the remaining layers using faster half-precision accumulate, which is ~2x faster on consumer devices.
Language: Python - Size: 157 KB - Last synced at: 5 months ago - Pushed at: 8 months ago - Stars: 227 - Forks: 28

kssteven418/BigLittleDecoder
[NeurIPS'23] Speculative Decoding with Big Little Decoder
Language: Python - Size: 100 MB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 90 - Forks: 10

dvlab-research/Q-LLM
This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models"
Language: Python - Size: 6.84 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 29 - Forks: 0

Academich/translation-transformer
An implementation of the encoder-decoder transformer for SMILES-to-SMILES translation tasks with inference accelerated by speculative decoding
Language: Python - Size: 1.58 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 5 - Forks: 0

szemenyeim/RoboDNN
Fast Forward-Only Deep Neural Network Library for the Nao Robots
Language: C++ - Size: 538 KB - Last synced at: 12 days ago - Pushed at: about 6 years ago - Stars: 5 - Forks: 1

MeoPBK/Fast_Inference_Classifiers
Multilabel fast-inference classifiers (Ridge Regression and MLP) for NLP with a sentence embedder, K-fold, bootstrap, and boosting. NOTE: because the MLP (fully connected NN) classifier was too heavy to be loaded, you can compile it with the script instead.
Language: Python - Size: 79.8 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

u-hyszk/japanese-speculative-decoding
Verification of the effect of speculative decoding in Japanese.
Language: Python - Size: 245 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

lim142857/Sparsifiner
Official codebase for the CVPR 2023 paper "Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers"
Language: Python - Size: 46.9 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

PopoDev/CSE481N_Project
Reproducibility Project for [NeurIPS'23] Speculative Decoding with Big Little Decoder
Language: Python - Size: 6.33 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0
