Topic: "efficient-inference"
huawei-noah/Efficient-AI-Backbones
Efficient AI Backbones including GhostNet, TNT and MLP, developed by Huawei Noah's Ark Lab.
Language: Python - Size: 98.4 MB - Last synced at: 12 days ago - Pushed at: 3 months ago - Stars: 4,240 - Forks: 723

SqueezeAILab/LLMCompiler
[ICML 2024] LLMCompiler: An LLM Compiler for Parallel Function Calling
Language: Python - Size: 375 KB - Last synced at: 1 day ago - Pushed at: 12 months ago - Stars: 1,701 - Forks: 124

snap-research/EfficientFormer
EfficientFormerV2 [ICCV 2023] & EfficientFormer [NeurIPS 2022]
Language: Python - Size: 2.27 MB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 1,047 - Forks: 92

huawei-noah/AdderNet
Code for paper "AdderNet: Do We Really Need Multiplications in Deep Learning?"
Language: Python - Size: 1.32 MB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 961 - Forks: 185

horseee/DeepCache
[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free
Language: Python - Size: 102 MB - Last synced at: 27 days ago - Pushed at: 12 months ago - Stars: 893 - Forks: 43

VITA-Group/LightGaussian
[NeurIPS 2024 Spotlight] "LightGaussian: Unbounded 3D Gaussian Compression with 15x Reduction and 200+ FPS", Zhiwen Fan, Kevin Wang, Kairun Wen, Zehao Zhu, Dejia Xu, Zhangyang Wang
Language: Python - Size: 445 KB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 697 - Forks: 66

SqueezeAILab/SqueezeLLM
[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
Language: Python - Size: 1.5 MB - Last synced at: 1 day ago - Pushed at: 11 months ago - Stars: 692 - Forks: 46

liuzhuang13/slimming
Learning Efficient Convolutional Networks through Network Slimming, in ICCV 2017.
Language: Lua - Size: 42 KB - Last synced at: about 1 month ago - Pushed at: almost 6 years ago - Stars: 568 - Forks: 73

Zhen-Dong/Awesome-Quantization-Papers
List of papers related to neural network quantization in recent AI conferences and journals.
Size: 309 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 478 - Forks: 39

SqueezeAILab/KVQuant
[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
Language: Python - Size: 19.8 MB - Last synced at: 1 day ago - Pushed at: 11 months ago - Stars: 359 - Forks: 31

lucidrains/speculative-decoding
Explorations into some recent techniques surrounding speculative decoding
Language: Python - Size: 34.2 MB - Last synced at: 1 day ago - Pushed at: 6 months ago - Stars: 269 - Forks: 20

Picovoice/picollm
On-device LLM Inference Powered by X-Bit Quantization
Language: Python - Size: 98 MB - Last synced at: 1 day ago - Pushed at: 14 days ago - Stars: 250 - Forks: 14

SYSU-SAIL/SMSR
[CVPR 2021] Exploring Sparsity in Image Super-Resolution for Efficient Inference
Language: Python - Size: 7.37 MB - Last synced at: about 2 months ago - Pushed at: over 3 years ago - Stars: 239 - Forks: 30

changlin31/DS-Net
(CVPR 2021, Oral) Dynamic Slimmable Network
Language: Python - Size: 83 KB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 225 - Forks: 19

xuyang-liu16/Awesome-Generation-Acceleration
📚 Collection of awesome generation acceleration resources.
Size: 637 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 215 - Forks: 6

liuziwei7/mobile-id
Deep Face Model Compression
Language: Matlab - Size: 3.62 MB - Last synced at: about 1 year ago - Pushed at: almost 7 years ago - Stars: 195 - Forks: 102

cure-lab/DeciWatch
[ECCV 2022] Official implementation of the paper "DeciWatch: A Simple Baseline for 10x Efficient 2D and 3D Pose Estimation"
Language: Python - Size: 28.2 MB - Last synced at: 3 months ago - Pushed at: almost 3 years ago - Stars: 177 - Forks: 16

xindongzhang/ELAN
[ECCV 2022] Efficient Long-Range Attention Network for Image Super-resolution
Language: Python - Size: 23.4 KB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 171 - Forks: 15

czg1225/AsyncDiff
Official implementation of "AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising"
Language: Python - Size: 64.7 MB - Last synced at: 10 months ago - Pushed at: 11 months ago - Stars: 130 - Forks: 6

SimonAytes/SoT
Official code repository for Sketch-of-Thought (SoT)
Language: Python - Size: 71.3 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 112 - Forks: 21

horseee/learning-to-cache
[NeurIPS 2024] Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching
Language: Python - Size: 5.32 MB - Last synced at: 3 months ago - Pushed at: 12 months ago - Stars: 99 - Forks: 3

kssteven418/BigLittleDecoder
[NeurIPS'23] Speculative Decoding with Big Little Decoder
Language: Python - Size: 100 MB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 90 - Forks: 10

snap-research/graphless-neural-networks
[ICLR 2022] Code for Graph-less Neural Networks: Teaching Old MLPs New Tricks via Distillation (GLNN)
Language: Python - Size: 684 KB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 89 - Forks: 21

RAIVNLab/STR
Soft Threshold Weight Reparameterization for Learnable Sparsity
Language: Python - Size: 63.5 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 84 - Forks: 11

Alpha-Innovator/AdaptiveDiffusion
[NeurIPS'24] Training-Free Adaptive Diffusion with Bounded Difference Approximation Strategy
Language: Python - Size: 8.63 MB - Last synced at: 1 day ago - Pushed at: 5 months ago - Stars: 67 - Forks: 3

IBM/AdaMML 📦
Official implementation of AdaMML. https://arxiv.org/abs/2105.05165.
Language: Python - Size: 113 KB - Last synced at: 17 days ago - Pushed at: about 3 years ago - Stars: 51 - Forks: 9

FranxYao/Partially-Observed-TreeCRFs
Implementation of AAAI 21 paper: Nested Named Entity Recognition with Partially Observed TreeCRFs
Language: Python - Size: 1.67 MB - Last synced at: about 2 months ago - Pushed at: about 4 years ago - Stars: 51 - Forks: 7

raymin0223/fast_robust_early_exit
Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long)
Language: Python - Size: 56.6 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 43 - Forks: 8

qiuk2/AAR
[Official Implementation] Acoustic Autoregressive Modeling 🔥
Language: Python - Size: 342 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 40 - Forks: 3

yikaiw/RS-Nets
[ECCV 2020] Code release for "Resolution Switchable Networks for Runtime Efficient Image Recognition"
Language: Python - Size: 1.78 MB - Last synced at: over 1 year ago - Pushed at: almost 5 years ago - Stars: 39 - Forks: 8

ivclab/agegenderLMTCNN
Jia-Hong Lee, Yi-Ming Chan, Ting-Yen Chen, and Chu-Song Chen, "Joint Estimation of Age and Gender from Unconstrained Face Images using Lightweight Multi-task CNN for Mobile Applications," IEEE International Conference on Multimedia Information Processing and Retrieval, MIPR 2018
Language: Python - Size: 289 MB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 39 - Forks: 3

tchittesh/lzu
Code for Learning to Zoom and Unzoom (CVPR 2023)
Language: Python - Size: 35.3 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 38 - Forks: 4

bharathsudharsan/TinyML-Benchmark-NNs-on-MCUs
Code for WF-IoT paper 'TinyML Benchmark: Executing Fully Connected Neural Networks on Commodity Microcontrollers'
Language: Python - Size: 11.4 MB - Last synced at: 2 months ago - Pushed at: almost 3 years ago - Stars: 35 - Forks: 11

linksense/EfficientNet.PyTorch
Concise, Modular, Human-friendly PyTorch implementation of EfficientNet with Pre-trained Weights.
Language: Python - Size: 25.4 KB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 31 - Forks: 5

snu-mllab/GuidedQuant
Official PyTorch implementation of "GuidedQuant: Large Language Model Quantization via Exploiting End Loss Guidance" (ICML 2025)
Language: Python - Size: 3.38 MB - Last synced at: about 7 hours ago - Pushed at: about 8 hours ago - Stars: 30 - Forks: 0

visresearch/LLaVA-STF
The official implementation of "Learning Compact Vision Tokens for Efficient Large Multimodal Models"
Language: Python - Size: 2.62 MB - Last synced at: 6 days ago - Pushed at: 15 days ago - Stars: 27 - Forks: 2

LiuHengyu321/FlexGS
[CVPR 2025] Code Release for "FlexGS: Train Once, Deploy Everywhere with Many-in-One Flexible 3D Gaussian Splatting"
Language: Python - Size: 675 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 25 - Forks: 1

bharathsudharsan/CNN_on_MCU
Code for paper 'Multi-Component Optimization and Efficient Deployment of Neural-Networks on Resource-Constrained IoT Hardware'
Language: Jupyter Notebook - Size: 4.91 MB - Last synced at: 13 days ago - Pushed at: about 3 years ago - Stars: 24 - Forks: 19

VITA-Group/triple-wins
[ICLR 2020] "Triple Wins: Boosting Accuracy, Robustness and Efficiency Together by Enabling Input-Adaptive Inference"
Language: Python - Size: 13.2 MB - Last synced at: 2 months ago - Pushed at: over 3 years ago - Stars: 24 - Forks: 7

Zhen-Dong/CoDeNet
[FPGA'21] CoDeNet is an efficient object detection model on PyTorch, with SOTA performance on VOC and COCO based on CenterNet and Co-Designed deformable convolution.
Language: Python - Size: 6.17 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 21 - Forks: 4

xuyang-liu16/VidCom2
Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models
Language: Python - Size: 5.52 MB - Last synced at: 26 days ago - Pushed at: 26 days ago - Stars: 20 - Forks: 0

snap-research/linkless-link-prediction
[ICML 2023] Linkless Link Prediction via Relational Distillation
Language: Python - Size: 184 KB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 20 - Forks: 7

ivclab/NeuralMerger
Yi-Min Chou, Yi-Ming Chan, Jia-Hong Lee, Chih-Yi Chiu, Chu-Song Chen, "Unifying and Merging Well-trained Deep Neural Networks for Inference Stage," International Joint Conference on Artificial Intelligence (IJCAI), 2018
Language: Python - Size: 18.5 MB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 20 - Forks: 3

IBM/AutoVP
[ICLR24] AutoVP: An Automated Visual Prompting Framework and Benchmark
Language: Python - Size: 577 KB - Last synced at: 17 days ago - Pushed at: about 2 months ago - Stars: 19 - Forks: 2

xternalz/SDPoint
Stochastic Downsampling for Cost-Adjustable Inference and Improved Regularization in Convolutional Networks
Language: Python - Size: 10.7 KB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 18 - Forks: 4

FranxYao/RDP
Implementation of ICML 22 Paper: Scaling Structured Inference with Randomization
Language: Jupyter Notebook - Size: 115 MB - Last synced at: about 2 months ago - Pushed at: almost 3 years ago - Stars: 14 - Forks: 3

ivclab/Multistage_Pruning
Cheng-Hao Tu, Jia-Hong Lee, Yi-Ming Chan and Chu-Song Chen, "Pruning Depthwise Separable Convolutions for MobileNet Compression," International Joint Conference on Neural Networks, IJCNN 2020, July 2020.
Language: Python - Size: 33.2 KB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 13 - Forks: 3

visresearch/SDMPrune
The official implementation of "SDMPrune: Self-Distillation MLP Pruning for Efficient Large Language Models"
Language: Python - Size: 137 KB - Last synced at: 6 days ago - Pushed at: 14 days ago - Stars: 12 - Forks: 0

bharathsudharsan/ML-Classifiers-on-MCUs
Supplementary material for IEEE Services Computing paper 'An SRAM Optimized Approach for Constant Memory Consumption and Ultra-fast Execution of ML Classifiers on TinyML Hardware'
Language: Jupyter Notebook - Size: 584 KB - Last synced at: 2 months ago - Pushed at: almost 4 years ago - Stars: 12 - Forks: 1

changwoolee/BLAST
[NeurIPS 2024] BLAST: Block Level Adaptive Structured Matrix for Efficient Deep Neural Network Inference
Language: Python - Size: 1.43 MB - Last synced at: 2 months ago - Pushed at: 8 months ago - Stars: 10 - Forks: 0

ivclab/Merging-MobileNets-for-Multitask
Cheng-En Wu, Yi-Ming Chan and Chu-Song Chen, "On Merging MobileNets for Efficient Multitask Inference", International Symposium on High-Performance Computer Architecture (HPCA) Workshop on Energy Efficient Machine Learning and Cognitive Computing for Embedded Applications (EMC2), 2019
Language: Python - Size: 173 KB - Last synced at: almost 2 years ago - Pushed at: about 5 years ago - Stars: 10 - Forks: 0

ltkong218/MDFlow
MDFlow: Unsupervised Optical Flow Learning by Reliable Mutual Knowledge Distillation (TCSVT 2022)
Language: Python - Size: 30.6 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 8 - Forks: 2

d-becking/efficientCNNs
Finding Storage- and Compute-Efficient Convolutional Neural Networks
Language: Python - Size: 286 MB - Last synced at: 12 months ago - Pushed at: over 2 years ago - Stars: 7 - Forks: 3

d-becking/neurips-2019-micronet-challenge
NeurIPS 2019 MicroNet Challenge
Language: Python - Size: 66 MB - Last synced at: 12 months ago - Pushed at: over 2 years ago - Stars: 6 - Forks: 1

HolmesShuan/PyTorch-MixNet-SS
Extremely light-weight MixNet with Top-1 75.7% and 2.5M params
Language: Python - Size: 8.79 MB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 6 - Forks: 1

megh1241/blockset
BLOCKSET: Efficient out of core tree ensemble inference
Language: C++ - Size: 117 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 4 - Forks: 1

ramonVDAKKER/research-copulas
Semiparametric efficient rank-based estimation of copula parameters
Language: MATLAB - Size: 843 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 0

bharathsudharsan/Edge2Train
Code for IoT paper 'Edge2Train: a framework to train machine learning models (SVMs) on resource-constrained IoT edge devices'
Language: C - Size: 2.92 MB - Last synced at: 4 months ago - Pushed at: almost 4 years ago - Stars: 3 - Forks: 1

tilmto/Adjustable-Quantization-MicroNet
[MicroNet Challenge (NeurIPS 2019)] "Adjustable Quantization: Jointly Learn the Bit-width and Weight in DNN Training" by Yonggan Fu, Ruiyang Zhao, Yue Wang, Chaojian Li, Haoran You, Zhangyang Wang, Yingyan Lin
Language: Python - Size: 536 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 3 - Forks: 0

renebidart/text-classification-benchmark
Inference speed / accuracy tradeoff on text classification with transformer models such as BERT, RoBERTa, DeBERTa, SqueezeBERT, MobileBERT, Funnel Transformer, etc.
Language: Jupyter Notebook - Size: 1.49 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

bellthomas/VDQN
Exploring Variational Deep Q Networks. A study undertaken for the University of Cambridge's R244 Computer Science Masters Course. Inspired by https://arxiv.org/abs/1711.11225/.
Language: Python - Size: 12.7 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 2 - Forks: 1

twcmchang/CP-CNN
Channel-Prioritized Convolutional Neural Networks for Sparsity and Multi-fidelity
Language: Python - Size: 113 KB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 2 - Forks: 2

maxwells-daemons/genome
Compute-efficient reinforcement learning with binary neural networks and evolution strategies.
Language: Python - Size: 834 KB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 1

MatthiasKi/structurednets
Library for Structured Matrices (approximation methods and structured layers for neural networks)
Language: Python - Size: 439 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 1

bharathsudharsan/ECML-Tutorial-ML-Meets-IoT
Repository of the ECML PKDD 2021 tutorial titled 'Machine Learning Meets Internet of Things: From Theory to Practice'
Language: Jupyter Notebook - Size: 32.6 MB - Last synced at: 4 months ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 1

saife245/IMAGE-RECOGNATION
Language: Jupyter Notebook - Size: 9.58 MB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 0

Rushikesh321/adder
Event-driven tool/library for tailing the Cardano blockchain (blockchain, cardano, ouroboros, ouroboros-network, toolbox)
Language: Go - Size: 109 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

MhmDSmdi/FunEditor
[AAAI 2025] The official PyTorch implementation for FunEditor: Achieving Complex Image Edits via Function Aggregation with Diffusion Models
Size: 1000 Bytes - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

Touka20/DSD-THUEE
Labs for the Digital System Design course, fall 2023
Language: Jupyter Notebook - Size: 431 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

edward62740/EdgeTPU-MOT
Language: C++ - Size: 659 MB - Last synced at: 3 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

Saeidhoseinipour/LLMnewTopics
Dive into the forefront of Large Language Models (LLMs) with our concise guide on the top 10 hot topics. Explore bias mitigation, efficient training, multimodal models, and more. Stay abreast of the latest advancements shaping the landscape of LLMs.
Size: 5.86 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

DaniAffCH/SegSpaceDetector
Graph-based image processing for segmenting images and detecting free spots in crowded scenes.
Language: C++ - Size: 483 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0
