An open API service providing repository metadata for many open source software ecosystems.

Topic: "efficient-inference"

huawei-noah/Efficient-AI-Backbones

Efficient AI Backbones including GhostNet, TNT and MLP, developed by Huawei Noah's Ark Lab.

Language: Python - Size: 98.4 MB - Last synced at: 12 days ago - Pushed at: 3 months ago - Stars: 4,240 - Forks: 723

SqueezeAILab/LLMCompiler

[ICML 2024] LLMCompiler: An LLM Compiler for Parallel Function Calling

Language: Python - Size: 375 KB - Last synced at: 1 day ago - Pushed at: 12 months ago - Stars: 1,701 - Forks: 124

snap-research/EfficientFormer

EfficientFormerV2 [ICCV 2023] & EfficientFormer [NeurIPS 2022]

Language: Python - Size: 2.27 MB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 1,047 - Forks: 92

huawei-noah/AdderNet

Code for paper "AdderNet: Do We Really Need Multiplications in Deep Learning?"

Language: Python - Size: 1.32 MB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 961 - Forks: 185

horseee/DeepCache

[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free

Language: Python - Size: 102 MB - Last synced at: 27 days ago - Pushed at: 12 months ago - Stars: 893 - Forks: 43

VITA-Group/LightGaussian

[NeurIPS 2024 Spotlight] "LightGaussian: Unbounded 3D Gaussian Compression with 15x Reduction and 200+ FPS", Zhiwen Fan, Kevin Wang, Kairun Wen, Zehao Zhu, Dejia Xu, Zhangyang Wang

Language: Python - Size: 445 KB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 697 - Forks: 66

SqueezeAILab/SqueezeLLM

[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization

Language: Python - Size: 1.5 MB - Last synced at: 1 day ago - Pushed at: 11 months ago - Stars: 692 - Forks: 46

liuzhuang13/slimming

Learning Efficient Convolutional Networks through Network Slimming, In ICCV 2017.

Language: Lua - Size: 42 KB - Last synced at: about 1 month ago - Pushed at: almost 6 years ago - Stars: 568 - Forks: 73

Zhen-Dong/Awesome-Quantization-Papers

List of papers related to neural network quantization in recent AI conferences and journals.

Size: 309 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 478 - Forks: 39

SqueezeAILab/KVQuant

[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

Language: Python - Size: 19.8 MB - Last synced at: 1 day ago - Pushed at: 11 months ago - Stars: 359 - Forks: 31

lucidrains/speculative-decoding

Explorations into some recent techniques surrounding speculative decoding

Language: Python - Size: 34.2 MB - Last synced at: 1 day ago - Pushed at: 6 months ago - Stars: 269 - Forks: 20

Picovoice/picollm

On-device LLM Inference Powered by X-Bit Quantization

Language: Python - Size: 98 MB - Last synced at: 1 day ago - Pushed at: 14 days ago - Stars: 250 - Forks: 14

SYSU-SAIL/SMSR

[CVPR 2021] Exploring Sparsity in Image Super-Resolution for Efficient Inference

Language: Python - Size: 7.37 MB - Last synced at: about 2 months ago - Pushed at: over 3 years ago - Stars: 239 - Forks: 30

changlin31/DS-Net

(CVPR 2021, Oral) Dynamic Slimmable Network

Language: Python - Size: 83 KB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 225 - Forks: 19

xuyang-liu16/Awesome-Generation-Acceleration

📚 Collection of awesome generation acceleration resources.

Size: 637 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 215 - Forks: 6

liuziwei7/mobile-id

Deep Face Model Compression

Language: Matlab - Size: 3.62 MB - Last synced at: about 1 year ago - Pushed at: almost 7 years ago - Stars: 195 - Forks: 102

cure-lab/DeciWatch

[ECCV 2022] Official implementation of the paper "DeciWatch: A Simple Baseline for 10x Efficient 2D and 3D Pose Estimation"

Language: Python - Size: 28.2 MB - Last synced at: 3 months ago - Pushed at: almost 3 years ago - Stars: 177 - Forks: 16

xindongzhang/ELAN

[ECCV2022] Efficient Long-Range Attention Network for Image Super-resolution

Language: Python - Size: 23.4 KB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 171 - Forks: 15

czg1225/AsyncDiff

Official implementation of "AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising"

Language: Python - Size: 64.7 MB - Last synced at: 10 months ago - Pushed at: 11 months ago - Stars: 130 - Forks: 6

SimonAytes/SoT

Official code repository for Sketch-of-Thought (SoT)

Language: Python - Size: 71.3 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 112 - Forks: 21

horseee/learning-to-cache

[NeurIPS 2024] Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching

Language: Python - Size: 5.32 MB - Last synced at: 3 months ago - Pushed at: 12 months ago - Stars: 99 - Forks: 3

kssteven418/BigLittleDecoder

[NeurIPS'23] Speculative Decoding with Big Little Decoder

Language: Python - Size: 100 MB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 90 - Forks: 10

snap-research/graphless-neural-networks

[ICLR 2022] Code for Graph-less Neural Networks: Teaching Old MLPs New Tricks via Distillation (GLNN)

Language: Python - Size: 684 KB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 89 - Forks: 21

RAIVNLab/STR

Soft Threshold Weight Reparameterization for Learnable Sparsity

Language: Python - Size: 63.5 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 84 - Forks: 11

Alpha-Innovator/AdaptiveDiffusion

[NeurIPS'24] Training-Free Adaptive Diffusion with Bounded Difference Approximation Strategy

Language: Python - Size: 8.63 MB - Last synced at: 1 day ago - Pushed at: 5 months ago - Stars: 67 - Forks: 3

IBM/AdaMML 📦

Official implementation of AdaMML. https://arxiv.org/abs/2105.05165.

Language: Python - Size: 113 KB - Last synced at: 17 days ago - Pushed at: about 3 years ago - Stars: 51 - Forks: 9

FranxYao/Partially-Observed-TreeCRFs

Implementation of AAAI 21 paper: Nested Named Entity Recognition with Partially Observed TreeCRFs

Language: Python - Size: 1.67 MB - Last synced at: about 2 months ago - Pushed at: about 4 years ago - Stars: 51 - Forks: 7

raymin0223/fast_robust_early_exit

Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long)

Language: Python - Size: 56.6 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 43 - Forks: 8

qiuk2/AAR

[Official Implementation] Acoustic Autoregressive Modeling 🔥

Language: Python - Size: 342 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 40 - Forks: 3

yikaiw/RS-Nets

[ECCV 2020] Code release for "Resolution Switchable Networks for Runtime Efficient Image Recognition"

Language: Python - Size: 1.78 MB - Last synced at: over 1 year ago - Pushed at: almost 5 years ago - Stars: 39 - Forks: 8

ivclab/agegenderLMTCNN

Jia-Hong Lee, Yi-Ming Chan, Ting-Yen Chen, and Chu-Song Chen, "Joint Estimation of Age and Gender from Unconstrained Face Images using Lightweight Multi-task CNN for Mobile Applications," IEEE International Conference on Multimedia Information Processing and Retrieval, MIPR 2018

Language: Python - Size: 289 MB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 39 - Forks: 3

tchittesh/lzu

Code for Learning to Zoom and Unzoom (CVPR 2023)

Language: Python - Size: 35.3 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 38 - Forks: 4

bharathsudharsan/TinyML-Benchmark-NNs-on-MCUs

Code for WF-IoT paper 'TinyML Benchmark: Executing Fully Connected Neural Networks on Commodity Microcontrollers'

Language: Python - Size: 11.4 MB - Last synced at: 2 months ago - Pushed at: almost 3 years ago - Stars: 35 - Forks: 11

linksense/EfficientNet.PyTorch

Concise, Modular, Human-friendly PyTorch implementation of EfficientNet with Pre-trained Weights.

Language: Python - Size: 25.4 KB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 31 - Forks: 5

snu-mllab/GuidedQuant

Official PyTorch implementation of "GuidedQuant: Large Language Model Quantization via Exploiting End Loss Guidance" (ICML 2025)

Language: Python - Size: 3.38 MB - Last synced at: about 7 hours ago - Pushed at: about 8 hours ago - Stars: 30 - Forks: 0

visresearch/LLaVA-STF

The official implementation of "Learning Compact Vision Tokens for Efficient Large Multimodal Models"

Language: Python - Size: 2.62 MB - Last synced at: 6 days ago - Pushed at: 15 days ago - Stars: 27 - Forks: 2

LiuHengyu321/FlexGS

[CVPR2025] Code Release for "FlexGS: Train Once, Deploy Everywhere with Many-in-One Flexible 3D Gaussian Splatting"

Language: Python - Size: 675 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 25 - Forks: 1

bharathsudharsan/CNN_on_MCU

Code for paper 'Multi-Component Optimization and Efficient Deployment of Neural-Networks on Resource-Constrained IoT Hardware'

Language: Jupyter Notebook - Size: 4.91 MB - Last synced at: 13 days ago - Pushed at: about 3 years ago - Stars: 24 - Forks: 19

VITA-Group/triple-wins

[ICLR 2020] "Triple Wins: Boosting Accuracy, Robustness and Efficiency Together by Enabling Input-Adaptive Inference"

Language: Python - Size: 13.2 MB - Last synced at: 2 months ago - Pushed at: over 3 years ago - Stars: 24 - Forks: 7

Zhen-Dong/CoDeNet

[FPGA'21] CoDeNet is an efficient object detection model on PyTorch, with SOTA performance on VOC and COCO based on CenterNet and Co-Designed deformable convolution.

Language: Python - Size: 6.17 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 21 - Forks: 4

xuyang-liu16/VidCom2

Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models

Language: Python - Size: 5.52 MB - Last synced at: 26 days ago - Pushed at: 26 days ago - Stars: 20 - Forks: 0

snap-research/linkless-link-prediction

[ICML 2023] Linkless Link Prediction via Relational Distillation

Language: Python - Size: 184 KB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 20 - Forks: 7

ivclab/NeuralMerger

Yi-Min Chou, Yi-Ming Chan, Jia-Hong Lee, Chih-Yi Chiu, Chu-Song Chen, "Unifying and Merging Well-trained Deep Neural Networks for Inference Stage," International Joint Conference on Artificial Intelligence (IJCAI), 2018

Language: Python - Size: 18.5 MB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 20 - Forks: 3

IBM/AutoVP

[ICLR24] AutoVP: An Automated Visual Prompting Framework and Benchmark

Language: Python - Size: 577 KB - Last synced at: 17 days ago - Pushed at: about 2 months ago - Stars: 19 - Forks: 2

xternalz/SDPoint

Stochastic Downsampling for Cost-Adjustable Inference and Improved Regularization in Convolutional Networks

Language: Python - Size: 10.7 KB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 18 - Forks: 4

FranxYao/RDP

Implementation of ICML 22 Paper: Scaling Structured Inference with Randomization

Language: Jupyter Notebook - Size: 115 MB - Last synced at: about 2 months ago - Pushed at: almost 3 years ago - Stars: 14 - Forks: 3

ivclab/Multistage_Pruning

Cheng-Hao Tu, Jia-Hong Lee, Yi-Ming Chan and Chu-Song Chen, "Pruning Depthwise Separable Convolutions for MobileNet Compression," International Joint Conference on Neural Networks, IJCNN 2020, July 2020.

Language: Python - Size: 33.2 KB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 13 - Forks: 3

visresearch/SDMPrune

The official implementation of "SDMPrune: Self-Distillation MLP Pruning for Efficient Large Language Models"

Language: Python - Size: 137 KB - Last synced at: 6 days ago - Pushed at: 14 days ago - Stars: 12 - Forks: 0

bharathsudharsan/ML-Classifiers-on-MCUs

Supplementary material for IEEE Services Computing paper 'An SRAM Optimized Approach for Constant Memory Consumption and Ultra-fast Execution of ML Classifiers on TinyML Hardware'

Language: Jupyter Notebook - Size: 584 KB - Last synced at: 2 months ago - Pushed at: almost 4 years ago - Stars: 12 - Forks: 1

changwoolee/BLAST

[NeurIPS 2024] BLAST: Block Level Adaptive Structured Matrix for Efficient Deep Neural Network Inference

Language: Python - Size: 1.43 MB - Last synced at: 2 months ago - Pushed at: 8 months ago - Stars: 10 - Forks: 0

ivclab/Merging-MobileNets-for-Multitask

Cheng-En Wu, Yi-Ming Chan and Chu-Song Chen, "On Merging MobileNets for Efficient Multitask Inference," International Symposium on High-Performance Computer Architecture (HPCA), Workshop on Energy Efficient Machine Learning and Cognitive Computing for Embedded Applications (EMC2), 2019

Language: Python - Size: 173 KB - Last synced at: almost 2 years ago - Pushed at: about 5 years ago - Stars: 10 - Forks: 0

ltkong218/MDFlow

MDFlow: Unsupervised Optical Flow Learning by Reliable Mutual Knowledge Distillation (TCSVT 2022)

Language: Python - Size: 30.6 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 8 - Forks: 2

d-becking/efficientCNNs

Finding Storage- and Compute-Efficient Convolutional Neural Networks

Language: Python - Size: 286 MB - Last synced at: 12 months ago - Pushed at: over 2 years ago - Stars: 7 - Forks: 3

d-becking/neurips-2019-micronet-challenge

NeurIPS 2019 MicroNet Challenge

Language: Python - Size: 66 MB - Last synced at: 12 months ago - Pushed at: over 2 years ago - Stars: 6 - Forks: 1

HolmesShuan/PyTorch-MixNet-SS

Extremely light-weight MixNet with Top-1 75.7% and 2.5M params

Language: Python - Size: 8.79 MB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 6 - Forks: 1

megh1241/blockset

BLOCKSET: Efficient out of core tree ensemble inference

Language: C++ - Size: 117 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 4 - Forks: 1

ramonVDAKKER/research-copulas

Semiparametric efficient rank-based estimation of copula parameters

Language: MATLAB - Size: 843 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 0

bharathsudharsan/Edge2Train

Code for IoT paper 'Edge2Train: a framework to train machine learning models (SVMs) on resource-constrained IoT edge devices'

Language: C - Size: 2.92 MB - Last synced at: 4 months ago - Pushed at: almost 4 years ago - Stars: 3 - Forks: 1

tilmto/Adjustable-Quantization-MicroNet

[MicroNet Challenge (NeurIPS 2019)] "Adjustable Quantization: Jointly Learn the Bit-width and Weight in DNN Training" by Yonggan Fu, Ruiyang Zhao, Yue Wang, Chaojian Li, Haoran You, Zhangyang Wang, Yingyan Lin

Language: Python - Size: 536 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 3 - Forks: 0

renebidart/text-classification-benchmark

Inference speed / accuracy tradeoff on text classification with transformer models such as BERT, RoBERTa, DeBERTa, SqueezeBERT, MobileBERT, Funnel Transformer, etc.

Language: Jupyter Notebook - Size: 1.49 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

bellthomas/VDQN

Exploring Variational Deep Q Networks. A study undertaken for the University of Cambridge's R244 Computer Science Masters Course. Inspired by https://arxiv.org/abs/1711.11225/.

Language: Python - Size: 12.7 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 2 - Forks: 1

twcmchang/CP-CNN

Channel-Prioritized Convolutional Neural Networks for Sparsity and Multi-fidelity

Language: Python - Size: 113 KB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 2 - Forks: 2

maxwells-daemons/genome

Compute-efficient reinforcement learning with binary neural networks and evolution strategies.

Language: Python - Size: 834 KB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 1

MatthiasKi/structurednets

Library for Structured Matrices (approximation methods and structured layers for neural networks)

Language: Python - Size: 439 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 1

bharathsudharsan/ECML-Tutorial-ML-Meets-IoT

Repository of the ECML PKDD 2021 tutorial titled 'Machine Learning Meets Internet of Things: From Theory to Practice'

Language: Jupyter Notebook - Size: 32.6 MB - Last synced at: 4 months ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 1

saife245/IMAGE-RECOGNATION

Language: Jupyter Notebook - Size: 9.58 MB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 0

Rushikesh321/adder

Event-driven tool/library for tailing the Cardano blockchain (blockchain, cardano, ouroboros, ouroboros-network, toolbox)

Language: Go - Size: 109 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

MhmDSmdi/FunEditor

[ AAAI 2025 ] The official PyTorch implementation for FunEditor: Achieving Complex Image Edits via Function Aggregation with Diffusion Models

Size: 1000 Bytes - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

Touka20/DSD-THUEE

Labs for the Digital System Design course, fall 2023

Language: Jupyter Notebook - Size: 431 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

edward62740/EdgeTPU-MOT

Language: C++ - Size: 659 MB - Last synced at: 3 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

Saeidhoseinipour/LLMnewTopics

Dive into the forefront of Large Language Models (LLMs) with our concise guide on the top 10 hot topics. Explore bias mitigation, efficient training, multimodal models, and more. Stay abreast of the latest advancements shaping the landscape of LLMs.

Size: 5.86 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

DaniAffCH/SegSpaceDetector

Graph Based image processing for segmenting images and detecting free spots in crowded scenes.

Language: C++ - Size: 483 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Related Topics
pytorch (12), quantization (12), model-compression (10), deep-learning (9), imagenet (7), transformer (7), large-language-models (7), llama (7), llm (7), convolutional-neural-networks (6), diffusion-models (6), deep-neural-networks (6), efficient-neural-networks (5), efficient-model (5), tinyml (5), tensorflow (5), optimization (4), sparsity (4), machine-learning (4), edge-computing (4), computer-vision (4), natural-language-processing (4), llms (4), multi-task-learning (3), neural-network (3), network-pruning (3), knowledge-distillation (3), pretrained-models (3), distillation (3), efficientnet (3), arm-cortex-m0 (3), ai (3), cmsis-nn (3), tflite (3), llm-inference (3), training-free (3), sparsity-optimization (3), stable-diffusion (3), compression (2), iot-devices (2), arm-cortex-m4 (2), graph-neural-networks (2), nlp (2), semantic-segmentation (2), compound-scaling (2), ec2t (2), entropy-coding (2), micronet-challenge (2), neural-architecture-search (2), quantization-algorithms (2), mistral (2), llama2 (2), large-language-model (2), scalability (2), c-code-generator (2), localllm (2), zero-shot-learning (2), efficiency (2), image-classification (2), matrix-factorization (2), mobilenet (2), pruning (2), text-to-video (2), text-to-image (2), graph-optimization (2), model-acceleration (2), microcontroller (2), efficient-deep-learning (2), transformers (2), super-resolution (2), large-vision-language-models (2), python (2), text-generation (2), small-models (2), 3d-reconstruction (2), gaussian-splatting (2), gnn (2), artificial-intelligence (2), eccv (1), body-reconstruction (1), localllama (1), 3d-pose-estimation (1), 3d-body-recovery (1), 2d-human-pose (1), object-detection (1), fpgas (1), efficient (1), detector (1), deformable-convnets (1), deep-learning-algorithms (1), age-gender-cnn (1), multi-object-tracking (1), android-application (1), edgetpu (1), mobile-application (1), cnn-compression (1), randomized-algorithms (1), multi-modal-learning (1), unifying-and-merging-cnn (1), channel-pruning (1)