GitHub topics: speculative-decoding

Repositories

Infini-AI-Lab/TriForce

[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding

Language: Python - Size: 71.7 MB - Last synced at: 1 day ago - Pushed at: 9 months ago - Stars: 250 - Forks: 17

facebookresearch/LayerSkip

Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024

Language: Python - Size: 11.3 MB - Last synced at: about 22 hours ago - Pushed at: 15 days ago - Stars: 297 - Forks: 25

aphrodite-engine/aphrodite-engine

Large-scale LLM inference engine

Language: C++ - Size: 36 MB - Last synced at: 3 days ago - Pushed at: 6 days ago - Stars: 1,419 - Forks: 152

ccs96307/fast-llm-inference

Accelerating LLM inference with techniques like speculative decoding, quantization, and kernel fusion, focusing on implementing state-of-the-art research papers.

Language: Jupyter Notebook - Size: 168 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 8 - Forks: 1

SafeAILab/EAGLE

Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3.

Language: Python - Size: 68.6 MB - Last synced at: 5 days ago - Pushed at: 12 days ago - Stars: 1,227 - Forks: 135

hsj576/GRIFFIN

Official Implementation of "GRIFFIN: Effective Token Alignment for Faster Speculative Decoding"

Language: Python - Size: 11.9 MB - Last synced at: 5 days ago - Pushed at: 6 days ago - Stars: 6 - Forks: 0

FasterDecoding/REST

REST: Retrieval-Based Speculative Decoding, NAACL 2024

Language: C - Size: 1.06 MB - Last synced at: 1 day ago - Pushed at: 6 months ago - Stars: 201 - Forks: 12

Geralt-Targaryen/Awesome-Speculative-Decoding

Reading notes on Speculative Decoding papers

Size: 4.64 MB - Last synced at: 8 days ago - Pushed at: 29 days ago - Stars: 4 - Forks: 0

bigai-nlco/TokenSwift

From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation

Language: Python - Size: 61.6 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 89 - Forks: 8

BaohaoLiao/RSD

[ICML 2025] Reward-guided Speculative Decoding (RSD) for efficiency and effectiveness.

Language: Python - Size: 10.9 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 26 - Forks: 3

intel/intel-extension-for-transformers 📦

⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡

Language: Python - Size: 585 MB - Last synced at: 18 days ago - Pushed at: 7 months ago - Stars: 2,169 - Forks: 210

mscheong01/speculative_decoding.c

minimal C implementation of speculative decoding based on llama2.c

Language: C - Size: 2.06 MB - Last synced at: 7 days ago - Pushed at: 10 months ago - Stars: 22 - Forks: 2

AutonomicPerfectionist/PipeInfer

PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation

Language: C++ - Size: 17.5 MB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 29 - Forks: 4

kssteven418/BigLittleDecoder

[NeurIPS'23] Speculative Decoding with Big Little Decoder

Language: Python - Size: 100 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 90 - Forks: 10

Infini-AI-Lab/UMbreLLa

LLM Inference on consumer devices

Language: Python - Size: 28.7 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 95 - Forks: 14

natask/infra_gpu_hack

A novel algorithm that integrates a text diffusion LLM as a draft model to boost the performance of traditional auto-regressive LLMs.

Language: Python - Size: 494 KB - Last synced at: 3 days ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

hemingkx/SWIFT

[ICLR 2025] SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration

Language: Python - Size: 1.18 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 38 - Forks: 1

wtlow003/speculative-sampling

Implementation of Speculative Sampling in "Accelerating Large Language Model Decoding with Speculative Sampling"

Language: Python - Size: 30.3 KB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 5 - Forks: 1

Infini-AI-Lab/Sequoia

scalable and robust tree-based speculative decoding algorithm

Language: Python - Size: 4.6 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 331 - Forks: 37

smpanaro/token-recycling

Unofficial implementation of Token Recycling self-speculative decoding method.

Language: Python - Size: 611 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

wtlow003/ngram-decoding

(Re)-implementation of "Prompt Lookup Decoding" by Apoorv Saxena, with extended ideas from LLMA Decoding.

Language: Jupyter Notebook - Size: 552 KB - Last synced at: 2 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

PopoDev/CSE481N_Project

Reproducibility Project for [NeurIPS'23] Speculative Decoding with Big Little Decoder

Language: Python - Size: 6.33 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

pinqian77/Dynasurge

Dynasurge: Dynamic Tree Speculation for Prompt-Specific Decoding

Language: Python - Size: 4.71 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

u-hyszk/japanese-speculative-decoding

Verification of the effect of speculative decoding in Japanese.

Language: Python - Size: 245 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

hemingkx/SpecDec

Code for our paper "Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation" (EMNLP 2023 Findings)

Language: Python - Size: 7.22 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 16 - Forks: 0

Related Keywords

speculative-decoding (25), llm-inference (10), llm (8), large-language-models (6), inference (4), efficiency (3), fast-inference (3), retrieval (2), nlp (2), large-language-model (2), acceleration (2), intel-optimized-llamacpp (1), habana (1), llm-cpu (1), neural-chat (1), neural-chat-7b (1), rag (1), gaudi3 (1), streamingllm (1), artificial-intelligence (1), chatpdf (1), c (1), long-context (1), llama2 (1), llamacpp (1), decoding (1), efficient-inference (1), speculative-execution (1), offloading (1), diffusion-models (1), deepmind (1), speculative-sampling (1), n-gram (1), ngram-decoding (1), prompt-lookup-decoding (1), japanese (1), non-autoregressive (1), early-exit (1), layer-drop (1), optimization (1), transformers (1), api-rest (1), cuda (1), inference-engine (1), inferentia (1), intel (1), lora (1), machine-learning (1), rocm (1), tpu (1), inference-optimization (1), awesome (1), papers (1), deepseek (1), llm-serving (1), llms (1), qwen (1), transformer (1), decoding-algorithm (1), process-reward-model (1), reasoning (1), 4-bits (1), autoround (1), chatbot (1)
