EvoSpec: Real-Time Vocabulary Adaptation for Speculative Decoding

ai-technology · 2026-05-28

EvoSpec is a new framework that lets the draft model in speculative decoding for Large Language Models evolve in real time. It targets the output projection layer, which becomes a bottleneck as vocabulary sizes grow. Static vocabulary pruning methods shrink that layer but lose acceptance rate in specialized domains and when the topic switches. EvoSpec instead uses a context-aware mechanism that retrieves critical long-tail tokens through efficient semantic and statistical indexing. It pairs this with a lightweight online alignment strategy, trained with curriculum learning, that narrows the distributional gap between the draft and target models. The paper is available on arXiv under ID 2605.27390.
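
The tension between cost and acceptance can be made concrete with a toy calculation. This sketch relies on the standard speculative-sampling result: with target distribution p and draft distribution q, a drafted token is accepted with probability Σ_x min(p(x), q(x)). The vocabulary sizes, hidden size, token distributions, and helper names (`lm_head_macs`, `acceptance_rate`, `prune_and_renormalize`) are hypothetical illustrations, not values or code from the paper.

```python
# Toy illustration with hypothetical numbers: pruning the draft vocabulary
# shrinks LM-head cost but can lower the speculative acceptance rate.


def lm_head_macs(hidden_dim: int, vocab_size: int) -> int:
    """Multiply-accumulates per token for an output projection of shape hidden_dim x vocab_size."""
    return hidden_dim * vocab_size


def acceptance_rate(p: dict[str, float], q: dict[str, float]) -> float:
    """Probability that speculative sampling accepts one drafted token: sum_x min(p(x), q(x))."""
    return sum(min(p.get(t, 0.0), q.get(t, 0.0)) for t in set(p) | set(q))


def prune_and_renormalize(q: dict[str, float], keep: set[str]) -> dict[str, float]:
    """Stand-in for static pruning: drop tokens outside a fixed subset, then renormalize."""
    kept = {t: prob for t, prob in q.items() if t in keep}
    total = sum(kept.values())
    return {t: prob / total for t, prob in kept.items()}


# Hypothetical sizes: 4096-dim hidden state, 128k full vocabulary vs a 32k static subset.
full_cost = lm_head_macs(4096, 128_000)
pruned_cost = lm_head_macs(4096, 32_000)
print(f"LM-head cost ratio: {pruned_cost / full_cost:.2f}")  # 0.25

# Specialized-domain context: the target puts half its mass on a long-tail term.
p = {"the": 0.3, "cell": 0.2, "mitochondria": 0.5}
q_full = dict(p)  # assume a perfectly aligned draft before pruning
q_pruned = prune_and_renormalize(q_full, keep={"the", "cell"})  # domain term was pruned

print(f"acceptance, full vocab:   {acceptance_rate(p, q_full):.2f}")    # 1.00
print(f"acceptance, pruned vocab: {acceptance_rate(p, q_pruned):.2f}")  # 0.50
```

In this toy case, pruning to a quarter of the vocabulary cuts the draft LM-head work to a quarter. Because the domain term falls outside the fixed subset, half of the target's probability mass can never be matched, so the per-token acceptance probability halves.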

Key facts

EvoSpec enables real-time evolution of draft models through dynamic vocabulary and parameter adaptation.
It addresses the bottleneck of output projection layers in speculative decoding as vocabulary sizes scale.
Static vocabulary pruning methods suffer drops in acceptance rate in specialized domains and when the topic switches.
EvoSpec uses a context-aware mechanism to retrieve critical long-tail tokens via semantic and statistical indexing.
It employs a lightweight online alignment strategy based on curriculum learning.
This alignment aims to minimize the distributional gap between the draft and target models.
The paper is available on arXiv under ID 2605.27390.
It was announced on arXiv as a cross-list.
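
A context-aware expansion of the draft vocabulary could look roughly like the minimal sketch below. It is not EvoSpec's mechanism. It assumes a "statistical index" that counts out-of-vocabulary tokens in the recent context and a "semantic index" that ranks candidate tokens by cosine similarity to the mean embedding of that context, and it unions both with a fixed base vocabulary. The function `expand_vocab`, its parameters, and the toy embeddings are hypothetical.

```python
import math
from collections import Counter

# Hypothetical sketch, not EvoSpec's implementation: recent-token counts and
# embedding similarity stand in for the paper's statistical and semantic indexing.


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def expand_vocab(
    base_vocab: set[str],
    recent_tokens: list[str],
    embeddings: dict[str, list[float]],
    top_stat: int = 2,
    top_sem: int = 2,
) -> set[str]:
    """Return base_vocab plus long-tail tokens suggested by the current context."""
    # Statistical index: tokens that recur in the recent context but are missing from the base set.
    counts = Counter(t for t in recent_tokens if t not in base_vocab)
    stat_hits = {tok for tok, _ in counts.most_common(top_stat)}

    # Semantic index: non-base tokens whose embeddings are closest to the context centroid.
    ctx = [t for t in recent_tokens if t in embeddings]
    if not ctx:
        return base_vocab | stat_hits
    dim = len(embeddings[ctx[0]])
    centroid = [sum(embeddings[t][i] for t in ctx) / len(ctx) for i in range(dim)]
    candidates = [t for t in embeddings if t not in base_vocab]
    candidates.sort(key=lambda t: cosine(embeddings[t], centroid), reverse=True)
    sem_hits = set(candidates[:top_sem])

    return base_vocab | stat_hits | sem_hits


# Toy usage with made-up 2-d embeddings.
base = {"the", "of", "a", "is"}
emb = {
    "the": [0.1, 0.1], "of": [0.1, 0.0], "a": [0.0, 0.1], "is": [0.1, 0.1],
    "kinase": [0.9, 0.1], "protein": [0.8, 0.2], "enzyme": [0.85, 0.15], "guitar": [0.0, 1.0],
}
recent = ["the", "kinase", "of", "the", "protein", "kinase"]
print(sorted(expand_vocab(base, recent, emb)))
# ['a', 'enzyme', 'is', 'kinase', 'of', 'protein', 'the']
```

Here "enzyme" never appears in the context but is retrieved through the semantic index, while "guitar" stays out. Taking a union with a fixed base set keeps frequent tokens available while letting the long tail follow the topic. How the paper weights, bounds, or refreshes these sets is not covered in this summary.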
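
The online alignment step can be sketched as distillation with an easy-to-hard schedule. Every modeling choice in this toy version is an assumption, not the paper's recipe. The draft model is a single shared linear layer followed by softmax, and the loss is forward KL(p_target || q_draft). Difficulty is the current per-example KL, and a linear pacing schedule admits a larger easiest-first share of examples at each stage. The names `draft_probs` and `curriculum_align` are hypothetical.

```python
import math

# Toy sketch under stated assumptions: shared linear draft head, forward-KL loss,
# KL-based difficulty, linear pacing. Not EvoSpec's actual alignment procedure.


def softmax(z: list[float]) -> list[float]:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p: list[float], q: list[float]) -> float:
    """Forward KL(p || q); terms with p = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def draft_probs(W: list[list[float]], x: list[float]) -> list[float]:
    """Toy draft model: shared weights W (vocab x features) followed by softmax."""
    return softmax([sum(w * xk for w, xk in zip(row, x)) for row in W])


def curriculum_align(W, features, targets, stages=3, steps_per_stage=100, lr=0.2):
    """Online alignment: each stage trains on a larger easiest-first slice of the examples."""
    n = len(features)
    for stage in range(1, stages + 1):
        # Difficulty = current draft-target gap, re-measured at the start of every stage.
        order = sorted(range(n), key=lambda i: kl(targets[i], draft_probs(W, features[i])))
        active = order[: math.ceil(n * stage / stages)]  # linear pacing
        for _ in range(steps_per_stage):
            for i in active:
                x, p = features[i], targets[i]
                q = draft_probs(W, x)
                # dKL(p || softmax(Wx)) / dW[v][k] = (q[v] - p[v]) * x[k]
                for v, row in enumerate(W):
                    for k in range(len(x)):
                        row[k] -= lr * (q[v] - p[v]) * x[k]
    return W


# Toy usage: three contexts, a 3-token vocabulary, 2-d context features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
targets = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7], [0.4, 0.2, 0.4]]
W = [[0.0, 0.0] for _ in range(3)]
before = sum(kl(p, draft_probs(W, x)) for x, p in zip(features, targets))
curriculum_align(W, features, targets)
after = sum(kl(p, draft_probs(W, x)) for x, p in zip(features, targets))
print(f"total KL before {before:.3f}, after {after:.3f}")
```

Because the examples share W, early stages on the easiest contexts shape the weights that later, harder contexts start from, which is the point of ordering them.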