D-PACE: Dynamic Loss for Parallel Speculative Decoding in LLMs

ai-technology · 2026-05-20

D-PACE (Dynamic Position-Aware Cross-Entropy) is a training objective for the drafter models used in speculative decoding for large language models (LLMs). Speculative decoding speeds up inference by having a smaller drafter model propose tokens, which a larger target model then verifies in parallel. Recent diffusion-based drafters such as DFlash can predict an entire block of tokens in a single forward pass, but existing multi-token objectives use static position-dependent weights that stay fixed throughout training. D-PACE instead derives the weight for each position from a differentiable approximation of the expected accepted draft length, directing the training signal toward the positions that currently limit acceptance. Tests with Qwen3-4B drafter models across six benchmarks showed improved acceptance rates and faster inference. The paper is available on arXiv under identifier 2605.18810.
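To make "accepted draft length" concrete, here is a minimal sketch of the verification step. It assumes greedy verification, where a draft token is accepted only if it equals the target model's own choice at that position; real systems often use a probabilistic acceptance rule instead. The function name and the token values are hypothetical and are not taken from the paper.

```python
def accepted_prefix_length(draft_tokens, target_tokens):
    """Count the leading draft tokens that match the target's choice.

    Assumption: greedy verification. Everything after the first
    mismatch is discarded, even if later tokens happen to match.
    """
    accepted = 0
    for draft, target in zip(draft_tokens, target_tokens):
        if draft != target:
            break
        accepted += 1
    return accepted


# Hypothetical block of four drafted tokens, verified in one target pass.
draft_block = [5, 7, 9, 2]
target_block = [5, 7, 1, 2]
accepted = accepted_prefix_length(draft_block, target_block)
```

Because acceptance stops at the first mismatch, an error at an early position wastes every later token in the block, which is why the position of an error matters for training.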

Key facts

D-PACE is a dynamic position-aware cross-entropy loss for speculative decoding.
It addresses fixed weighting schedules in multi-token drafter objectives.
Weights are derived from a differentiable surrogate of expected accepted draft length.
Tested across six benchmarks with Qwen3-4B drafter models.
Published on arXiv with ID 2605.18810.
Related to diffusion-based parallel drafters like DFlash.
Aims to accelerate LLM inference by improving token acceptance rates.
Training signal shifts toward positions limiting acceptance as drafter improves.
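
The summary does not give the D-PACE formula, so the following is only an illustrative sketch of how position weights could be derived from a surrogate of expected accepted length. It assumes the surrogate is the sum of prefix products of per-position acceptance probabilities, and that each weight is the normalized sensitivity of that surrogate to the log-probability at that position. Both choices, and all function names, are assumptions made here and may differ from the paper.

```python
import math


def expected_accepted_length(accept_probs):
    """Assumed surrogate: sum over k of the product p_1 * ... * p_k."""
    total, prefix = 0.0, 1.0
    for p in accept_probs:
        prefix *= p
        total += prefix
    return total


def dynamic_position_weights(accept_probs):
    """Normalized sensitivity of the surrogate to log p_j (an assumption).

    d E[L] / d log p_j equals the sum of prefix products from position j
    onward, so the weights are suffix sums of the prefix products.
    """
    prefixes, prefix = [], 1.0
    for p in accept_probs:
        prefix *= p
        prefixes.append(prefix)
    sensitivities, running = [0.0] * len(prefixes), 0.0
    for j in range(len(prefixes) - 1, -1, -1):
        running += prefixes[j]
        sensitivities[j] = running
    total = sum(sensitivities)
    if total == 0.0:
        # Degenerate case: fall back to uniform weights.
        return [1.0 / len(prefixes)] * len(prefixes) if prefixes else []
    return [s / total for s in sensitivities]


def weighted_cross_entropy(accept_probs, weights, eps=1e-12):
    """Position-weighted negative log-likelihood of the target tokens."""
    return -sum(w * math.log(max(p, eps)) for w, p in zip(weights, accept_probs))


# Hypothetical per-position probabilities the drafter assigns to the target token.
weak_drafter = [0.2, 0.2, 0.2, 0.2]
strong_drafter = [0.9, 0.9, 0.9, 0.9]
weak_weights = dynamic_position_weights(weak_drafter)
strong_weights = dynamic_position_weights(strong_drafter)
```

Under these assumptions, a weak drafter puts almost all of its weight on the first positions, and the weight spreads toward later positions as the per-position probabilities rise. In an actual training loop the weights would typically be recomputed each step and treated as constants when backpropagating, which is also an assumption of this sketch.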
