ARTFEED — Contemporary Art Intelligence

D-PACE: Dynamic Loss for Parallel Speculative Decoding in LLMs

ai-technology · 2026-05-20

D-PACE (Dynamic Position-Aware Cross-Entropy) is a new training objective for the drafter models used in speculative decoding of large language models (LLMs). Speculative decoding speeds up inference by having a smaller drafter model propose tokens, which a larger target model then verifies in parallel. Recent diffusion-based drafters such as DFlash can predict an entire token block in a single forward pass, but current multi-token objectives use static position-dependent weights that stay fixed throughout training. D-PACE instead derives the weight for each position from a differentiable approximation of the expected accepted draft length, which directs training toward the positions that currently limit acceptance. Testing with Qwen3-4B drafter models across six benchmarks showed improved acceptance rates and faster inference. The paper is available on arXiv under identifier 2605.18810.
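The draft-then-verify step described above can be illustrated with a minimal sketch. It assumes greedy verification, where a drafted token is accepted only if it matches the target model's own top choice at that position. The article does not say which acceptance rule the paper uses, and the function names and token IDs here are hypothetical.

```python
from typing import List


def accepted_prefix_length(draft_tokens: List[int], target_tokens: List[int]) -> int:
    """Count the leading draft tokens that match the target's choices."""
    accepted = 0
    for drafted, verified in zip(draft_tokens, target_tokens):
        if drafted != verified:
            break
        accepted += 1
    return accepted


def verify_block(draft_tokens: List[int], target_tokens: List[int]) -> List[int]:
    """Return the tokens emitted by one draft-then-verify step.

    Assumption: target_tokens[i] is the target model's greedy choice at
    position i given the context plus draft_tokens[:i], all obtained from a
    single parallel forward pass. It therefore holds one more entry than
    draft_tokens: the extra entry is the token emitted when every drafted
    token is accepted.
    """
    if len(target_tokens) != len(draft_tokens) + 1:
        raise ValueError("target_tokens must have len(draft_tokens) + 1 entries")
    n = accepted_prefix_length(draft_tokens, target_tokens)
    # Keep the accepted prefix, then append the target's own token at the
    # first mismatch (or the extra token if the whole block was accepted).
    return draft_tokens[:n] + [target_tokens[n]]


if __name__ == "__main__":
    # Hypothetical token IDs: the third drafted token is rejected.
    print(verify_block([5, 9, 2, 7], [5, 9, 4, 1, 3]))
```

Under this rule every step emits at least one token, and the speedup grows with the length of the accepted prefix, which is why the accepted draft length is the quantity a drafter objective would want to raise.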

Key facts

  • D-PACE is a dynamic position-aware cross-entropy loss for speculative decoding.
  • It addresses fixed weighting schedules in multi-token drafter objectives.
  • Weights are derived from a differentiable surrogate of expected accepted draft length.
  • Tested across six benchmarks with Qwen3-4B drafter models.
  • Published on arXiv with ID 2605.18810.
  • Related to diffusion-based parallel drafters like DFlash.
  • Aims to accelerate LLM inference by improving token acceptance rates.
  • The training signal shifts toward the positions that limit acceptance as the drafter improves.
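One way to picture how position weights could follow from a surrogate of expected accepted length is sketched below. This is an illustrative assumption, not the paper's formula: it treats per-position acceptance as independent with probability p_j, uses E[L] = sum over k of the product p_1 ... p_k as the surrogate, and sets each weight proportional to the gradient of E[L] with respect to log p_j. All function names are hypothetical.

```python
import math
from typing import List


def expected_accepted_length(accept_probs: List[float]) -> float:
    """Surrogate E[L]: sum over k of the probability that positions 1..k are all accepted."""
    total, running = 0.0, 1.0
    for p in accept_probs:
        running *= p
        total += running
    return total


def position_weights(accept_probs: List[float]) -> List[float]:
    """Weights proportional to dE[L]/d(log p_j), normalized to sum to 1.

    dE[L]/d(log p_j) equals the sum of prefix products over all k >= j,
    because p_j appears in exactly those terms of the surrogate.
    """
    prefix, running = [], 1.0
    for p in accept_probs:
        running *= p
        prefix.append(running)
    grads, tail = [0.0] * len(prefix), 0.0
    for j in range(len(prefix) - 1, -1, -1):
        tail += prefix[j]
        grads[j] = tail
    norm = sum(grads)
    if norm == 0.0:
        # Degenerate case (first position never accepted): fall back to uniform.
        return [1.0 / len(grads)] * len(grads)
    return [g / norm for g in grads]


def weighted_cross_entropy(accept_probs: List[float]) -> float:
    """Position-weighted negative log-likelihood, with weights treated as constants."""
    weights = position_weights(accept_probs)
    return sum(-w * math.log(max(p, 1e-12)) for w, p in zip(weights, accept_probs))


if __name__ == "__main__":
    weak = position_weights([0.5] * 4)     # early-stage drafter
    strong = position_weights([0.95] * 4)  # improved drafter
    print([round(w, 3) for w in weak])
    print([round(w, 3) for w in strong])
```

In this sketch a weak drafter concentrates weight on the first positions, since later positions are rarely reached, while a stronger drafter spreads weight toward later positions. That matches the qualitative behavior listed above, though the actual D-PACE weighting may differ.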

Entities

Institutions

  • arXiv

Sources