ARTFEED — Contemporary Art Intelligence

DOA: Training-Free Decoder-Only Attention Policy for Long-Form Simultaneous Translation with SpeechLLMs

ai-technology · 2026-06-01

A new training-free policy called Decoder-Only Attention (DOA) enables long-form simultaneous speech-to-text translation with off-the-shelf Speech Large Language Models (SpeechLLMs). Existing simultaneous translation policies are built for attention-based encoder-decoder models, where cross-attention between the decoder and the encoded speech supplies the alignment used for streaming decisions. SpeechLLMs, by contrast, are decoder-only: speech and text interact only through self-attention, so there is no cross-attention to read an alignment from. DOA derives a proxy alignment from self-attention instead, which lets the model decide when to wait for more audio and when to emit text without any additional training. The approach also addresses two shortcomings of earlier work: limited validation in long-form settings and reliance on heuristic wait-k policies. The paper is available on arXiv under reference 2605.31432.
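To make the idea of a self-attention proxy alignment concrete, the sketch below assumes a decoder-only layout in which prompt tokens, speech tokens, and generated text share one sequence, and that the per-head attention weights of the newest candidate text token are available. It borrows the decision rule of cross-attention policies such as AlignAtt: average the heads, treat the most-attended speech token as the aligned position, and hold the token back (READ) if that position falls within the last `frame_threshold` speech tokens. The function names, head averaging, and threshold rule are illustrative assumptions, not DOA's published algorithm.

```python
from typing import List, Sequence


def average_heads(attn_heads: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of several attention rows (one per head) for one query token."""
    if not attn_heads:
        raise ValueError("need at least one attention row")
    return [sum(col) / len(attn_heads) for col in zip(*attn_heads)]


def proxy_alignment(attn_heads: Sequence[Sequence[float]],
                    speech_start: int, speech_end: int) -> int:
    """Index, within the speech span, of the speech token receiving the most attention.

    Hypothetical: assumes speech tokens occupy positions [speech_start, speech_end)
    of the decoder-only sequence.
    """
    row = average_heads(attn_heads)
    if not 0 <= speech_start < speech_end <= len(row):
        raise ValueError("speech span out of range")
    speech = row[speech_start:speech_end]
    return max(range(len(speech)), key=speech.__getitem__)


def decide(attn_heads: Sequence[Sequence[float]], speech_start: int,
           speech_end: int, frame_threshold: int) -> str:
    """AlignAtt-style rule (an assumption here): WRITE only if the proxy alignment
    lies before the last `frame_threshold` speech tokens received so far."""
    num_speech = speech_end - speech_start
    aligned = proxy_alignment(attn_heads, speech_start, speech_end)
    if aligned >= num_speech - frame_threshold:
        return "READ"   # evidence sits at the edge of the audio: wait for more input
    return "WRITE"      # token is grounded in earlier audio: emit it


if __name__ == "__main__":
    # Hypothetical layout: 2 prompt tokens, 6 speech tokens (positions 2..7), 2 text tokens.
    early = [0.1, 0.1, 0.05, 0.4, 0.1, 0.05, 0.05, 0.05, 0.05, 0.05]
    late = [0.1, 0.1, 0.02, 0.02, 0.02, 0.02, 0.1, 0.5, 0.06, 0.06]
    print(decide([early], 2, 8, frame_threshold=2))  # WRITE
    print(decide([late], 2, 8, frame_threshold=2))   # READ
```

In the first row the attention peaks early in the audio, so the token is emitted; in the second it peaks on the final speech token, so the policy waits. Which layers and heads to use, and the threshold value, are open choices in this sketch.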

Key facts

  • DOA is a training-free policy for simultaneous translation.
  • It works with decoder-only SpeechLLMs, which have no cross-attention.
  • The policy derives alignment from self-attention.
  • It enables long-form simultaneous translation.
  • Current methods rely on encoder-decoder models with cross-attention or on heuristic wait-k policies.
  • The approach is validated on off-the-shelf SpeechLLMs.
  • The paper is on arXiv (2605.31432).
  • It addresses the lack of validation in long-form settings.
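For contrast, the heuristic wait-k policy that DOA avoids follows a fixed schedule: read k source units, then alternate one write and one read, and only write once the source is exhausted. A minimal sketch, assuming speech is cut into fixed-size source chunks and, for illustration only, that the target length is known in advance (a real system would stop at an end-of-sentence token):

```python
from typing import List


def wait_k_actions(k: int, num_source: int, num_target: int) -> List[str]:
    """Return the READ/WRITE sequence of a wait-k policy.

    The t-th target token (0-based) is written only after
    min(k + t, num_source) source units have been read.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    actions: List[str] = []
    read = written = 0
    while written < num_target:
        if read < min(k + written, num_source):
            actions.append("READ")   # lag of k units not yet reached
            read += 1
        else:
            actions.append("WRITE")  # lag reached, or source exhausted
            written += 1
    return actions


if __name__ == "__main__":
    print(wait_k_actions(2, 4, 4))
    # ['READ', 'READ', 'WRITE', 'READ', 'WRITE', 'READ', 'WRITE', 'WRITE']
```

The schedule depends only on k and the token counts, not on what the model is attending to, which is the contrast with an attention-derived policy.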

Entities

Institutions

  • arXiv

Sources