DOA: Training-Free Decoder-Only Attention Policy for Long-Form Simultaneous Translation with SpeechLLMs

ai-technology · 2026-06-01

A new training-free policy called Decoder-Only Attention (DOA) enables long-form simultaneous speech-to-text translation with off-the-shelf Speech Large Language Models (SpeechLLMs). Existing simultaneous translation policies typically rely on attention-based encoder-decoder models, which use cross-attention to align output text with the input audio. SpeechLLMs, however, are decoder-only and have no cross-attention, only self-attention. DOA derives a proxy alignment from self-attention, which lets the model make streaming read/write decisions without additional training. The approach also targets two shortcomings of prior work: a lack of validation in long-form settings and a reliance on heuristic wait-k policies. The paper is available on arXiv under reference 2605.31432.
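
The summary does not spell out DOA's exact decision rule. Purely as an illustration, the sketch below shows how a streaming policy can be driven by self-attention instead of cross-attention, using a frontier-distance rule of the kind applied to cross-attention by policies such as AlignAtt. This is an assumption, not confirmed as DOA's rule. The candidate token's attention over the speech positions in the decoder's context is averaged over heads, the most-attended frame is taken as a proxy alignment, and the token is emitted only if that frame is not among the most recently received frames. The functions proxy_alignment and should_emit, the attn_heads layout, the speech_start/speech_end span, the frame_margin parameter, and the head-averaging choice are all hypothetical.

```python
from typing import Sequence

# Hypothetical layout: attn_heads[h][p] is the self-attention weight that the
# candidate target token gives to context position p in head h (one layer).
# Speech frames occupy positions speech_start .. speech_end - 1 of the context.


def proxy_alignment(
    attn_heads: Sequence[Sequence[float]], speech_start: int, speech_end: int
) -> int:
    """Index (relative to speech_start) of the most-attended speech frame,
    after averaging the attention weights over heads."""
    if not attn_heads or speech_end <= speech_start:
        raise ValueError("need at least one head and one speech frame")
    n_heads = len(attn_heads)
    frame_scores = [
        sum(head[pos] for head in attn_heads) / n_heads
        for pos in range(speech_start, speech_end)
    ]
    # Ties resolve to the earliest frame because max() keeps the first maximum.
    return max(range(len(frame_scores)), key=frame_scores.__getitem__)


def should_emit(
    attn_heads: Sequence[Sequence[float]],
    speech_start: int,
    speech_end: int,
    frame_margin: int,
) -> bool:
    """Emit (WRITE) the candidate token only if its proxy alignment is not
    among the last frame_margin received frames; otherwise wait (READ)."""
    num_frames = speech_end - speech_start
    aligned = proxy_alignment(attn_heads, speech_start, speech_end)
    return aligned < num_frames - frame_margin
```

For example, with four received frames and frame_margin=2, a token whose attention peaks on the newest frame is held back until more audio arrives, while a token that attends mostly to the first frame is emitted. Because the rule only reads attention weights the model already computes, it needs no extra training, which is the property the summary attributes to DOA.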

Key facts

DOA is a training-free policy for simultaneous translation.
It uses decoder-only SpeechLLMs without cross-attention.
The policy derives alignment from self-attention.
It enables long-form simultaneous translation.
Current methods rely on encoder-decoder models or heuristic wait-k policies.
The approach is validated on off-the-shelf SpeechLLMs.
The paper is on arXiv (2605.31432).
It addresses the lack of validation in long-form settings.
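
For contrast, the wait-k baseline described above as heuristic fits in a few lines: it reads k source units before writing the first target token and then stays k units ahead of the output, regardless of what the model attends to. This is a minimal sketch of the textbook wait-k schedule; treating source units as fixed-size speech chunks is an assumption, and the function names wait_k_action and simulate_wait_k are illustrative.

```python
def wait_k_action(units_read: int, tokens_written: int, k: int, source_finished: bool) -> str:
    """Classic wait-k rule: write once the reader is k source units ahead
    of the writer, or once the whole source has been read."""
    if source_finished or units_read - tokens_written >= k:
        return "WRITE"
    return "READ"


def simulate_wait_k(source_len: int, target_len: int, k: int) -> list[str]:
    """Return the READ/WRITE sequence wait-k produces for the given lengths."""
    actions: list[str] = []
    read = written = 0
    while written < target_len:
        action = wait_k_action(read, written, k, read == source_len)
        if action == "READ":
            read += 1
        else:
            written += 1
        actions.append(action)
    return actions
```

The schedule is fixed in advance by k and the input length, which is why it cannot adapt to content; an attention-derived policy instead decides per token whether the model has heard enough.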
