OpenSTBench: Unified Evaluation Framework for Speech Translation

ai-technology · 2026-06-01

OpenSTBench is a newly introduced evaluation framework that addresses the difficulty of comparing heterogeneous speech translation systems. The field spans speech-to-text translation (S2TT) and speech-to-speech translation (S2ST), each in offline and streaming settings, and these systems produce outputs that differ in modality, timing, and speech quality. Existing evaluation practice assesses translation quality, speech quality, and temporal quality under separate protocols, which makes comprehensive comparison difficult. OpenSTBench brings these aspects into a single framework that covers S2TT and S2ST in both offline and streaming settings, and it jointly evaluates translation quality, speech quality, speaker preservation, emotion and paralinguistic fidelity, temporal consistency, and latency. The framework is described in a paper available on arXiv.

Key facts

OpenSTBench is a unified multidimensional evaluation framework for speech translation.
It supports speech-to-text translation (S2TT) and speech-to-speech translation (S2ST).
It covers offline and streaming generation settings.
It jointly evaluates translation quality, speech quality, speaker preservation, emotion and paralinguistic fidelity, temporal consistency, and latency.
Existing evaluation practices assess these aspects under separate protocols.
The framework aims to enable comprehensive comparison of heterogeneous systems.
The paper is available on arXiv with ID 2605.30792.
The paper was announced on arXiv as a cross-listing (announcement type: cross).
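To make the idea of a shared reporting format more concrete, here is a minimal Python sketch. It is not OpenSTBench's code or schema. The dimension names, the helpers `applicable_dims`, `SystemReport`, and `comparison_table`, and the rules for which dimensions apply (speech-output dimensions only for S2ST, latency only for streaming) are hypothetical and were chosen only for illustration. The sketch shows how systems that differ in task and mode can still sit in one comparison table, with inapplicable dimensions marked explicitly rather than silently dropped.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical dimension names; OpenSTBench's actual metric names may differ.
TEXT_DIMS = ["translation_quality"]
# Assumption: these dimensions only make sense when the system outputs speech (S2ST).
SPEECH_DIMS = ["speech_quality", "speaker_preservation",
               "emotion_paralinguistic_fidelity", "temporal_consistency"]
# Assumption: latency is reported only for streaming systems in this sketch.
STREAMING_DIMS = ["latency"]
ALL_DIMS = TEXT_DIMS + SPEECH_DIMS + STREAMING_DIMS


def applicable_dims(task: str, mode: str) -> List[str]:
    """Return the dimensions assumed to apply to a task/mode pair."""
    if task not in ("S2TT", "S2ST"):
        raise ValueError(f"unknown task: {task}")
    if mode not in ("offline", "streaming"):
        raise ValueError(f"unknown mode: {mode}")
    dims = list(TEXT_DIMS)
    if task == "S2ST":
        dims += SPEECH_DIMS
    if mode == "streaming":
        dims += STREAMING_DIMS
    return dims


@dataclass
class SystemReport:
    """One system's scores in a shared format (illustrative, not the paper's schema)."""
    name: str
    task: str
    mode: str
    scores: Dict[str, float] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject scores for dimensions that do not apply to this task/mode.
        allowed = set(applicable_dims(self.task, self.mode))
        extra = set(self.scores) - allowed
        if extra:
            raise ValueError(f"{self.name}: scores not applicable: {sorted(extra)}")


def comparison_table(reports: List[SystemReport]) -> List[List[str]]:
    """Build one table over all dimensions: 'n/a' = does not apply, 'missing' = not reported."""
    rows = [["system", "task", "mode"] + ALL_DIMS]
    for r in reports:
        allowed = applicable_dims(r.task, r.mode)
        row = [r.name, r.task, r.mode]
        for d in ALL_DIMS:
            if d not in allowed:
                row.append("n/a")
            elif d in r.scores:
                row.append(f"{r.scores[d]:.2f}")
            else:
                row.append("missing")
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not results from the paper.
    reports = [
        SystemReport("system_a", "S2TT", "offline", {"translation_quality": 0.81}),
        SystemReport("system_b", "S2ST", "streaming",
                     {"translation_quality": 0.74, "latency": 1.9}),
    ]
    for row in comparison_table(reports):
        print(row)
```

Keeping "n/a" separate from "missing" is a deliberate choice in this sketch. A text-only system is not penalized for lacking speech scores, while a speech system that omits a required dimension remains visibly incomplete.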
