New Benchmark MTR-DuplexBench Evaluates Multi-Round Full-Duplex Speech Language Models

ai-technology · 2026-04-20

A new benchmark, MTR-DuplexBench, has been introduced to close gaps in the evaluation of Full-Duplex Speech Language Models (FD-SLMs). These models support real-time, overlapping conversation, in which the system can listen and speak at the same time, improving user engagement compared with conventional half-duplex systems. Existing benchmarks concentrate mainly on single-round interactions and so miss the complexities of multi-round dialogue. Evaluating FD-SLMs over multiple rounds is difficult for two reasons: turn boundaries are blurred, and context is inconsistent during model inference. MTR-DuplexBench addresses this by segmenting continuous full-duplex dialogues into discrete turns so that each turn can be assessed individually. It also covers evaluation aspects beyond conversational features. The aim is a comprehensive multi-round evaluation framework for FD-SLMs.
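To make the idea of turn segmentation concrete, here is a minimal Python sketch. It is not the benchmark's actual procedure, which this summary does not describe. It assumes a hypothetical rule: a turn opens when a user utterance starts and stays open until the next one starts. It also assumes speech is given as (start, end) intervals in seconds. The names `Turn` and `segment_turns` are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_sec, end_sec) on a shared timeline


@dataclass
class Turn:
    index: int
    user: Interval
    assistant: List[Interval] = field(default_factory=list)


def segment_turns(user_segments: List[Interval],
                  assistant_segments: List[Interval]) -> List[Turn]:
    """Group a continuous two-channel dialogue into discrete turns.

    Assumed rule (illustrative only): a turn opens when a user utterance
    starts and stays open until the next user utterance starts. Assistant
    speech is assigned to the turn in which it begins, even if it overlaps
    the user's speech.
    """
    users = sorted(user_segments)
    turns = [Turn(index=i, user=u) for i, u in enumerate(users)]
    for seg in sorted(assistant_segments):
        # Find the last turn whose user utterance started at or before seg.
        owner = None
        for turn in turns:
            if turn.user[0] <= seg[0]:
                owner = turn
            else:
                break
        if owner is not None:  # speech before the first user turn is dropped
            owner.assistant.append(seg)
    return turns


# Toy timeline: the assistant starts replying at 1.5 s while the user is
# still speaking, which is the kind of overlap that blurs turn boundaries.
user = [(0.0, 2.0), (5.0, 7.5), (11.0, 12.0)]
assistant = [(1.5, 4.0), (7.0, 10.0), (10.5, 10.8), (12.2, 14.0)]
turns = segment_turns(user, assistant)
for t in turns:
    print(t.index, t.user, t.assistant)
```

Once a dialogue is split this way, each turn can be scored separately and the scores compared across rounds. A real segmentation would also have to handle backchannels and interruptions, which this sketch ignores.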

Key facts

MTR-DuplexBench is a novel benchmark for evaluating Full-Duplex Speech Language Models
FD-SLMs enable real-time, overlapping conversational interactions
Existing benchmarks primarily focus on single-round interactions
Evaluating FD-SLMs in multi-round settings poses challenges like blurred turn boundaries
Context inconsistency during model inference is another evaluation challenge
The benchmark segments continuous full-duplex dialogues into discrete turns
It incorporates various evaluation aspects beyond conversational features
It addresses gaps in current FD-SLM evaluation methods

Sources

arXiv cs.AI — 2026-04-20