ARTFEED — Contemporary Art Intelligence

New Benchmark MTR-DuplexBench Evaluates Multi-Round Full-Duplex Speech Language Models

ai-technology · 2026-04-20

MTR-DuplexBench, a new benchmark, aims to fill gaps in how Full-Duplex Speech Language Models (FD-SLMs) are evaluated. FD-SLMs can listen and speak at the same time. This supports real-time, overlapping conversation that is more engaging than the strict turn-taking of conventional half-duplex systems. Existing benchmarks, however, focus mainly on single-round interactions and so miss much of what makes multi-round dialogue difficult. Evaluating FD-SLMs over multiple rounds raises two particular challenges: turn boundaries are blurred because both parties may speak at once, and the context can become inconsistent during model inference. MTR-DuplexBench segments continuous full-duplex dialogues into discrete turns so that each turn can be assessed individually. It also covers evaluation aspects beyond conversational features. Together, these choices are meant to provide a comprehensive multi-round evaluation framework for FD-SLMs and to close the gaps left by current benchmarks, which often overlook key parts of multi-round communication.

Key facts

  • MTR-DuplexBench is a novel benchmark for evaluating Full-Duplex Speech Language Models
  • FD-SLMs enable real-time, overlapping conversational interactions
  • Existing benchmarks primarily focus on single-round interactions
  • Evaluating FD-SLMs in multi-round settings poses challenges like blurred turn boundaries
  • Context inconsistency during model inference is another evaluation challenge
  • The benchmark segments continuous full-duplex dialogues into discrete turns
  • It incorporates various evaluation aspects beyond conversational features
  • It addresses gaps in current FD-SLM evaluation methods
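The turn segmentation step can be illustrated with a short Python sketch. The source does not describe how MTR-DuplexBench actually draws turn boundaries, so everything below is an assumption for illustration. The names segment_turns, merge_pauses and Turn are hypothetical. Speech is assumed to arrive as per-channel (start, end) intervals in seconds. User segments separated by pauses shorter than 0.5 s are treated as one utterance, and model speech is assigned to the turn in which it starts. This is one plausible rule, not the benchmark's published method.

```python
import bisect
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical sketch: the 0.5 s pause threshold and the start-time assignment
# rule are assumptions, not MTR-DuplexBench's documented procedure.

Interval = Tuple[float, float]  # (start_s, end_s) of detected speech on one channel


@dataclass
class Turn:
    """One hypothetical turn: a user utterance plus the model speech assigned to it."""
    user: Interval
    model: List[Interval] = field(default_factory=list)


def merge_pauses(segments: List[Interval], max_gap: float) -> List[Interval]:
    """Join speech segments on one channel separated by pauses shorter than max_gap."""
    merged: List[Interval] = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] < max_gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def segment_turns(user: List[Interval], model: List[Interval],
                  max_gap: float = 0.5) -> List[Turn]:
    """Split a continuous two-channel dialogue into turns keyed on user utterances.

    Each turn runs from one user utterance onset to the next. Model speech goes
    to the turn in which it starts, so overlapping speech stays with the turn
    it responds to or interrupts. Model speech before the first user utterance
    is dropped.
    """
    utterances = merge_pauses(user, max_gap)
    starts = [u[0] for u in utterances]
    turns = [Turn(user=u) for u in utterances]
    for m_start, m_end in sorted(model):
        # Index of the last user utterance that began at or before this model segment.
        idx = bisect.bisect_right(starts, m_start) - 1
        if idx >= 0:
            turns[idx].model.append((m_start, m_end))
    return turns


if __name__ == "__main__":
    user_speech = [(0.0, 1.2), (1.4, 2.0), (5.0, 6.5)]
    model_speech = [(2.3, 4.8), (6.0, 8.0)]
    for turn in segment_turns(user_speech, model_speech):
        print(turn)
    # Turn(user=(0.0, 2.0), model=[(2.3, 4.8)])
    # Turn(user=(5.0, 6.5), model=[(6.0, 8.0)])
```

In the example, the model's second reply starts at 6.0 s while the user is still speaking until 6.5 s. The start-time rule keeps that overlapping speech in the same turn. This is the kind of blurred boundary that any full-duplex benchmark has to resolve in some way.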
