EchoChain Benchmark Tests AI Voice Assistants on Mid-Speech Interruption Handling
EchoChain is a new benchmark that evaluates how real-time voice assistants handle interruptions that arrive while the assistant is still speaking. Traditional spoken-dialog benchmarks focus on turn-based interaction and miss this failure mode. EchoChain generates scenario-driven dialogues and injects interruptions at a standardized point relative to the onset of the assistant's speech, enabling fair comparisons across systems. It identifies three recurring failure patterns in post-interruption responses: contextual inertia, interruption amnesia, and objective displacement. In a paired half-duplex control, total failures dropped by 40.2% relative to interrupted runs, suggesting that many errors stem from reasoning under interruption rather than from task difficulty alone. No evaluated real-time voice model exceeded a 50% pass rate. The work is documented on arXiv under the identifier arXiv:2604.16456v1.
Key facts
- EchoChain is a benchmark for evaluating full-duplex state-update reasoning under mid-speech interruptions.
- Existing spoken-dialog benchmarks largely evaluate turn-based interaction and miss the failure mode of interruptions.
- EchoChain identifies three failure patterns: contextual inertia, interruption amnesia, and objective displacement.
- The benchmark generates scenario-driven conversations and injects interruptions at a standardized point.
- In a paired half-duplex control, total failures dropped by 40.2% relative to interrupted runs.
- No evaluated real-time voice model exceeded a 50% pass rate.
- The research is documented in arXiv:2604.16456v1.
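The injection-and-scoring loop described above can be sketched in a few lines. This is a minimal illustration under assumptions, not the paper's harness: the class names (`Turn`, `Scenario`), the token-offset cut, and the keyword-based mapping of replies to the three failure patterns are all hypothetical stand-ins for whatever the actual benchmark implements.

```python
# Hypothetical sketch of an EchoChain-style harness: cut the assistant's reply
# at a standardized point, inject the interruption, then label the follow-up.
# All names and the classification heuristic are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str  # "user" or "assistant"
    text: str


@dataclass
class Scenario:
    turns: list              # dialogue history leading up to the reply
    interruption: str        # user utterance injected mid-response
    offset_tokens: int = 3   # standardized injection point after speech onset


def run_interrupted(scenario: Scenario, assistant_reply: str) -> list:
    """Truncate the reply at a fixed token offset and append the interruption,
    yielding the transcript the post-interruption response is scored against."""
    spoken = " ".join(assistant_reply.split()[: scenario.offset_tokens])
    transcript = list(scenario.turns)
    transcript.append(Turn("assistant", spoken))           # cut-off partial reply
    transcript.append(Turn("user", scenario.interruption))  # the interruption
    return transcript


def classify_failure(reply: str, old_goal: str, new_goal: str) -> str:
    """Toy keyword heuristic for the three failure patterns named in the summary."""
    reply = reply.lower()
    if old_goal in reply and new_goal in reply:
        return "objective displacement"   # old and new objectives conflated
    if old_goal in reply:
        return "contextual inertia"       # keeps pursuing the pre-interruption plan
    if new_goal not in reply:
        return "interruption amnesia"     # the interruption's content is lost
    return "pass"
```

For example, if the user interrupts "Sure, I will book a flight to Paris" after three tokens with "actually book a train", a follow-up reply of "Booking your flight now" would be labeled contextual inertia, while "Switching to a train ticket" would pass.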
- The arXiv announcement is a cross-listing.