Game-Time Benchmark Tests Temporal Skills of Spoken Language Models
A new benchmark called Game-Time evaluates temporal dynamics in conversational Spoken Language Models (SLMs), including timing, tempo, and simultaneous speaking. Inspired by how humans learn language through activities, the benchmark includes basic instruction-following tasks and advanced tasks with temporal constraints, such as tempo adherence and synchronized responses. An evaluation of diverse SLM architectures reveals a clear performance disparity: state-of-the-art models handle basic tasks well, but many contemporary systems still struggle with fundamental instruction-following. Nearly all models degrade substantially under temporal constraints, exposing a critical gap in conversational fluency. The paper is available on arXiv under ID 2509.26388.
Key facts
- Game-Time Benchmark assesses temporal dynamics in SLMs
- Tasks include basic instruction-following and advanced tasks with temporal constraints
- State-of-the-art models perform well on basic tasks
- Many contemporary systems struggle with fundamental instruction-following
- Nearly all models degrade under temporal constraints
- Paper available on arXiv under ID 2509.26388
- Inspired by how humans learn language through activities
Entities
Institutions
- arXiv