Game-Time Benchmark Tests Temporal Skills of Spoken Language Models

ai-technology · 2026-05-04

A new benchmark called Game-Time evaluates how well conversational Spoken Language Models (SLMs) handle temporal dynamics, including timing, tempo, and simultaneous speaking. Inspired by the way humans learn language through activities, it combines basic instruction-following tasks with advanced tasks that add temporal constraints, such as tempo adherence and synchronized responses. An evaluation of diverse SLM architectures shows a clear performance gap: state-of-the-art models handle the basic tasks well, but many contemporary systems struggle even with fundamental instruction-following. Nearly all models degrade substantially under temporal constraints, highlighting a critical gap in conversational fluency. The paper is available on arXiv under ID 2509.26388.

Key facts

Game-Time Benchmark assesses temporal dynamics in SLMs
Tasks include basic instruction-following and advanced tasks with temporal constraints, such as tempo adherence and synchronized responses
State-of-the-art models perform well on basic tasks
Many contemporary systems struggle with fundamental instruction-following
Nearly all models degrade under temporal constraints
Research published on arXiv with ID 2509.26388
Inspired by human language learning through activities
