SQuTR Benchmark Tests Spoken Query Retrieval Under Noise
Researchers have introduced SQuTR, a robustness benchmark for spoken-query-to-text retrieval systems. The benchmark comprises 37,317 unique queries aggregated from six English and Chinese text retrieval datasets spanning multiple domains. Speech is synthesized using voice profiles from 200 real speakers, and 17 categories of real-world environmental noise are mixed in at controlled signal-to-noise ratio (SNR) levels, enabling reproducible evaluation from quiet to highly noisy conditions. The work addresses a limitation of existing evaluation datasets, which are often restricted to simple queries under constrained noise conditions.
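The announcement does not describe SQuTR's exact mixing pipeline, but the standard way to add noise at a controlled SNR is to scale the noise so that the speech-to-noise power ratio hits the target. A minimal sketch (the function name and use of NumPy are assumptions, not from the benchmark):

```python
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix noise into speech at a target SNR in dB (illustrative sketch)."""
    # Tile or truncate the noise clip to match the speech length.
    if len(noise) < len(speech):
        reps = int(np.ceil(len(speech) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[: len(speech)]

    # Scale noise so that 10*log10(P_speech / P_noise_scaled) == snr_db.
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise
```

Sweeping `snr_db` from high (near-quiet) to low or negative values reproduces the quiet-to-highly-noisy conditions the benchmark describes.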
Key facts
- SQuTR is a robustness benchmark for spoken query retrieval.
- The dataset aggregates 37,317 unique queries.
- Queries come from six English and Chinese text retrieval datasets.
- Speech is synthesized using voice profiles from 200 real speakers.
- 17 categories of real-world environmental noise are used.
- Noise is mixed in at controlled SNR levels.
- The benchmark enables reproducible robustness evaluation.
- Existing datasets are limited to simple queries under constrained noise.