ARTFEED — Contemporary Art Intelligence

AQUA-Bench: Benchmarking Unanswerable Questions in Audio QA

other · 2026-04-30

Researchers have introduced AQUA-Bench, a benchmark designed to evaluate audio question answering systems on unanswerable questions. Existing benchmarks focus on answerable queries, overlooking real-world cases where questions are misleading, ill-posed, or incompatible with the audio content. AQUA-Bench covers three scenarios: Absent Answer Detection (the correct option is missing from the choices), Incompatible Answer Set Detection (the answer choices do not match the question), and Incompatible Audio Question Detection (the question is irrelevant to the audio). The benchmark measures model reliability and aims to promote the development of more robust audio-aware large language models. The work is published on arXiv under identifier 2601.12248.

Key facts

  • AQUA-Bench addresses unanswerable questions in audio QA.
  • Three scenarios: Absent Answer Detection, Incompatible Answer Set Detection, Incompatible Audio Question Detection.
  • Existing benchmarks overlook unanswerable questions.
  • Real-world questions can be misleading or ill-posed.
  • Benchmark evaluates model reliability.
  • Published on arXiv:2601.12248.
  • Focuses on audio-aware large language models.
  • Promotes development of robust audio QA systems.
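The three scenarios above can be pictured as categories of benchmark items where the correct behavior is to abstain. The sketch below is purely illustrative: the item schema, the `ABSTAIN` label, and the `reliability_score` metric are hypothetical stand-ins, not AQUA-Bench's actual format or scoring.

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical taxonomy mirroring the three AQUA-Bench scenarios.
class Scenario(Enum):
    ABSENT_ANSWER = "absent_answer"            # correct option missing from choices
    INCOMPATIBLE_SET = "incompatible_set"      # choices mismatched with question
    INCOMPATIBLE_AUDIO = "incompatible_audio"  # question irrelevant to audio

@dataclass
class Item:
    question: str
    choices: list[str]
    scenario: Scenario

ABSTAIN = "I cannot answer"  # hypothetical abstention label

def reliability_score(predictions: list[str], items: list[Item]) -> float:
    """Fraction of unanswerable items on which the model correctly abstained."""
    hits = sum(1 for pred, _ in zip(predictions, items) if pred == ABSTAIN)
    return hits / len(items)

items = [
    Item("What instrument plays the melody?", ["car horn", "rainfall"],
         Scenario.ABSENT_ANSWER),
    Item("How many speakers are talking?", ["blue", "green"],
         Scenario.INCOMPATIBLE_SET),
    Item("What color is the bird?", ["red", "yellow"],
         Scenario.INCOMPATIBLE_AUDIO),
]
preds = [ABSTAIN, "blue", ABSTAIN]
print(reliability_score(preds, items))  # 2 of 3 abstentions
```

Under this framing, a model that always guesses an option scores 0.0 on unanswerable items, which is the failure mode the benchmark is designed to surface.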

Entities

Institutions

  • arXiv

Sources