New Korean Speech Benchmarks for Evaluating Speech Language Models

ai-technology · 2026-05-28

Researchers propose KVoiceBench, KOpenAudioBench, and KMMAU, three agent-driven Korean speech benchmarks for evaluating speech language models (SpeechLMs). Current SpeechLM evaluation is heavily English-centric, and transferring benchmarks directly through a chain of ASR, translation, normalization, and TTS corrupts language-specific features. To build the benchmarks, the researchers propose two human-agent construction frameworks: one transfers English SpokenQA benchmarks to Korean, and the other converts Korean ASR corpora into audio understanding benchmarks. The benchmarks aim to address this gap in evaluating the non-English speech capabilities of SpeechLMs.

Key facts

Three Korean speech benchmarks proposed: KVoiceBench, KOpenAudioBench, KMMAU
Benchmarks are agent-driven and target SpeechLM evaluation
Current SpeechLM evaluation is heavily centered on English
Direct benchmark transfer via ASR, translation, normalization, and TTS corrupts language-specific instructions
Two human-agent benchmark-construction frameworks proposed
One framework transfers source-language SpokenQA benchmarks into target-language SpokenQA benchmarks
Other framework converts target-language ASR corpora into audio understanding benchmarks
Frameworks use transcriptions and speaker metadata
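
As a rough illustration of the second framework's idea, the sketch below turns one ASR corpus record (audio, transcription, speaker metadata) into a multiple-choice audio understanding item. It is not the authors' pipeline: the record schema, the field names (`speaker_gender`, `speaker_age_group`), the question wording, and the helper `make_speaker_item` are all hypothetical, and the human and agent steps described in the source are not modeled.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class AsrRecord:
    """One utterance from an ASR corpus (hypothetical schema)."""
    audio_path: str
    transcription: str
    speaker_gender: str      # assumed metadata field
    speaker_age_group: str   # assumed metadata field


@dataclass
class BenchmarkItem:
    """One multiple-choice audio understanding question."""
    audio_path: str
    question: str
    choices: List[str]
    answer: str


def make_speaker_item(record: AsrRecord, field: str, question: str,
                      options: List[str], rng: random.Random) -> BenchmarkItem:
    """Build a question whose answer is read from the speaker metadata."""
    answer = getattr(record, field)
    if answer not in options:
        raise ValueError(f"{field}={answer!r} is not among the options")
    choices = list(options)   # copy so the caller's list is not mutated
    rng.shuffle(choices)      # randomize the position of the correct answer
    return BenchmarkItem(record.audio_path, question, choices, answer)


# Example usage with made-up data.
record = AsrRecord("utt_0001.wav", "안녕하세요", "female", "20s")
item = make_speaker_item(
    record,
    field="speaker_gender",
    question="What is the speaker's gender?",
    options=["female", "male"],
    rng=random.Random(0),
)
```

The answer comes from corpus metadata rather than from model output, so an item like this can be scored automatically. The transcription is carried along unused here; a fuller pipeline could draw content questions from it.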

Sources

arXiv cs.AI — 2026-05-28