New Korean Speech Benchmarks for Evaluating Speech Language Models
Researchers propose KVoiceBench, KOpenAudioBench, and KMMAU, three agent-driven Korean speech benchmarks for evaluating speech language models (SpeechLMs). Current SpeechLM evaluation is heavily English-centric, and directly transferring benchmarks through ASR, translation, normalization, and TTS corrupts language-specific features. To address this, the authors propose two human-agent benchmark-construction frameworks: one transfers English SpokenQA benchmarks into Korean, and the other converts Korean ASR corpora into audio understanding benchmarks. The resulting benchmarks aim to close the gap in evaluating SpeechLMs on languages other than English.
Key facts
- Three Korean speech benchmarks proposed: KVoiceBench, KOpenAudioBench, KMMAU
- The benchmarks are agent-driven and designed for evaluating SpeechLMs
- Current SpeechLM evaluation is heavily centered on English
- Direct benchmark transfer via ASR, translation, normalization, and TTS corrupts language-specific instructions
- Two human-agent benchmark-construction frameworks proposed
- One framework transfers source-language SpokenQA benchmarks into target-language SpokenQA benchmarks
- The other framework converts target-language ASR corpora into audio understanding benchmarks
- Frameworks use transcriptions and speaker metadata
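To make the second framework more concrete, the following is a minimal sketch of how one ASR corpus record with speaker metadata might be turned into a multiple-choice audio understanding item. It is an illustration under stated assumptions, not the authors' method: the `AsrRecord` and `BenchmarkItem` types, the `make_metadata_item` function, the metadata field names, and the question wording are all hypothetical, and the human and agent review steps are not modeled.

```python
import random
from dataclasses import dataclass


@dataclass
class AsrRecord:
    """One utterance from a hypothetical ASR corpus."""
    audio_path: str
    transcription: str
    speaker_meta: dict  # hypothetical fields, e.g. {"gender": "female", "age_group": "30s"}


@dataclass
class BenchmarkItem:
    """One multiple-choice audio understanding question."""
    audio_path: str
    question: str
    choices: list
    answer: str


def make_metadata_item(record, field, label_set, rng):
    """Build a question whose gold answer is one speaker metadata value."""
    answer = record.speaker_meta[field]
    if answer not in label_set:
        raise ValueError(f"{answer!r} is not one of the allowed labels for {field!r}")
    choices = list(label_set)
    rng.shuffle(choices)  # shuffle so the answer position carries no signal
    return BenchmarkItem(
        audio_path=record.audio_path,
        question=f"What is the speaker's {field.replace('_', ' ')}?",
        choices=choices,
        answer=answer,
    )


# Hypothetical record; the path and metadata values are made up for illustration.
record = AsrRecord(
    audio_path="corpus/utt_0001.wav",
    transcription="안녕하세요",
    speaker_meta={"gender": "female", "age_group": "30s"},
)
item = make_metadata_item(
    record, "age_group", ["20s", "30s", "40s", "50s"], random.Random(0)
)
```

The transcription is carried on the record but not used here; questions about spoken content would need an additional generation and verification step, which this sketch leaves out.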