Utterance-Level Methods Boost ASR Reliability for Child Speech

ai-technology · 2026-04-24

A new arXiv paper (2604.19801) introduces two utterance-level methods for identifying reliable automatic speech recognition (ASR) output for child speech, one aimed at read speech and one at dialogue speech. The methods were evaluated on English and Dutch datasets using both baseline and finetuned models, and the best strategy achieves precision above 97.4% for both languages and both speech types. That strategy automatically selects between 21.0% and 55.9% of a dialogue or read speech dataset, so applications such as language learning and literacy acquisition can rely on the selected utterances despite high overall ASR error rates.

Key facts

arXiv paper 2604.19801 proposes utterance-level ASR reliability selection for child speech.
Two methods: one for read speech, one for dialogue speech.
Evaluated on English and Dutch datasets with baseline and finetuned models.
Best strategy reaches precision above 97.4% for both languages and both speech types.
Best strategy automatically selects 21.0% to 55.9% of a dialogue or read speech dataset.
Aims to improve ASR-dependent applications for children.
High ASR error rates limit effectiveness in language learning and literacy.
Utterance-level selection identifies reliable ASR output in advance.
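
To make the two reported figures concrete, the following is a minimal sketch of how precision and the selected share could be computed for any utterance-level selection. It does not reproduce the paper's methods. The `Utterance` class, the `selected` flag, the toy data, and the exact-match definition of a "correct" utterance are all assumptions made for illustration; the paper may define reliability differently.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    reference: str   # human transcript (hypothetical field)
    hypothesis: str  # ASR output (hypothetical field)
    selected: bool   # True if some selection method flagged the ASR output as reliable


def is_correct(utt: Utterance) -> bool:
    # Assumption: an utterance counts as correct when the ASR output matches
    # the reference word for word, ignoring case. This is not taken from the paper.
    return utt.reference.lower().split() == utt.hypothesis.lower().split()


def selection_metrics(utterances: list[Utterance]) -> tuple[float, float]:
    """Return (precision, selected_share) for the utterances flagged as selected."""
    selected = [u for u in utterances if u.selected]
    if not selected:
        return 0.0, 0.0
    # Precision: fraction of selected utterances whose ASR output is correct.
    precision = sum(is_correct(u) for u in selected) / len(selected)
    # Selected share: fraction of the whole dataset that was selected.
    selected_share = len(selected) / len(utterances)
    return precision, selected_share


# Toy data, invented for illustration only.
toy = [
    Utterance("the cat sat", "the cat sat", True),
    Utterance("a big dog", "a big dog", True),
    Utterance("red ball", "read ball", True),      # selected but wrong
    Utterance("i like fish", "i like dish", False),
    Utterance("we go home", "we go home", False),  # correct but not selected
]

precision, share = selection_metrics(toy)
print(f"precision={precision:.3f} selected_share={share:.3f}")
# prints: precision=0.667 selected_share=0.600
```

In this reading, a precision above 97.4% means that nearly every utterance the method selects has trustworthy ASR output, while the 21.0% to 55.9% range describes how much of a dataset ends up selected.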
