Confidence-Aware Training Boosts Medical ASR for Dravidian Languages

other · 2026-04-24

A new confidence-aware training framework improves automatic speech recognition (ASR) in the medical domain for two low-resource Dravidian languages, Telugu and Kannada. The approach combines real and synthetic speech during training. A hybrid confidence mechanism pairs static perceptual and acoustic similarity metrics with dynamic model entropy. Two aggregation strategies, fixed-weight and learnable-weight, turn these signals into per-sample weights during training. The framework is evaluated on medical datasets containing real recordings and TTS-generated speech, and a 5-gram KenLM language model is applied for post-decoding correction. In these evaluations it outperforms direct fine-tuning.
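The summary does not give the paper's formulas. The following is therefore a minimal Python sketch built on several assumptions:

- Each static similarity score (for example, a perceptual or acoustic similarity between a TTS utterance and real speech) is already computed and scaled to [0, 1].
- The dynamic term is one minus the mean normalized per-frame entropy of the model's output distribution.
- Fixed-weight aggregation uses hand-set weights that sum to 1.
- Learnable-weight aggregation uses softmax-normalized logits that an optimizer would update.

All function names are hypothetical. The confidence-weighted mean loss is one plausible way to apply the weights, not necessarily the authors' choice.

```python
import math
from typing import List, Sequence


def normalized_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of one frame's output distribution, divided by log(V) so it lies in [0, 1]."""
    v = len(probs)
    if v < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(v)


def dynamic_confidence(frame_probs: Sequence[Sequence[float]]) -> float:
    """Assumed model-based confidence: 1 minus the mean normalized entropy over frames."""
    entropies = [normalized_entropy(p) for p in frame_probs]
    return 1.0 - sum(entropies) / len(entropies)


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max logit before exponentiating)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def fixed_weight_confidence(static_scores, dynamic, weights):
    """Fixed-weight aggregation: convex combination of static scores and the dynamic term."""
    scores = list(static_scores) + [dynamic]
    if len(weights) != len(scores) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("need one weight per score, summing to 1")
    return sum(w * s for w, s in zip(weights, scores))


def learnable_weight_confidence(static_scores, dynamic, logits):
    """Learnable-weight aggregation: weights = softmax(logits). In a real training loop
    the logits would be trainable parameters; they are not updated here."""
    scores = list(static_scores) + [dynamic]
    if len(logits) != len(scores):
        raise ValueError("need one logit per score")
    return sum(w * s for w, s in zip(softmax(logits), scores))


def weighted_loss(per_sample_losses, confidences):
    """Confidence-weighted mean of per-sample losses."""
    total = sum(confidences)
    if total <= 0.0:
        raise ValueError("confidences must have a positive sum")
    return sum(c * l for c, l in zip(confidences, per_sample_losses)) / total


if __name__ == "__main__":
    static = [0.8, 0.6]  # hypothetical perceptual and acoustic similarity scores
    frames = [[0.9, 0.05, 0.05], [0.6, 0.3, 0.1]]  # toy per-frame posteriors
    dyn = dynamic_confidence(frames)
    c_fixed = fixed_weight_confidence(static, dyn, [0.4, 0.3, 0.3])
    c_learn = learnable_weight_confidence(static, dyn, [0.0, 0.0, 0.0])
    print(dyn, c_fixed, c_learn)
    print(weighted_loss([2.1, 1.4], [c_fixed, c_learn]))  # toy per-sample losses
```

The dynamic term depends on the current model, so in this sketch it would be recomputed as training progresses. The static scores can be computed once per utterance.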

Key facts

Focus on Telugu and Kannada languages
Medical domain ASR
Hybrid confidence mechanism with static and dynamic metrics
Fixed-weight and learnable-weight aggregation strategies
Evaluation on real and TTS-generated synthetic speech
5-gram KenLM language model for post-decoding correction
Addresses limited annotated data and morphological complexity
Proposed framework outperforms direct fine-tuning
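The summary does not say how the 5-gram KenLM model is applied after decoding. One common approach is N-best rescoring, and the sketch below assumes it. In N-best rescoring, each hypothesis's ASR log-probability is combined with a weighted LM score and a word-count bonus.

The LM in the sketch is a hypothetical toy unigram table standing in for KenLM. With the real `kenlm` Python package, `kenlm.Model(path).score(text, bos=True, eos=True)` returns a log10 probability that could be passed in instead. `alpha` and `beta` are hypothetical tuning weights.

```python
import math
from typing import Callable, List, Tuple


def toy_lm_log10(text: str) -> float:
    """Hypothetical stand-in for a 5-gram LM: a unigram table of log10 probabilities,
    with a flat penalty for out-of-vocabulary words."""
    table = {"patient": -1.0, "has": -1.2, "fever": -1.5}
    return sum(table.get(w, -5.0) for w in text.split())


def rescore_nbest(
    nbest: List[Tuple[str, float]],
    lm_log10: Callable[[str], float],
    alpha: float = 0.5,
    beta: float = 1.0,
) -> str:
    """Return the hypothesis maximizing
    asr_logp + alpha * lm_logp + beta * word_count, with lm_logp in natural-log units."""

    def total(hyp: Tuple[str, float]) -> float:
        text, asr_logp = hyp
        lm_ln = lm_log10(text) * math.log(10)  # convert log10 to natural log
        return asr_logp + alpha * lm_ln + beta * len(text.split())

    return max(nbest, key=total)[0]


if __name__ == "__main__":
    # Toy N-best list of (hypothesis, ASR natural-log probability) pairs.
    nbest = [("patient has fiver", -3.0), ("patient has fever", -3.2)]
    print(rescore_nbest(nbest, toy_lm_log10))  # prints "patient has fever"
```

Here the LM term overturns the acoustic model's slight preference for the misspelled hypothesis. In practice, `alpha` and `beta` would be tuned on held-out data.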

Sources

arXiv cs.AI — 2026-04-23