Confidence-Aware Training Boosts Medical ASR for Dravidian Languages
A new confidence-aware training framework improves automatic speech recognition (ASR) in the medical domain for two low-resource Dravidian languages, Telugu and Kannada. The approach combines real and synthetic speech through a hybrid confidence mechanism that pairs static perceptual and acoustic similarity metrics with dynamic model entropy. Two aggregation strategies, fixed-weight and learnable-weight, guide sample weighting during training. Evaluated on medical datasets containing real recordings and TTS-generated speech, with a 5-gram KenLM language model applied for post-decoding correction, the framework outperforms direct fine-tuning.
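To make the fixed-weight aggregation concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation. The summary does not say how entropy is mapped to a confidence value or which similarity metrics are used, so the `1 - normalized entropy` mapping, the mixing weight `alpha`, and all function names are hypothetical.

```python
import math


def normalized_entropy(probs):
    """Shannon entropy of a probability vector, scaled to the range [0, 1]."""
    k = len(probs)
    if k < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(k)


def hybrid_confidence(static_score, probs, alpha=0.5):
    """Fixed-weight mix of a static score and an entropy-based dynamic score.

    static_score: assumed to be a similarity score already scaled to [0, 1].
    probs: model output distribution for the sample (assumed to sum to 1).
    alpha: hypothetical fixed mixing weight, not a value from the source.
    """
    # Assumption: low entropy (a peaked distribution) means high confidence.
    dynamic_score = 1.0 - normalized_entropy(probs)
    return alpha * static_score + (1.0 - alpha) * dynamic_score


def weighted_loss(losses, confidences):
    """Confidence-weighted mean of per-sample losses."""
    total = sum(confidences)
    if total == 0.0:
        return 0.0
    return sum(l * c for l, c in zip(losses, confidences)) / total


# Example: a synthetic sample with a good static score and a peaked output.
conf = hybrid_confidence(0.8, [0.9, 0.05, 0.05], alpha=0.5)
batch_loss = weighted_loss([2.0, 4.0], [conf, 0.3])
```

A learnable-weight variant would presumably treat `alpha` as a trainable parameter optimized together with the ASR model. That reading is an inference from the strategy's name, not a detail given in the summary.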
Key facts
- Focus on Telugu and Kannada languages
- Medical domain ASR
- Hybrid confidence mechanism with static and dynamic metrics
- Fixed-weight and learnable-weight aggregation strategies
- Evaluation on real and TTS-generated synthetic speech
- 5-gram KenLM language model for post-decoding correction
- Addresses limited annotated data and morphological complexity
- Proposed framework outperforms direct fine-tuning
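The summary does not describe how the 5-gram KenLM model is applied after decoding. One common pattern is n-best rescoring, sketched below in Python as an assumption, not as the paper's procedure. The names `rescore_nbest`, `lm_weight`, and `toy_lm_score` are hypothetical, and the English words stand in for Telugu or Kannada medical text.

```python
def rescore_nbest(hypotheses, lm_score, lm_weight=0.5):
    """Pick the hypothesis with the best combined ASR and LM score.

    hypotheses: list of (text, asr_log_score) pairs from the decoder.
    lm_score: callable returning a log-probability-like score for a text.
    lm_weight: hypothetical interpolation weight, normally tuned on dev data.
    """
    if not hypotheses:
        raise ValueError("need at least one hypothesis")
    best = max(hypotheses, key=lambda h: h[1] + lm_weight * lm_score(h[0]))
    return best[0]


def toy_lm_score(text):
    """Stand-in language model: penalizes words outside a tiny vocabulary.

    With the kenlm Python bindings, a trained n-gram model could be plugged in
    instead, e.g. kenlm.Model(path).score, given a model file on disk.
    """
    known = {"patient", "has", "fever"}
    return sum(0.0 if word in known else -5.0 for word in text.split())


nbest = [("patient has fevar", -1.0), ("patient has fever", -1.2)]
corrected = rescore_nbest(nbest, toy_lm_score, lm_weight=0.5)
```

In this toy case the decoder slightly prefers the misspelled hypothesis, and the language-model term moves the choice to the in-vocabulary one.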