Data Augmentation Framework Improves Dysarthric Speech Assessment
Researchers propose a three-stage framework for dysarthric speech quality assessment (DSQA) that draws on unlabeled dysarthric speech and large-scale typical speech datasets. First, a teacher model generates pseudo-labels for the unlabeled samples; next, a model is pretrained under weak supervision with label-aware contrastive learning; finally, it is fine-tuned for DSQA. Experiments on five unseen datasets spanning multiple etiologies and languages indicate that the approach is robust, with a Whisper-based baseline outperforming state-of-the-art (SOTA) predictors such as SpICE.
Key facts
- Framework uses unlabeled dysarthric speech and typical speech datasets
- Teacher model generates pseudo-labels for unlabeled samples
- Weakly supervised pretraining uses label-aware contrastive learning
- Pretrained model is fine-tuned for the downstream DSQA task
- Tested on five unseen datasets across multiple etiologies and languages
- Whisper-based baseline outperforms SpICE and other SOTA predictors
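
The summary does not specify the exact loss used for the label-aware contrastive pretraining step. As a rough illustration only, the sketch below uses a supervised-contrastive-style loss in NumPy, under the assumption that pseudo-labels are discrete severity bins and that samples sharing a bin are treated as positives. The function name, the temperature value, and the binning are all hypothetical choices, not details from the described framework.

```python
import numpy as np


def label_aware_contrastive_loss(embeddings, pseudo_labels, temperature=0.1):
    """Supervised-contrastive-style loss (illustrative assumption, not the
    framework's confirmed loss): samples that share a pseudo-label are
    pulled together, all other samples in the batch are pushed apart."""
    z = np.asarray(embeddings, dtype=np.float64)
    y = np.asarray(pseudo_labels)
    n = z.shape[0]

    # Positives: other samples in the batch with the same pseudo-label.
    self_mask = np.eye(n, dtype=bool)
    pos_mask = (y[:, None] == y[None, :]) & ~self_mask
    pos_count = pos_mask.sum(axis=1)
    valid = pos_count > 0
    if not valid.any():
        # No anchor has a positive partner, so the loss is undefined; return 0.
        return 0.0

    # L2-normalise so that dot products are cosine similarities.
    norms = np.maximum(np.linalg.norm(z, axis=1, keepdims=True), 1e-12)
    z = z / norms
    sim = (z @ z.T) / temperature

    # Exclude self-similarity from the softmax denominator.
    sim = np.where(self_mask, -np.inf, sim)

    # Numerically stable log-softmax over all other samples in the batch.
    row_max = sim.max(axis=1, keepdims=True)
    shifted = sim - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Average log-probability of the positives for each valid anchor.
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    loss_per_anchor = -pos_log_prob[valid] / pos_count[valid]
    return float(loss_per_anchor.mean())


if __name__ == "__main__":
    # Toy batch: two tight clusters of 2-D embeddings.
    emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(label_aware_contrastive_loss(emb, [0, 0, 1, 1]))  # labels match clusters
    print(label_aware_contrastive_loss(emb, [0, 1, 0, 1]))  # labels cut across clusters
```

In this toy batch, the loss is lower when the pseudo-labels agree with the embedding clusters than when they cut across them, which is the behaviour a contrastive pretraining objective is meant to reward. If the teacher produced continuous severity scores instead, they would need to be binned or the positive mask replaced with a score-distance weighting; either choice would again be an assumption beyond what the summary states.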