Data Augmentation Framework Improves Dysarthric Speech Assessment

other · 2026-05-04

Researchers propose a three-stage framework for dysarthric speech quality assessment (DSQA) that draws on unlabeled dysarthric speech and large-scale typical speech datasets. First, a teacher model generates pseudo-labels for the unlabeled samples. Second, a model is pretrained under weak supervision using label-aware contrastive learning. Third, that model is fine-tuned for the downstream DSQA task. Experiments on five unseen datasets spanning multiple etiologies and languages indicate that the approach is robust, with a Whisper-based baseline outperforming state-of-the-art (SOTA) predictors such as SpICE.

Key facts

Framework uses unlabeled dysarthric speech and typical speech datasets
Teacher model generates pseudo-labels for unlabeled samples
Weakly supervised pretraining uses label-aware contrastive learning
Pretrained model is fine-tuned for the downstream DSQA task
Tested on five unseen datasets across multiple etiologies and languages
Whisper-based baseline outperforms SpICE and other SOTA predictors
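
To make the label-aware contrastive pretraining step more concrete, here is a minimal sketch in plain Python. The summary does not give the paper's actual loss, so this uses a generic supervised-contrastive (SupCon-style) formulation as a stand-in. It assumes the teacher's pseudo-labels are numeric severity scores and treats two samples as a positive pair when their scores differ by at most a tolerance. The function name and the `temperature` and `label_tol` parameters are hypothetical and chosen only for illustration.

```python
import math


def label_aware_contrastive_loss(embeddings, pseudo_labels,
                                 temperature=0.1, label_tol=0.5):
    """SupCon-style loss over a batch (illustrative, not the paper's loss).

    embeddings:    list of equal-length float vectors, one per utterance
    pseudo_labels: list of floats from a teacher model (assumed severity scores)
    Samples whose pseudo-labels differ by <= label_tol count as positives.
    """
    n = len(embeddings)

    # L2-normalize so that dot products are cosine similarities.
    normed = []
    for vec in embeddings:
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        normed.append([x / norm for x in vec])

    total, num_anchors = 0.0, 0
    for i in range(n):
        positives = [
            j for j in range(n)
            if j != i and abs(pseudo_labels[i] - pseudo_labels[j]) <= label_tol
        ]
        if not positives:
            continue  # an anchor without positives contributes nothing

        # Temperature-scaled similarity of the anchor to every other sample.
        logits = {
            j: sum(a * b for a, b in zip(normed[i], normed[j])) / temperature
            for j in range(n) if j != i
        }

        # Numerically stable log-sum-exp for the softmax denominator.
        max_logit = max(logits.values())
        log_denom = max_logit + math.log(
            sum(math.exp(v - max_logit) for v in logits.values())
        )

        # Average negative log-probability assigned to the positives.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        num_anchors += 1

    return total / num_anchors if num_anchors else 0.0


# Toy batch: two tight clusters in embedding space.
toy_embeddings = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
matched = label_aware_contrastive_loss(toy_embeddings, [1.0, 1.2, 4.0, 4.1])
mismatched = label_aware_contrastive_loss(toy_embeddings, [1.0, 4.0, 1.2, 4.1])
print(matched < mismatched)
```

In this toy batch the loss is lower when samples with similar pseudo-labels already sit close together in embedding space, which is the behavior such an objective rewards during pretraining. A real setup would compute this on encoder outputs (for example from a Whisper-style encoder) inside an autodiff framework rather than on hand-written vectors.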

Entities

—

Sources

arXiv cs.AI — 2026-05-04