Whispered Speech Speaker Verification Improved by 22%
A new model enhances speaker verification for whispered speech, achieving a 22.26% relative improvement over the baseline in normal vs whispered trials. The system adds an encoder-decoder structure on top of a fine-tuned speaker verification backbone and is optimized with cosine similarity classification and a triplet loss. It reaches an AUC of 98.16% in normal vs whispered tests. Whispered speech degrades standard verification systems because its acoustic characteristics differ from those of phonated speech; the approach targets real-life scenarios such as privacy-sensitive communication and speakers with vocal impairments. The research is published on arXiv (2604.20229).
Key facts
- Relative improvement of 22.26% over baseline
- Error rate reduced from 6.77% (baseline) to 5.27% (proposed)
- AUC of 98.16% in normal vs whispered trials
- Encoder-decoder structure on fine-tuned speaker verification backbone
- Optimized with cosine similarity classification and triplet loss
- Whispered speech differs acoustically from phonated speech
- Applications include privacy protection and speakers with vocal impairments
- Published on arXiv with ID 2604.20229
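The two training objectives named above, cosine similarity scoring and triplet loss, can be sketched in a few lines. This is a minimal illustration of the general techniques, not the paper's actual implementation: the threshold, margin, and 2-D toy embeddings are assumptions for demonstration.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two speaker embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(enrolled, test, threshold=0.5):
    """Accept the trial as same-speaker if the cosine score clears
    the (illustrative) decision threshold."""
    return cosine_similarity(enrolled, test) >= threshold

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Cosine-based triplet loss: encourage the anchor to be closer
    to a same-speaker embedding (positive) than to a different-speaker
    embedding (negative) by at least `margin`."""
    pos = cosine_similarity(anchor, positive)
    neg = cosine_similarity(anchor, negative)
    return max(0.0, margin - pos + neg)

# Toy 2-D embeddings: in practice these come from the backbone network.
same = np.array([1.0, 0.0])
other = np.array([0.0, 1.0])
print(verify(same, same))                      # same speaker accepted
print(triplet_loss(same, same, other))         # well-separated triplet, zero loss
```

In a whispered-speech setting, the triplet loss would pull a speaker's whispered embedding toward the same speaker's normal-speech embedding, which is one plausible way such a system narrows the normal-vs-whispered gap.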