Whispered Speech Speaker Verification Improved by 22%
A new model enhances speaker verification for whispered speech, achieving a 22.26% relative improvement over the baseline in normal vs whispered trials. The system adds an encoder-decoder structure on top of a fine-tuned speaker verification backbone and is optimized with cosine similarity classification and a triplet loss. It reaches an AUC of 98.16% in normal vs whispered tests. Whispered speech degrades standard verification systems because its acoustic characteristics differ from those of phonated speech; the approach targets real-life scenarios such as privacy-sensitive communication and speakers with vocal impairments. The research is published on arXiv (2604.20229).
Key facts
- Relative improvement of 22.26% over baseline
- Error rate reduced from 6.77% (baseline) to 5.27% (proposed)
- AUC of 98.16% in normal vs whispered trials
- Encoder-decoder structure on fine-tuned speaker verification backbone
- Optimized with cosine similarity classification and triplet loss
- Whispered speech differs acoustically from phonated speech
- Applications include privacy protection and speakers with vocal impairments
- Published on arXiv with ID 2604.20229
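The two training objectives named above, cosine similarity scoring and triplet loss, can be sketched in a few lines. This is a minimal illustration of the general techniques, not the paper's actual implementation: the threshold, margin, and 2-D toy embeddings are assumptions for demonstration.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two speaker embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(enrolled, test, threshold=0.5):
    """Accept the trial as same-speaker if the cosine score clears
    the (illustrative) decision threshold."""
    return cosine_similarity(enrolled, test) >= threshold

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Cosine-based triplet loss: encourage the anchor to be closer
    to a same-speaker embedding (positive) than to a different-speaker
    embedding (negative) by at least `margin`."""
    pos = cosine_similarity(anchor, positive)
    neg = cosine_similarity(anchor, negative)
    return max(0.0, margin - pos + neg)

# Toy 2-D embeddings: in practice these come from the backbone network.
same = np.array([1.0, 0.0])
other = np.array([0.0, 1.0])
print(verify(same, same))                      # same speaker accepted
print(triplet_loss(same, same, other))         # well-separated triplet, zero loss
```

In a whispered-speech setting, the triplet loss would pull a speaker's whispered embedding toward the same speaker's normal-speech embedding, which is one plausible way such a system narrows the normal-vs-whispered gap.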