ARTFEED — Contemporary Art Intelligence

Whispered Speech Speaker Verification Improved by 22%

other · 2026-04-24

A new model improves speaker verification for whispered speech, achieving a 22.26% relative improvement over the baseline on normal-versus-whispered trials. The system adds an encoder-decoder structure on top of a fine-tuned speaker verification backbone and is optimized with a cosine-similarity classification objective and a triplet loss. It reaches an AUC of 98.16% on normal-versus-whispered tests. Whispered speech degrades standard verification because its acoustic characteristics differ from those of phonated speech; this approach targets real-life scenarios such as privacy protection and speakers with vocal impairments. The research is published on arXiv (2604.20229).
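To make the training objective concrete, here is a minimal sketch of cosine-similarity scoring combined with a triplet loss. This is an illustrative toy in NumPy, not the paper's implementation: the margin value and the toy embeddings are assumptions, and the actual system operates on embeddings from a fine-tuned speaker verification backbone.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two embedding vectors, in [-1, 1].
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Pull same-speaker embeddings together and push different-speaker
    # embeddings apart; margin=0.2 is an assumed value for illustration.
    return max(0.0, cosine(anchor, negative) - cosine(anchor, positive) + margin)

# Toy embeddings: anchor and positive from one speaker (e.g. normal and
# whispered utterances), negative from a different speaker.
anchor   = np.array([1.0, 0.1, 0.0])
positive = np.array([0.9, 0.2, 0.1])
negative = np.array([0.0, 1.0, 0.5])

score = cosine(anchor, positive)           # verification score for a trial
loss = triplet_loss(anchor, positive, negative)
```

At verification time, the cosine score between an enrollment embedding and a test embedding is compared against a threshold to accept or reject the trial.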

Key facts

  • Relative improvement of 22.26% over baseline
  • Baseline error rate 6.77% vs proposed 5.27%
  • AUC of 98.16% in normal vs whispered trials
  • Encoder-decoder structure on fine-tuned speaker verification backbone
  • Optimized with cosine similarity classification and triplet loss
  • Whispered speech differs acoustically from phonated speech
  • Applications include privacy protection and verification for speakers with vocal impairments
  • Published on arXiv with ID 2604.20229
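The quoted error rates can be checked against the headline figure directly; this small sketch computes the relative improvement implied by the two numbers, which comes out close to the reported 22.26%.

```python
# Relative improvement implied by the quoted error rates.
baseline, proposed = 6.77, 5.27
rel_improvement = (baseline - proposed) / baseline * 100
print(f"{rel_improvement:.2f}%")  # ≈ 22.16%, in line with the reported 22.26%
```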

Entities

Institutions

  • arXiv
