Study: Humans Poor at Detecting Fully Synthetic Speech
A new study on arXiv (2605.28064) investigates how well people can spot synthetic speech in a social context, treating voice deepfake detection as both a perceptual and a contextual process. Forty-seven participants completed a localization task, identifying the synthetic parts of three classes of utterance: authentic, fully synthetic, and partially synthetic. The study manipulated three trust cues: instructional framing, affective priming, and provenance labeling. Participants detected fully synthetic speech at below-chance levels, and utterance class was the primary determinant of detection accuracy. The trust cues produced no main effects, but they did affect how participants approached detection. Perceived quality, including mechanicalness and clarity, also varied with utterance type.
Key facts
- 47 participants completed a localization task
- Three trust cues: instructional framing, affective priming, provenance labeling
- Fully synthetic speech detected at below-chance levels
- Utterance class was primary determinant of detection accuracy
- Trust cues produced no main effects but did affect detection behavior
- Quality ratings (such as mechanicalness and clarity) varied with utterance type
- Study published on arXiv with ID 2605.28064
- Investigated voice deepfake detection as a perceptual and contextual process
Entities
Institutions
- arXiv