Study: Humans Poor at Detecting Fully Synthetic Speech

other · 2026-05-28

A new study posted to arXiv (2605.28064) examines how well people can spot synthetic speech in a social context. In the study, 47 participants completed a localization task, marking the synthetic portions of utterances that were authentic, fully synthetic, or partially synthetic. The experiment manipulated three trust cues: instructional framing, affective priming, and provenance labeling. Participants detected fully synthetic speech at below-chance levels, and utterance class was the primary determinant of detection accuracy. The trust cues produced no main effects, but they did shape participants’ detection behavior. Perceived quality, including ratings of mechanicalness and clarity, also varied with utterance type.

Key facts

47 participants completed a localization task
Three trust cues: instructional framing, affective priming, provenance labeling
Fully synthetic speech detected at below-chance levels
Utterance class was primary determinant of detection accuracy
Trust cues produced no main effects but still influenced detection behavior
Quality ratings tracked utterance type
Study published on arXiv with ID 2605.28064
Investigated voice deepfake detection as perceptual and contextual process
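
For readers unfamiliar with the phrase "below-chance", the following is a minimal sketch of how such a claim can be checked with a one-sided exact binomial test. It is not the study's analysis: the function names, the 0.5 chance level, and the counts are all hypothetical, chosen only for illustration.

```python
from math import comb


def binom_cdf(k, n, p):
    """Return P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def below_chance_p_value(correct, trials, chance=0.5):
    """One-sided exact binomial p-value for accuracy BELOW the chance level.

    A small value means that so few correct answers would be unlikely
    if listeners were simply guessing at the assumed chance rate.
    """
    return binom_cdf(correct, trials, chance)


if __name__ == "__main__":
    # Hypothetical counts, NOT data from the study.
    p = below_chance_p_value(correct=35, trials=100, chance=0.5)
    print(f"one-sided p-value: {p:.4f}")
```

The chance level depends on the task design. A localization task with several candidate segments would have a different baseline than a binary real-or-fake judgment, so the 0.5 default here is only an assumption.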
