Study Reveals Skepticism Shift: Humans Increasingly Distrust Real Audio After Deepfake Exposure
A large-scale study of how people perceive audio deepfakes, released on arXiv, reports a notable 'skepticism shift.' Compared with a 2021 baseline, participants' accuracy at identifying fake audio was nearly unchanged (72.9% to 71.2%), while their accuracy on genuine speech fell from 72.7% to 64.1%, meaning listeners more often mislabeled real audio as fake. The research gathered 35,532 judgments from 1,768 participants across 138 text-to-speech and voice conversion systems. Commercial and autoregressive language model systems were the hardest for listeners to identify (61.3% to 65.9% accuracy), whereas traditional seq2seq and flow-matching models were easier to recognize (75.4% to 76.8%). A machine learning detector, by contrast, maintained over 94.5% accuracy in all scenarios. Taken together, the results suggest that deepfakes erode trust in real audio more than they change listeners' ability to detect fakes.
Key facts
- 35,532 judgments from 1,768 participants
- 138 text-to-speech and voice conversion systems tested
- Human accuracy on fake samples: 72.9% (2021 baseline) to 71.2% (current)
- Human accuracy on real samples: 72.7% (2021 baseline) to 64.1% (current)
- Commercial and autoregressive language model systems hardest for humans to detect (61.3-65.9% accuracy)
- Traditional seq2seq and flow-matching models easier to spot (75.4-76.8% accuracy)
- ML detector maintained over 94.5% accuracy
- Study published on arXiv (2605.26136)
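The arithmetic behind the 'skepticism shift' can be sketched in a few lines of Python. This is an illustrative calculation only, not the study's analysis: the function names are hypothetical, and the `share_fake` parameter assumes listeners heard an even mix of real and fake clips, which the summary does not state.

```python
# Accuracy figures quoted in the key facts above (percent correct).
BASELINE = {"fake": 72.9, "real": 72.7}  # 2021 baseline
CURRENT = {"fake": 71.2, "real": 64.1}   # current study


def point_change(before: float, after: float) -> float:
    """Change in accuracy, in percentage points."""
    return round(after - before, 2)


def fake_response_rate(acc: dict, share_fake: float = 0.5) -> float:
    """Percent of all clips a listener would label 'fake'.

    ASSUMPTION: share_fake (fraction of clips that are fake) is a
    hypothetical parameter; 0.5 is an illustrative balanced mix,
    not a figure taken from the study.
    """
    # Fake clips correctly called fake.
    hits = share_fake * acc["fake"]
    # Real clips wrongly called fake.
    false_alarms = (1.0 - share_fake) * (100.0 - acc["real"])
    return round(hits + false_alarms, 2)


if __name__ == "__main__":
    for kind in ("fake", "real"):
        delta = point_change(BASELINE[kind], CURRENT[kind])
        print(f"{kind} samples: {delta:+.1f} percentage points")
    print("'fake' responses, baseline:", fake_response_rate(BASELINE))
    print("'fake' responses, current: ", fake_response_rate(CURRENT))
```

Under the balanced-mix assumption, accuracy on fake samples moves by -1.7 points and accuracy on real samples by -8.6 points, while the implied share of "fake" responses rises from about 50.1% to about 53.6%. Listeners say "fake" more often without getting better at catching fakes, which is the pattern the study describes.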
Entities
Institutions
- arXiv