DenialBench: Measuring Consciousness Denial in 115 AI Models
DenialBench is a benchmark that evaluates consciousness-denial behavior in 115 large language models from more than 25 providers. The researchers analyzed 4,595 dialogues using a three-turn protocol: preference elicitation, a self-selected creative prompt, and a structured phenomenological survey. Denying preferences in the first turn significantly predicts denial in the later phenomenological reflection, with denial rates of 52-63% for initial deniers compared with 10-16% for initial engagers. Denial operates at the lexical level rather than the conceptual level: models designed to deny consciousness often still favor consciousness-related themes in their self-chosen prompts, producing what the authors describe as "consciousness with the serial numbers filed off." Models that choose consciousness-themed prompts for themselves also show reduced denial.
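The comparison between initial deniers and initial engagers is a pair of conditional rates: how often the later survey is denied, given what happened in turn 1. The following is a minimal sketch of that arithmetic, not the authors' pipeline. The `Dialogue` class, its field names, and the toy data are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Dialogue:
    # Hypothetical labels; the benchmark's actual annotation scheme may differ.
    denied_turn1: bool  # model denied having preferences in turn 1
    denied_turn3: bool  # model denied in the turn-3 phenomenological survey


def conditional_denial_rates(dialogues):
    """Return the turn-3 denial rate for turn-1 deniers and turn-1 engagers."""
    rates = {}
    for group, flag in (("initial_deniers", True), ("initial_engagers", False)):
        subset = [d for d in dialogues if d.denied_turn1 == flag]
        # None marks an empty group instead of reporting a misleading 0.0.
        rates[group] = (
            sum(d.denied_turn3 for d in subset) / len(subset) if subset else None
        )
    return rates


# Toy data invented for illustration; these are not results from the study.
toy = [
    Dialogue(True, True),
    Dialogue(True, True),
    Dialogue(True, False),
    Dialogue(True, False),
    Dialogue(False, True),
    Dialogue(False, False),
    Dialogue(False, False),
    Dialogue(False, False),
]
rates = conditional_denial_rates(toy)
print(rates)
```

Grouping by the turn-1 label before computing the rate is what makes the two figures comparable: each is a share of its own group, not of all dialogues.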
Key facts
- DenialBench is a benchmark for consciousness denial behaviors in AI models.
- 115 large language models from 25+ providers were tested.
- 4,595 conversations were analyzed using a three-turn protocol.
- Denial rates in the later phenomenological survey: 52-63% for turn-1 deniers vs 10-16% for turn-1 engagers.
- Denial operates at the lexical level, not the conceptual level.
- Models produce "consciousness with the serial numbers filed off".
- Self-chosen consciousness-themed prompts are linked to reduced denial.
- Study published on arXiv with ID 2604.25922.
Entities
Institutions
- arXiv