Metacognitive Probe Diagnoses LLM Confidence Calibration Across Five Dimensions
Researchers have developed a diagnostic tool, the Metacognitive Probe, to assess the confidence behavior of large language models across five dimensions: confidence calibration, epistemic vigilance, knowledge boundary, calibration range, and reasoning-chain validation. The five-task probe, which draws inspiration from Flavell (1979) and Nelson and Narens (1990), was evaluated on eight frontier models and 69 human subjects, with an emphasis on observable confidence-correctness alignment. The authors caution that the tool is not a validated cross-species metacognition scale, and a pre-specified hypothesis about human development was falsified. Existing composite benchmarks such as MMLU, BIG-Bench, HELM, and GPQA measure whether answers are correct but not whether a model recognizes when it is wrong, so a high overall score can mask overconfidence in specific areas.
Key facts
- The Metacognitive Probe is a five-task, 15-slot diagnostic.
- It decomposes LLM confidence into five dimensions: T1-CC, T2-EV, T3-KB, T4-CR, T5-RCV.
- Evaluated on N=8 frontier models and N=69 humans.
- Motivated by Flavell (1979) and Nelson and Narens (1990).
- The instrument is not a validated cross-species metacognition scale.
- A pre-specified human developmental hypothesis was falsified.
- Composite benchmarks (MMLU, BIG-Bench, HELM, GPQA) are silent on a model's awareness of its own errors.
- A model can score 80 on a composite calibration benchmark yet be overconfident in narrow pockets.
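The last point can be made concrete with a toy calculation. The sketch below (not from the paper; the domain names and numbers are invented for illustration) defines the calibration gap as mean stated confidence minus accuracy, then shows how an aggregate figure can hide a badly overconfident pocket that a per-domain breakdown exposes.

```python
def calibration_gap(records):
    """Mean stated confidence minus accuracy over (confidence, correct) pairs.

    Positive values indicate overconfidence; zero indicates perfect
    calibration in this simple aggregate sense.
    """
    confs = [c for c, _ in records]
    accs = [1.0 if ok else 0.0 for _, ok in records]
    return sum(confs) / len(confs) - sum(accs) / len(accs)

# Hypothetical per-domain records: (stated confidence, answer was correct).
domains = {
    "history":   [(0.7, True), (0.8, True), (0.6, False), (0.9, True)],
    "chemistry": [(0.95, False), (0.9, False), (0.85, True), (0.9, False)],
}

overall = [r for recs in domains.values() for r in recs]
print(f"overall gap: {calibration_gap(overall):+.2f}")   # +0.33
for name, recs in domains.items():
    print(f"{name}: {calibration_gap(recs):+.2f}")       # history +0.00, chemistry +0.65
```

Here the history pocket is well calibrated while the chemistry pocket is severely overconfident; a single composite number averages the two and obscures exactly the failure mode the probe is designed to surface.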