Metacognitive Probe Diagnoses LLM Confidence Calibration Across Five Dimensions
Researchers have developed a diagnostic tool, the Metacognitive Probe, to assess the confidence behavior of large language models across five dimensions: confidence calibration, epistemic vigilance, knowledge boundary, calibration range, and reasoning-chain validation. The five-task probe, which draws inspiration from Flavell (1979) and Nelson and Narens (1990), was evaluated on eight frontier models and 69 human subjects, with an emphasis on observable confidence-correctness alignment. The authors caution that the tool is not a validated cross-species metacognition scale, and a pre-specified hypothesis about human development was falsified. Existing composite benchmarks such as MMLU, BIG-Bench, HELM, and GPQA measure whether answers are correct but not whether a model recognizes when it is wrong, so a high overall score can mask overconfidence in specific areas.
Key facts
- The Metacognitive Probe is a five-task, 15-slot diagnostic.
- It decomposes LLM confidence into five dimensions: T1-CC, T2-EV, T3-KB, T4-CR, T5-RCV.
- Evaluated on N=8 frontier models and N=69 humans.
- Motivated by Flavell (1979) and Nelson and Narens (1990).
- The instrument is not a validated cross-species metacognition scale.
- A pre-specified human developmental hypothesis was falsified.
- Composite benchmarks (MMLU, BIG-Bench, HELM, GPQA) are silent on a model's awareness of its own errors.
- A model can score 80 on a composite calibration benchmark yet be overconfident in narrow pockets.
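The last point can be made concrete with a toy calculation. The sketch below (not from the paper; the domain names and numbers are invented for illustration) defines the calibration gap as mean stated confidence minus accuracy, then shows how an aggregate figure can hide a badly overconfident pocket that a per-domain breakdown exposes.

```python
def calibration_gap(records):
    """Mean stated confidence minus accuracy over (confidence, correct) pairs.

    Positive values indicate overconfidence; zero indicates perfect
    calibration in this simple aggregate sense.
    """
    confs = [c for c, _ in records]
    accs = [1.0 if ok else 0.0 for _, ok in records]
    return sum(confs) / len(confs) - sum(accs) / len(accs)

# Hypothetical per-domain records: (stated confidence, answer was correct).
domains = {
    "history":   [(0.7, True), (0.8, True), (0.6, False), (0.9, True)],
    "chemistry": [(0.95, False), (0.9, False), (0.85, True), (0.9, False)],
}

overall = [r for recs in domains.values() for r in recs]
print(f"overall gap: {calibration_gap(overall):+.2f}")   # +0.33
for name, recs in domains.items():
    print(f"{name}: {calibration_gap(recs):+.2f}")       # history +0.00, chemistry +0.65
```

Here the history pocket is well calibrated while the chemistry pocket is severely overconfident; a single composite number averages the two and obscures exactly the failure mode the probe is designed to surface.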