Stepwise Confidence Attribution for Diagnosing LLM Reasoning Failures

ai-technology · 2026-05-20

A new framework called Stepwise Confidence Attribution (SCA) diagnoses where multi-step reasoning fails in black-box large language models (LLMs) by assigning a confidence score to each step, using only the generated reasoning traces. SCA applies the Information Bottleneck (IB) principle: steps that align with the consensus across correct solutions receive high confidence, while deviations are flagged as potentially erroneous. Two methods are proposed: NIBS, a non-parametric IB method that uses no graph structure, and GIBS, a graph-based IB method that learns subgraphs through a differentiable mask. Because it needs no access to model internals, the approach works with closed-source LLMs. This addresses a limitation of existing methods, which either estimate confidence only for final answers or require model internals.
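
For context, the Information Bottleneck principle in its standard, general form seeks a compressed representation Z of an input X that remains informative about a target Y. How SCA maps reasoning steps onto X, Y, and Z is not stated in this summary, so the symbols below are the generic ones rather than the framework's own notation.

```latex
\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y)
```

Here I(·;·) denotes mutual information, and β ≥ 0 controls the trade-off between compressing X and preserving information about Y.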

Key facts

SCA diagnoses multi-step reasoning failures in black-box LLMs
Assigns step-level confidence based only on generated reasoning traces
Applies Information Bottleneck principle
Steps aligning with consensus across correct solutions receive high confidence
Deviations are flagged as potentially erroneous
Two methods: NIBS (non-parametric) and GIBS (graph-based)
Works for closed-source LLMs without internal model access
Existing methods are restricted to final answers or require internal access
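
The consensus idea in the facts above can be illustrated with a small Python sketch. This is not NIBS or GIBS: it swaps in a plain token-overlap (Jaccard) similarity for the Information Bottleneck machinery, and the function names, the example traces, and the 0.6 threshold are all hypothetical choices made for illustration. It only shows the general shape of scoring each step against a set of correct reference traces and flagging low-scoring steps.

```python
from typing import List, Set


def tokens(step: str) -> Set[str]:
    """Lowercased word set for a step (a crude stand-in for a learned representation)."""
    return set(step.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap between two token sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def step_confidence(trace: List[str], correct_traces: List[List[str]]) -> List[float]:
    """Score each step by its best-match similarity, averaged over correct traces."""
    if not correct_traces:
        raise ValueError("need at least one correct reference trace")
    scores = []
    for step in trace:
        step_tokens = tokens(step)
        # For every reference trace, keep the similarity of its closest step.
        best_per_trace = [
            max((jaccard(step_tokens, tokens(ref)) for ref in ref_trace), default=0.0)
            for ref_trace in correct_traces
        ]
        scores.append(sum(best_per_trace) / len(best_per_trace))
    return scores


def flag_deviations(scores: List[float], threshold: float) -> List[int]:
    """Indices of steps whose score falls below the (hypothetical) threshold."""
    return [i for i, s in enumerate(scores) if s < threshold]


# Hypothetical traces: two correct solutions and one trace with a faulty second step.
correct = [
    ["add 2 and 3 to get 5", "multiply 5 by 4 to get 20"],
    ["add 2 and 3 to get 5", "multiply 5 by 4 to get 20"],
]
trace = ["add 2 and 3 to get 5", "multiply 5 by 3 to get 15"]

scores = step_confidence(trace, correct)
flagged = flag_deviations(scores, threshold=0.6)
print(scores, flagged)
```

Like SCA as described, the sketch reads only the generated text and needs no model internals. A real system would need a far more robust notion of step agreement than word overlap.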
