Stepwise Confidence Attribution for Diagnosing LLM Reasoning Failures
Stepwise Confidence Attribution (SCA) is a framework for diagnosing where multi-step reasoning fails in black-box large language models (LLMs). It assigns a confidence score to each reasoning step using only the generated reasoning traces. SCA applies the Information Bottleneck (IB) principle: steps that align with the consensus across correct solutions receive high confidence, while steps that deviate are flagged as potentially erroneous. Two methods are proposed: NIBS, a non-parametric IB method that does not use graph structures, and GIBS, a graph-based IB method that learns subgraphs via a differentiable mask. Because it needs no access to model internals, the approach works for closed-source LLMs, addressing a limitation of existing methods, which either estimate confidence only for final answers or require model internals.
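The consensus idea can be illustrated with a toy sketch. This is not NIBS or GIBS and does not implement the Information Bottleneck objective. It swaps in a simple lexical-overlap (Jaccard) score as a stand-in for agreement with correct solutions, purely to show what step-level confidence computed from traces alone might look like. The function names, the similarity measure, and the 0.5 threshold are all assumptions made for illustration.

```python
# Toy illustration only: Jaccard token overlap stands in for "agreement with
# the consensus of correct solutions". All names and the threshold are
# hypothetical and are not taken from the SCA paper.

def tokenize(step):
    """Lowercase a step and split it into a set of whitespace tokens."""
    return set(step.lower().split())


def jaccard(a, b):
    """Jaccard similarity between two token sets (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def step_confidences(trace, correct_traces):
    """Score each step of `trace` against a pool of correct reference traces.

    A step's score is the mean, over reference traces, of its best Jaccard
    match to any step in that reference. Only text is used, so no access to
    model internals is needed.
    """
    scores = []
    for step in trace:
        tokens = tokenize(step)
        best_per_ref = [
            max((jaccard(tokens, tokenize(ref_step)) for ref_step in ref), default=0.0)
            for ref in correct_traces
        ]
        scores.append(sum(best_per_ref) / len(best_per_ref) if best_per_ref else 0.0)
    return scores


def flag_deviations(trace, correct_traces, threshold=0.5):
    """Return indices of steps whose score falls below `threshold`."""
    scores = step_confidences(trace, correct_traces)
    return [i for i, score in enumerate(scores) if score < threshold]


if __name__ == "__main__":
    references = [
        ["add 2 and 3 to get 5", "multiply 5 by 4 to get 20"],
        ["add 2 and 3 to get 5", "multiply 5 by 4 to get 20"],
    ]
    candidate = ["add 2 and 3 to get 5", "subtract 7 from 9"]
    print(step_confidences(candidate, references))
    print(flag_deviations(candidate, references))
```

In this sketch the first candidate step matches the references exactly, while the second shares no tokens with any reference step and is the one flagged. A real system would need a semantic notion of agreement rather than token overlap, which is the role the IB-based methods play in SCA.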
Key facts
- SCA diagnoses multi-step reasoning failures in black-box LLMs
- Assigns step-level confidence based only on generated reasoning traces
- Applies Information Bottleneck principle
- Steps aligning with consensus across correct solutions receive high confidence
- Deviations are flagged as potentially erroneous
- Two methods: NIBS (non-parametric) and GIBS (graph-based)
- Works for closed-source LLMs without internal model access
- Existing methods either estimate confidence only for final answers or require access to model internals
Entities
Institutions
- arXiv