Large Reasoning Models Fail to Abstain When Information Is Insufficient

ai-technology · 2026-05-28

A recent study published on arXiv (2605.28070) identifies a significant failure mode in large reasoning models: when a question does not provide enough information to answer it, these models often recognize that information is missing but keep reasoning anyway, producing unsupported answers instead of abstaining. The researchers call this the "detection-to-abstention gap." The gap is particularly concerning in high-risk areas such as medical AI, where an unsupported answer can do more harm than declining to respond. To close it, the authors introduce Judge-Then-Solve (JTS), a trajectory-level reasoning-control framework that trains models to commit to an answerability judgment before generating a solution, treating abstention as a control decision rather than a style of final answer. JTS is implemented through supervised learning and aims to make reasoning models more reliable when information is insufficient.

Key facts

arXiv paper 2605.28070 identifies a failure mode in large reasoning models under insufficient information.
Models detect insufficient information but still produce unsupported answers instead of abstaining.
The detection-to-abstention gap is the mismatch between detecting insufficiency and actually abstaining.
This gap is especially dangerous in high-risk domains like medical AI.
The proposed solution is Judge-Then-Solve (JTS), a trajectory-level reasoning-control framework.
JTS trains models to make an explicit answerability commitment before solution generation.
Abstention is treated as a control decision, not a final-answer style.
The model either solves or terminates early based on its answerability judgment.
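
The judge-first control flow described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the names `judge_then_solve`, `Result`, `toy_judge`, and `toy_solve` are hypothetical, and the toy judge is a hand-written rule standing in for a trained model's answerability judgment. The sketch only shows the ordering, where the answerability commitment comes first and solving happens only if that commitment is positive.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Result:
    answerable: bool        # the explicit answerability commitment
    answer: Optional[str]   # None when the run abstains


def judge_then_solve(
    question: str,
    judge: Callable[[str], bool],
    solve: Callable[[str], str],
) -> Result:
    """Commit to answerability first; solve only if the judgment is positive."""
    answerable = judge(question)
    if not answerable:
        # Early termination: the solver is never invoked.
        return Result(answerable=False, answer=None)
    return Result(answerable=True, answer=solve(question))


# Toy stand-ins for the model (hypothetical, for illustration only).
def toy_judge(question: str) -> bool:
    # Assumption: a dosing question is answerable only if a weight is given.
    return "weight=" in question


def toy_solve(question: str) -> str:
    return "solved: " + question


abstained = judge_then_solve("What dose for a child?", toy_judge, toy_solve)
solved = judge_then_solve("What dose for a child, weight=20kg?", toy_judge, toy_solve)
```

In this sketch abstention is a branch in the control flow, not a particular wording of the final answer: when the judge says the question is unanswerable, no solution is generated at all.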
