LLMs Fail When Surface Cues Override Hidden Constraints
A recent investigation shows that large language models consistently fail when a prominent surface cue contradicts an implicit feasibility constraint. In a causal-behavioral analysis of the 'car wash problem' across six models, the researchers find approximately context-independent sigmoid heuristics: the distance cue exerts 8.7 to 38 times more influence than the stated goal, and token-level attribution reveals patterns that resemble keyword association more than compositional inference. The Heuristic Override Benchmark (HOB) comprises 500 instances spanning 4 heuristic and 5 constraint families, with minimal pairs and explicitness gradients, and demonstrates that the failure generalizes across 14 models: under strict evaluation (10/10 correct), no model exceeds 75% accuracy, and presence constraints are hardest (44%). A minimal hint (e.g., highlighting the key object) recovers +15 percentage points on average, indicating that the bottleneck is constraint inference rather than reasoning capability.
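The reported cue imbalance can be illustrated with a minimal sketch: a logistic model in which the probability of following the surface (distance) cue depends on a distance weight that dwarfs the goal weight. The weights below are illustrative assumptions chosen inside the reported 8.7–38x range, not fitted values from the study.

```python
import math

# Illustrative coefficients (assumed, not taken from the paper):
# the distance weight is ~10x the goal weight, within the 8.7-38x range.
W_DISTANCE = 10.0  # weight on the surface (distance) cue
W_GOAL = 1.0       # weight on the stated goal / constraint
BIAS = 0.0

def p_follow_surface_cue(distance_signal: float, goal_signal: float) -> float:
    """Probability that the model picks the option favored by the distance cue.

    A sigmoid of the weighted cue difference: when the distance cue is
    strong, it overrides the goal almost regardless of context.
    """
    z = W_DISTANCE * distance_signal - W_GOAL * goal_signal + BIAS
    return 1.0 / (1.0 + math.exp(-z))
```

With both signals equally present (`distance_signal = goal_signal = 1.0`), the sketch predicts the model follows the surface cue almost always, mirroring the qualitative finding that the goal barely moves the decision.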
Key facts
- LLMs fail when surface cues conflict with feasibility constraints
- Causal-behavioral analysis of 'car wash problem' across six models
- Distance cue exerts 8.7 to 38 times more influence than goal
- Token-level attribution shows keyword associations over compositional inference
- Heuristic Override Benchmark (HOB) includes 500 instances
- HOB spans 4 heuristic by 5 constraint families
- No model exceeds 75% under strict evaluation (10/10 correct)
- Presence constraints are hardest at 44% accuracy
- Minimal hint recovers +15 percentage points on average
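The strict evaluation criterion in the key facts (10/10 correct) can be sketched as a simple scoring rule; the function name and the per-instance representation are assumptions for illustration.

```python
def strict_pass(sample_results: list[bool], n_samples: int = 10) -> bool:
    """An instance counts as solved only if all sampled responses are correct.

    Under this 10/10 criterion, a single wrong sample fails the instance,
    which is why strict accuracies (no model above 75%) sit well below
    what a single-sample evaluation would report.
    """
    return len(sample_results) == n_samples and all(sample_results)

def strict_accuracy(per_instance: list[list[bool]]) -> float:
    """Fraction of benchmark instances passed under the strict criterion."""
    if not per_instance:
        return 0.0
    return sum(strict_pass(r) for r in per_instance) / len(per_instance)
```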