LLMs Fail When Surface Cues Override Hidden Constraints
A recent investigation shows that large language models consistently fail when a prominent surface cue contradicts an implicit feasibility constraint. In a causal-behavioral analysis of the 'car wash problem' across six models, the researchers find approximately context-independent sigmoid heuristics: the distance cue exerts 8.7 to 38 times more influence than the stated goal, and token-level attribution reveals patterns that resemble keyword association more than compositional inference. The Heuristic Override Benchmark (HOB) comprises 500 instances spanning 4 heuristic and 5 constraint families, with minimal pairs and explicitness gradients, and demonstrates that the failure generalizes across 14 models: under strict evaluation (10/10 correct), no model exceeds 75% accuracy, and presence constraints are hardest (44%). A minimal hint (e.g., highlighting the key object) recovers +15 percentage points on average, indicating that the bottleneck is constraint inference rather than reasoning capability.
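The reported cue imbalance can be illustrated with a minimal sketch: a logistic model in which the probability of following the surface (distance) cue depends on a distance weight that dwarfs the goal weight. The weights below are illustrative assumptions chosen inside the reported 8.7–38x range, not fitted values from the study.

```python
import math

# Illustrative coefficients (assumed, not taken from the paper):
# the distance weight is ~10x the goal weight, within the 8.7-38x range.
W_DISTANCE = 10.0  # weight on the surface (distance) cue
W_GOAL = 1.0       # weight on the stated goal / constraint
BIAS = 0.0

def p_follow_surface_cue(distance_signal: float, goal_signal: float) -> float:
    """Probability that the model picks the option favored by the distance cue.

    A sigmoid of the weighted cue difference: when the distance cue is
    strong, it overrides the goal almost regardless of context.
    """
    z = W_DISTANCE * distance_signal - W_GOAL * goal_signal + BIAS
    return 1.0 / (1.0 + math.exp(-z))
```

With both signals equally present (`distance_signal = goal_signal = 1.0`), the sketch predicts the model follows the surface cue almost always, mirroring the qualitative finding that the goal barely moves the decision.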
Key facts
- LLMs fail when surface cues conflict with feasibility constraints
- Causal-behavioral analysis of 'car wash problem' across six models
- Distance cue exerts 8.7 to 38 times more influence than goal
- Token-level attribution shows keyword associations over compositional inference
- Heuristic Override Benchmark (HOB) includes 500 instances
- HOB spans 4 heuristic by 5 constraint families
- No model exceeds 75% under strict evaluation (10/10 correct)
- Presence constraints are hardest at 44% accuracy
- Minimal hint recovers +15 percentage points on average
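The strict evaluation criterion in the key facts (10/10 correct) can be sketched as a simple scoring rule; the function name and the per-instance representation are assumptions for illustration.

```python
def strict_pass(sample_results: list[bool], n_samples: int = 10) -> bool:
    """An instance counts as solved only if all sampled responses are correct.

    Under this 10/10 criterion, a single wrong sample fails the instance,
    which is why strict accuracies (no model above 75%) sit well below
    what a single-sample evaluation would report.
    """
    return len(sample_results) == n_samples and all(sample_results)

def strict_accuracy(per_instance: list[list[bool]]) -> float:
    """Fraction of benchmark instances passed under the strict criterion."""
    if not per_instance:
        return 0.0
    return sum(strict_pass(r) for r in per_instance) / len(per_instance)
```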