LLM Agents Exhibit Intrinsic Over-Calling Bias in Tool Use
A recent study posted to arXiv (2605.18882) reports that LLM agents frequently call tools even when no call is needed. On the When2Call benchmark, six models spanning three families showed strong call accuracy but significantly lower no-call accuracy, resulting in overall accuracy between 55% and 70%. The researchers propose the Intrinsic Bias Hypothesis (IBH), which posits that the mapping from activations to the call/no-call decision includes an activation-independent call offset, so a model favors calling even when call and no-call activations are equal. Using Sparse Autoencoders (SAEs), they identified behavior-aligned feature bases for the decision, reduced them to a signed activation margin, and estimated the offset directly. In all six models, decision neutrality occurred only when no-call activation exceeded call activation, which is consistent with IBH. The team then tested IBH causally with Adaptive Margin-Calibrated Steering (AMCS), a method that counteracts the bias along SAE decoder directions. Correcting for the identified offset reduced the tendency to over-call.
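To make the idea of an activation-independent offset concrete, here is a minimal sketch on synthetic data. It is not the study's estimator. It assumes, purely for illustration, that the probability of calling follows a logistic function of the signed margin (call activation minus no-call activation) plus a constant offset. The names `fit_offset`, `margin`, `called`, `true_w`, and `true_b` are hypothetical, and the numbers are made up.

```python
import numpy as np

def fit_offset(margin, called, steps=3000, lr=0.5):
    """Fit P(call) = sigmoid(w * margin + b) by gradient descent on log-loss.

    Assumed toy model: b plays the role of an activation-independent offset.
    """
    w, b = 0.0, 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(w * margin + b)))
        w -= lr * np.mean((p - called) * margin)  # gradient w.r.t. slope
        b -= lr * np.mean(p - called)             # gradient w.r.t. offset
    return w, b

# Synthetic decisions generated with a positive offset (hypothetical values).
rng = np.random.default_rng(0)
margin = rng.normal(0.0, 1.0, size=5000)  # call minus no-call activation
true_w, true_b = 2.0, 1.0
p_call = 1.0 / (1.0 + np.exp(-(true_w * margin + true_b)))
called = (rng.random(5000) < p_call).astype(float)

w, b = fit_offset(margin, called)

# The decision is neutral (P(call) = 0.5) where w * margin + b = 0.
neutral_margin = -b / w
print(f"estimated offset b = {b:.2f}, neutral margin = {neutral_margin:.2f}")
```

Under this assumed model, a positive fitted offset places the neutral point at a negative margin: the no-call activation has to exceed the call activation before the two outcomes are equally likely, which is the pattern the study reports.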
Key facts
- LLM agents over-call tools even when unnecessary
- When2Call benchmark used for evaluation
- Six models from three families tested
- Overall accuracy ranges from 55% to 70%
- Intrinsic Bias Hypothesis (IBH) proposed
- Sparse Autoencoders (SAEs) used to analyze decision features
- Adaptive Margin-Calibrated Steering (AMCS) developed to counter bias
- Decisions are neutral only when no-call activation exceeds call activation
Entities
Institutions
- arXiv