ARTFEED — Contemporary Art Intelligence

Knowing-Doing Gap in LLM Tool Use: Model-Adaptive Necessity

ai-technology · 2026-05-16

A recent study published on arXiv (2605.14038) proposes a model-adaptive definition of tool necessity for large language models (LLMs), uncovering a notable gap between knowing and doing. Prior work treated tool necessity as uniform across models; the authors argue that model capabilities diverge significantly, so a strong model may solve a problem without tools while a weaker one still requires them. They define tool necessity empirically, based on each model's performance when answering unaided, and compare that necessity against the model's actual tool usage across four models on arithmetic and factual-QA datasets. The results reveal considerable mismatches: 26.5-54.0% on arithmetic and 30.8-41.8% on factual QA. Models frequently fail to invoke tools when they are needed, or invoke them when they are not, exposing a disconnect between recognizing that a tool is required and actually using it. The study concludes that tool-use strategies should be tailored to each model rather than applied uniformly.
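The measurement described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's code: the record fields (`correct_without_tool`, `used_tool`) and the helper `mismatch_rate` are assumed names, and the rule "a tool is necessary iff the model fails unaided" is the empirical definition the summary attributes to the authors.

```python
from dataclasses import dataclass

@dataclass
class Example:
    correct_without_tool: bool  # did the model solve this example unaided?
    used_tool: bool             # did the model actually call a tool?

def mismatch_rate(examples):
    """Fraction of examples where observed tool use disagrees with
    empirically defined necessity: the model calls a tool it does not
    need, or skips a tool it does need."""
    mismatches = sum(
        (not ex.correct_without_tool) != ex.used_tool  # necessity vs. usage
        for ex in examples
    )
    return mismatches / len(examples)

# Toy data covering both failure modes named in the study.
data = [
    Example(correct_without_tool=True,  used_tool=True),   # unnecessary call
    Example(correct_without_tool=False, used_tool=False),  # missed necessary call
    Example(correct_without_tool=True,  used_tool=False),  # correct skip
    Example(correct_without_tool=False, used_tool=True),   # correct call
]
print(mismatch_rate(data))  # 0.5
```

Because necessity is derived from each model's own unaided accuracy, the same example can count as "tool-necessary" for a weak model and "tool-unnecessary" for a strong one, which is the model-adaptive point of the paper.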

Key facts

  • Tool necessity is model-adaptive, not model-agnostic.
  • Mismatches of 26.5-54.0% on arithmetic tasks.
  • Mismatches of 30.8-41.8% on factual QA tasks.
  • Four models were tested.
  • Study uses empirical performance to define necessity.
  • Prior work treated tool necessity as model-agnostic.
  • Capability boundaries diverge across models.
  • Failure modes include unnecessary tool calls and missed necessary calls.

Entities

Institutions

  • arXiv

Sources