LLM Agents Struggle with Unclear Instructions; New Framework Prompts Clarification
A new study posted to arXiv (2409.00557) finds that large language models (LLMs) with function-calling capabilities struggle when user instructions are imprecise. The researchers analyzed real-world user queries, identified recurring error patterns, and built NoisyToolBench, a benchmark for evaluating LLM tool use under imperfect instructions. They found that, as a consequence of next-token prediction training, LLMs tend to arbitrarily fabricate missing arguments, leading to hallucinated tool calls and downstream risks. To address this, the team proposed Ask-when-Needed (AwN), a prompting framework that instructs LLMs to ask users clarifying questions when instructions are unclear, rather than guessing.
Key facts
- arXiv:2409.00557v4
- NoisyToolBench benchmark created for evaluating tool use under noisy instructions
- LLMs arbitrarily generate missing arguments due to next-token prediction
- Ask-when-Needed (AwN) framework proposed
- Study focuses on LLM tool-use under imperfect instructions
- Real-world user instructions were examined
- Error patterns in LLM tool execution analyzed
- AwN prompts LLMs to ask users for clarification
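The clarify-before-calling behavior described above can be sketched as a small decision loop: before issuing a tool call, check whether every required argument is grounded in the user's instruction, and if any are missing, ask instead of guessing. This is an illustrative sketch only; the tool schema, the `extract_args` parser, and all function names are hypothetical, not the paper's actual implementation.

```python
# Hypothetical sketch of an Ask-when-Needed (AwN)-style check. A toy keyword
# parser stands in for an LLM; the point is the control flow: missing required
# arguments trigger a clarifying question, never a fabricated value.

# Toy tool schema: arguments a hypothetical "book_flight" tool requires.
REQUIRED_ARGS = {"origin", "destination", "date"}

def extract_args(instruction: str) -> dict:
    """Naive keyword-based argument extraction (stands in for an LLM parser)."""
    args = {}
    words = instruction.lower().split()
    if "from" in words:
        args["origin"] = words[words.index("from") + 1]
    if "to" in words:
        args["destination"] = words[words.index("to") + 1]
    if "on" in words:
        args["date"] = words[words.index("on") + 1]
    return args

def ask_when_needed(instruction: str) -> str:
    """Return either a tool call or a clarifying question, never a guess."""
    args = extract_args(instruction)
    missing = REQUIRED_ARGS - args.keys()
    if missing:
        # AwN behavior: surface the gap to the user instead of hallucinating.
        return f"Before I proceed, could you tell me your {', '.join(sorted(missing))}?"
    return f"CALL book_flight({args})"

# The date is missing here, so the agent asks rather than inventing one.
print(ask_when_needed("Book a flight from NYC to Paris"))
# All required arguments present, so the tool call goes through.
print(ask_when_needed("Book a flight from NYC to Paris on 2025-03-01"))
```

In a real agent the extraction and question generation would themselves be LLM calls, but the same gate (arguments grounded in the instruction, or else a question back to the user) is what distinguishes AwN-style behavior from argument guessing.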