LLM Agents Struggle with Unclear Instructions; New Framework Prompts Clarification
A new study posted to arXiv (2409.00557) finds that large language models (LLMs) with function-calling capabilities struggle when user instructions are imprecise. The researchers analyzed real-world user queries, identified recurring error patterns, and built NoisyToolBench, a benchmark for evaluating LLM tool use under imperfect instructions. They found that, as a consequence of next-token prediction training, LLMs tend to arbitrarily fabricate missing arguments, leading to hallucinated tool calls and downstream risks. To address this, the team proposed Ask-when-Needed (AwN), a prompting framework that instructs LLMs to ask users clarifying questions when instructions are unclear, rather than guessing.
Key facts
- arXiv:2409.00557v4
- NoisyToolBench benchmark created for evaluating tool use under noisy instructions
- LLMs arbitrarily generate missing arguments due to next-token prediction
- Ask-when-Needed (AwN) framework proposed
- Study focuses on LLM tool-use under imperfect instructions
- Real-world user instructions were examined
- Error patterns in LLM tool execution analyzed
- AwN prompts LLMs to ask users for clarification
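The clarify-before-calling behavior described above can be sketched as a small decision loop: before issuing a tool call, check whether every required argument is grounded in the user's instruction, and if any are missing, ask instead of guessing. This is an illustrative sketch only; the tool schema, the `extract_args` parser, and all function names are hypothetical, not the paper's actual implementation.

```python
# Hypothetical sketch of an Ask-when-Needed (AwN)-style check. A toy keyword
# parser stands in for an LLM; the point is the control flow: missing required
# arguments trigger a clarifying question, never a fabricated value.

# Toy tool schema: arguments a hypothetical "book_flight" tool requires.
REQUIRED_ARGS = {"origin", "destination", "date"}

def extract_args(instruction: str) -> dict:
    """Naive keyword-based argument extraction (stands in for an LLM parser)."""
    args = {}
    words = instruction.lower().split()
    if "from" in words:
        args["origin"] = words[words.index("from") + 1]
    if "to" in words:
        args["destination"] = words[words.index("to") + 1]
    if "on" in words:
        args["date"] = words[words.index("on") + 1]
    return args

def ask_when_needed(instruction: str) -> str:
    """Return either a tool call or a clarifying question, never a guess."""
    args = extract_args(instruction)
    missing = REQUIRED_ARGS - args.keys()
    if missing:
        # AwN behavior: surface the gap to the user instead of hallucinating.
        return f"Before I proceed, could you tell me your {', '.join(sorted(missing))}?"
    return f"CALL book_flight({args})"

# The date is missing here, so the agent asks rather than inventing one.
print(ask_when_needed("Book a flight from NYC to Paris"))
# All required arguments present, so the tool call goes through.
print(ask_when_needed("Book a flight from NYC to Paris on 2025-03-01"))
```

In a real agent the extraction and question generation would themselves be LLM calls, but the same gate (arguments grounded in the instruction, or else a question back to the user) is what distinguishes AwN-style behavior from argument guessing.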