New Benchmark Reveals LLM Search Agents Rely on Internal Knowledge
A study posted to arXiv (2605.28721) introduces LiveBrowseComp, a deep-search benchmark designed to test whether LLM-based search agents genuinely discover new information or merely verify what they already know. The researchers identify a pattern they call Intrinsic Knowledge Dependence (IKD): agents answer up to 44.5% of BrowseComp questions without using tools, generate more than half of their search queries from internal hypotheses, and perform worse than closed-book baselines when supporting evidence is removed. These findings suggest that static search benchmarks may reward memory-backed verification over evidence-driven discovery. LiveBrowseComp aims to assess agents beyond their intrinsic coverage.
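To make the two headline ratios concrete, here is a minimal Python sketch of how a tool-free answer rate and a hypothesis-driven query share could be computed from evaluation logs. The `Episode` schema, the function names, and the idea that each query carries a boolean label are all assumptions made for illustration; the study's actual protocol and data format are not described in this summary.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Episode:
    """One benchmark question run with tools disabled (hypothetical schema)."""
    question_id: str
    correct_without_tools: bool


def tool_free_answer_rate(episodes: Sequence[Episode]) -> float:
    """Fraction of questions answered correctly with no tool calls."""
    if not episodes:
        raise ValueError("need at least one episode")
    hits = sum(e.correct_without_tools for e in episodes)
    return hits / len(episodes)


def hypothesis_query_share(from_hypothesis: Sequence[bool]) -> float:
    """Share of search queries labeled as stemming from an internal hypothesis.

    Each entry is True when an annotator (assumed here) judged that the query
    already named a candidate answer instead of exploring the evidence.
    """
    if not from_hypothesis:
        raise ValueError("need at least one labeled query")
    return sum(from_hypothesis) / len(from_hypothesis)


# Toy data, invented purely to exercise the functions.
episodes = [
    Episode("q1", True),
    Episode("q2", False),
    Episode("q3", True),
    Episode("q4", False),
]
query_labels = [True, True, False, True, False]

rate = tool_free_answer_rate(episodes)
share = hypothesis_query_share(query_labels)
```

On this toy input, half of the questions count as answered without tools and three of five queries count as hypothesis-driven. A high value on either ratio would point toward the memory-backed verification behavior described above, though how the labels are assigned in practice is the hard part and is left open here.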
Key facts
- Study published on arXiv with ID 2605.28721.
- Introduces concept of Intrinsic Knowledge Dependence (IKD).
- Agents answer up to 44.5% of BrowseComp questions without tools.
- More than half of search queries generated from internal hypotheses.
- Agents perform worse than closed-book baselines when evidence is removed.
- Static search benchmarks may conflate known information with discoverable information.
- LiveBrowseComp is a new deep-search benchmark.
- Benchmark designed to evaluate agents beyond intrinsic coverage.
Entities
Institutions
- arXiv