New Benchmark Reveals LLM Search Agents Rely on Internal Knowledge

ai-technology · 2026-05-28

A study posted to arXiv (2605.28721) introduces LiveBrowseComp, a deep-search benchmark designed to evaluate whether LLM-based search agents genuinely discover new information or merely verify what they already know. The researchers diagnose a failure mode they call Intrinsic Knowledge Dependence (IKD): agents answer up to 44.5% of BrowseComp questions without using tools, generate more than half of their search queries from internal hypotheses, and perform worse than closed-book baselines when supporting evidence is removed. These findings indicate that static search benchmarks may reward memory-backed verification over evidence-driven discovery. LiveBrowseComp aims to assess agents on questions that lie beyond their intrinsic knowledge coverage.

Key facts

Study published on arXiv with ID 2605.28721.
Introduces concept of Intrinsic Knowledge Dependence (IKD).
Agents answer up to 44.5% of BrowseComp questions without tools.
More than half of search queries generated from internal hypotheses.
Agents perform worse than closed-book baselines when evidence is removed.
Static search benchmarks may conflate known information with discoverable information.
LiveBrowseComp is a new deep-search benchmark.
Benchmark designed to evaluate agents beyond intrinsic coverage.
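
To make the "answered without tools" figure concrete, here is a minimal sketch of how such a rate could be computed from evaluation logs. It is an illustration only, not the paper's code: the `Record` type, its field names, and the sample data are all hypothetical, and the study may define the metric differently.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One hypothetical evaluation record (field names are illustrative)."""
    question_id: str
    tool_calls: int  # number of search/browse calls the agent made
    correct: bool    # whether the final answer matched the reference


def tool_free_answer_rate(records):
    """Fraction of all questions answered correctly with zero tool calls."""
    if not records:
        return 0.0
    hits = sum(1 for r in records if r.correct and r.tool_calls == 0)
    return hits / len(records)


# Made-up sample data, used only to exercise the function.
sample = [
    Record("q1", 0, True),
    Record("q2", 3, True),
    Record("q3", 0, False),
    Record("q4", 0, True),
]
rate = tool_free_answer_rate(sample)
print(f"{rate:.1%}")  # 2 of 4 records qualify -> 50.0%
```

A high value on a static benchmark would suggest that many questions fall within what the model already knows, which is the conflation the study warns about.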
