ARTFEED — Contemporary Art Intelligence

New Benchmark Reveals LLM Search Agents Rely on Internal Knowledge

ai-technology · 2026-05-28

A study posted on arXiv (2605.28721) introduces LiveBrowseComp, a deep-search benchmark designed to evaluate whether LLM-based search agents genuinely discover new information or merely verify what they already know. The researchers diagnose Intrinsic Knowledge Dependence (IKD) through three observations: agents answer up to 44.5% of BrowseComp questions without using tools, generate over half of their search queries from internal hypotheses, and perform worse than closed-book baselines when supporting evidence is removed. These findings indicate that static search benchmarks may reward memory-backed verification over evidence-driven discovery. LiveBrowseComp aims to assess agents beyond their intrinsic coverage, that is, on questions they cannot already answer from internal knowledge.

Key facts

  • Study published on arXiv with ID 2605.28721.
  • Introduces concept of Intrinsic Knowledge Dependence (IKD).
  • Agents answer up to 44.5% of BrowseComp questions without tools.
  • More than half of search queries generated from internal hypotheses.
  • Agents perform worse than closed-book baselines when evidence is removed.
  • Static search benchmarks may conflate known information with discoverable information.
  • LiveBrowseComp is a new deep-search benchmark.
  • Benchmark designed to evaluate agents beyond what their internal knowledge already covers.

Entities

Institutions

  • arXiv

Sources