ARTFEED — Contemporary Art Intelligence

Auto-Discovery-Bench: Benchmark for Structured State Tracking in Oracle-Guided Discovery

other · 2026-06-01

Auto-Discovery-Bench is a new diagnostic benchmark that assesses how well agents maintain and revise structured beliefs during interactive discovery. It uses a deterministic, oracle-guided framework in which agents uncover hidden structures through repeated cycles of hypothesis, intervention, and feedback. It covers three discovery abstractions: directed graph, undirected relational, and symbolic equation discovery. Performance declines as the number of variables grows, trajectories lengthen, and distractors are added. A trajectory-tracking diagnostic shows that failures persist even when intervention selection and hypothesis generation are removed, pointing to limitations in maintaining and integrating long-range structured state.
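The hypothesis, intervention, and feedback cycle described above can be illustrated with a toy directed-graph case. This is a minimal sketch under stated assumptions, not the benchmark's actual interface: the `GraphOracle` class, its `intervene` and `check` methods, and the rule that an intervention reveals a node's direct children are all hypothetical choices made for illustration.

```python
class GraphOracle:
    """Deterministic oracle hiding a directed graph (hypothetical interface)."""

    def __init__(self, edges):
        self._edges = frozenset(edges)
        self.queries = 0  # number of interventions performed so far

    def intervene(self, node):
        """Assumed feedback rule: return the sorted direct children of `node`."""
        self.queries += 1
        return sorted(dst for src, dst in self._edges if src == node)

    def check(self, hypothesis):
        """Return True if the hypothesised edge set matches the hidden graph."""
        return frozenset(hypothesis) == self._edges


def discover(oracle, nodes):
    """Run hypothesis -> intervention -> feedback cycles until the oracle agrees."""
    belief = set()  # structured state: edges confirmed so far
    for node in nodes:
        # Hypothesis: the current belief is the full graph.
        if oracle.check(belief):
            break
        # Intervention on one node, then fold the feedback into the belief.
        children = oracle.intervene(node)
        belief |= {(node, child) for child in children}
    return belief


hidden = {("a", "b"), ("b", "c"), ("a", "c")}
oracle = GraphOracle(hidden)
recovered = discover(oracle, ["a", "b", "c"])
print(recovered == hidden, oracle.queries)
```

Because the oracle is deterministic, the same sequence of interventions always yields the same feedback, so any difference between runs comes from how the agent keeps and updates its belief state.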

Key facts

  • Auto-Discovery-Bench is a deterministic oracle-guided diagnostic benchmark.
  • It involves repeated hypothesis-intervention-feedback cycles.
  • Three discovery abstractions: directed graph, undirected relational, symbolic equation.
  • Performance degrades with more variables, longer trajectories, and more distractors.
  • A trajectory-tracking diagnostic isolates state tracking from other capabilities.
  • Failures persist even without intervention selection and hypothesis generation.
  • The results point to limitations in maintaining and integrating long-range structured state.
  • The paper is on arXiv with ID 2502.15224.
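The idea behind the trajectory-tracking diagnostic, isolating state tracking from intervention selection and hypothesis generation, can be sketched as follows. The trajectory is fixed in advance, so the only task left is to integrate every step into a final state. The step format `(op, src, dst)`, the function names, and the Jaccard-overlap score are assumptions made for this illustration; the source does not specify the paper's actual trajectory format or metric.

```python
def replay_trajectory(trajectory):
    """Fold a fixed list of (op, src, dst) steps into the final edge set."""
    edges = set()
    for op, src, dst in trajectory:
        if op == "add":
            edges.add((src, dst))
        elif op == "remove":
            edges.discard((src, dst))
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return edges


def tracking_score(predicted, trajectory):
    """Jaccard overlap between a predicted final state and the replayed one."""
    truth = replay_trajectory(trajectory)
    predicted = set(predicted)
    if not truth and not predicted:
        return 1.0  # both empty: perfect agreement
    return len(truth & predicted) / len(truth | predicted)


# Hypothetical trajectory: the edge (a, b) is added early and removed later.
trajectory = [
    ("add", "a", "b"),
    ("add", "b", "c"),
    ("remove", "a", "b"),
    ("add", "c", "d"),
]

# An agent that forgets the removal still reports (a, b) in its final state.
stale_prediction = {("a", "b"), ("b", "c"), ("c", "d")}
print(tracking_score(stale_prediction, trajectory))
```

In this setup the agent chooses no interventions and proposes no hypotheses, so a score below 1.0 can only come from dropped or stale state, which is the kind of failure the diagnostic is meant to expose.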

Entities

Institutions

  • arXiv
