ARTFEED — Contemporary Art Intelligence

AutoResearchBench: A Benchmark for AI-Driven Scientific Literature Discovery

ai-technology · 2026-04-30

AutoResearchBench is a new benchmark designed to evaluate AI agents' ability to autonomously discover scientific literature. It comprises two task types: Deep Research, in which the agent locates a specific target paper through iterative, multi-step probing, and Wide Research, in which it collects a comprehensive set of papers meeting given criteria. Unlike earlier agentic web-browsing benchmarks, AutoResearchBench emphasizes research-oriented tasks that demand deep comprehension of scientific concepts. The benchmark aims to advance autonomous scientific research by testing how well AI agents navigate complex literature landscapes.

Key facts

  • AutoResearchBench is a benchmark for autonomous scientific literature discovery.
  • It includes Deep Research and Wide Research tasks.
  • Deep Research requires tracking down a specific target paper via multi-step probing.
  • Wide Research involves collecting a set of papers satisfying given conditions.
  • The benchmark is research-oriented, focusing on in-depth comprehension of scientific concepts.
  • This research-oriented focus distinguishes it from previous agentic web-browsing benchmarks.
  • The goal is to advance autonomous scientific research.
  • The benchmark assesses AI agents' capability in scientific literature discovery.
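
The difference between the two task types can be made concrete with a small sketch. The summary above does not say how AutoResearchBench records or scores answers, so everything here is an assumption for illustration: the class names `DeepTask` and `WideTask`, the paper-ID fields, and the choice of exact match for Deep Research and set-overlap precision, recall, and F1 for Wide Research are hypothetical, not the benchmark's actual format or metrics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeepTask:
    """Hypothetical Deep Research item: exactly one target paper."""
    question: str
    target_id: str


@dataclass(frozen=True)
class WideTask:
    """Hypothetical Wide Research item: every paper meeting the criteria."""
    criteria: str
    gold_ids: frozenset[str]


def score_deep(task: DeepTask, predicted_id: str) -> float:
    """Assumed metric: 1.0 if the agent returned the target paper, else 0.0."""
    return 1.0 if predicted_id == task.target_id else 0.0


def score_wide(task: WideTask, predicted_ids: set[str]) -> dict[str, float]:
    """Assumed metric: set-overlap precision, recall, and F1."""
    hits = len(task.gold_ids & predicted_ids)
    precision = hits / len(predicted_ids) if predicted_ids else 0.0
    recall = hits / len(task.gold_ids) if task.gold_ids else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

Under these assumptions a Deep Research answer is a single identifier that is either right or wrong, while a Wide Research answer is a set, so an agent can be penalized both for missing relevant papers (low recall) and for padding its answer with irrelevant ones (low precision).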
