AutoResearchBench: A Benchmark for AI-Driven Scientific Literature Discovery

ai-technology · 2026-04-30

AutoResearchBench is a new benchmark for evaluating how well AI agents can autonomously discover scientific literature. It comprises two task types: Deep Research, in which the agent tracks down a specific target paper through iterative, multi-step probing, and Wide Research, in which the agent collects a comprehensive set of papers that satisfy given conditions. Unlike prior agentic web-browsing benchmarks, AutoResearchBench is research-oriented: its tasks demand in-depth comprehension of scientific concepts. The stated goal is to advance autonomous scientific research by testing whether agents can find the right papers in the scientific literature.

Key facts

AutoResearchBench is a benchmark for autonomous scientific literature discovery.
It includes Deep Research and Wide Research tasks.
Deep Research requires tracking down a specific target paper via multi-step probing.
Wide Research involves collecting a set of papers satisfying given conditions.
The benchmark is research-oriented: its tasks demand in-depth comprehension of scientific concepts.
This focus distinguishes it from previous agentic web-browsing benchmarks.
The goal is to advance autonomous scientific research.
The benchmark assesses AI agents' capability in scientific literature discovery.
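
To make the difference between the two task types concrete, here is a minimal Python sketch. It is an illustration only: the source does not describe the benchmark's data format or scoring, so every name below (DeepResearchTask, WideResearchTask, score_deep, score_wide) and the choice of exact match and precision/recall are assumptions, not the benchmark's actual API or metrics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeepResearchTask:
    """Hypothetical record: one query with exactly one target paper."""
    query: str
    target_paper_id: str


@dataclass(frozen=True)
class WideResearchTask:
    """Hypothetical record: one query with a set of qualifying papers."""
    query: str
    relevant_paper_ids: frozenset


def score_deep(task: DeepResearchTask, predicted_id: str) -> float:
    # Assumed metric: exact match on the single target paper.
    return 1.0 if predicted_id == task.target_paper_id else 0.0


def score_wide(task: WideResearchTask, predicted_ids) -> dict:
    # Assumed metric: set-based precision and recall over paper IDs.
    predicted = set(predicted_ids)
    hits = len(predicted & task.relevant_paper_ids)
    precision = hits / len(predicted) if predicted else 0.0
    recall = (
        hits / len(task.relevant_paper_ids) if task.relevant_paper_ids else 0.0
    )
    return {"precision": precision, "recall": recall}


# Usage with made-up identifiers.
deep = DeepResearchTask(query="example deep query", target_paper_id="paper-A")
wide = WideResearchTask(
    query="example wide query",
    relevant_paper_ids=frozenset({"a", "b", "c", "d"}),
)
deep_score = score_deep(deep, "paper-A")
wide_score = score_wide(wide, ["a", "b", "x"])
```

The point of the sketch is the asymmetry: a Deep Research answer is a single paper that is either right or wrong, while a Wide Research answer is a set, so it can be penalized both for missing papers and for including irrelevant ones.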

Sources

arXiv cs.AI — 2026-04-29