E3: Automated Research Critique with Issue-Level Backtesting

other · 2026-05-27

E3 is an automated review assistant that supports reviewers and engineering teams by identifying the key technical issues in a research paper. For each concern it reports the type, the location in the paper, why it matters for the claimed contribution, and the analysis or evidence needed to resolve it. The concerns it targets include unsupported claims, missing ablations, weak baselines, hidden assumptions, threats to validity, and leakage risks. To evaluate E3 without contamination bias, an issue-level backtesting protocol is used: the corpus is restricted to papers published after the training cutoff of every automated source, and a meta-judge that sees only anonymized reviews labels each issue-source pair as Caught, Partial, or Missed. The protocol was applied to 100 ICLR 2026 papers with 4,598 judged issue rows, comparing E3 against ICLR human reviews and two prompt-matched LLM baselines built on OpenAI's gpt-5.4 and Anthropic's claude-opus-4-6.
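The corpus restriction can be illustrated with a minimal sketch. It assumes each paper has a known publication date and each automated source has a known training cutoff. The `Paper` class, the `contamination_free` function, the source names, and all dates are hypothetical placeholders, not values from the paper. The strict "after" comparison is also an assumption about how the boundary case is handled.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Paper:
    paper_id: str
    published: date


def contamination_free(papers, source_cutoffs):
    """Keep only papers published strictly after every source's training cutoff."""
    # A paper is safe only if it postdates the latest cutoff among all sources.
    latest_cutoff = max(source_cutoffs.values())
    return [p for p in papers if p.published > latest_cutoff]


# Hypothetical cutoffs and papers, for illustration only.
cutoffs = {"source_a": date(2025, 6, 30), "source_b": date(2025, 8, 31)}
papers = [
    Paper("p1", date(2025, 7, 15)),  # after source_a's cutoff, before source_b's
    Paper("p2", date(2025, 10, 1)),  # after both cutoffs
    Paper("p3", date(2025, 8, 31)),  # published on the latest cutoff date
]
kept = contamination_free(papers, cutoffs)
print([p.paper_id for p in kept])
```

Comparing against the latest cutoff, rather than each source's own cutoff, keeps a single shared corpus so that every source is judged on the same papers.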

Key facts

E3 is an automated review assistant for research papers.
It identifies technical concerns like unsupported claims, missing ablations, weak baselines, hidden assumptions, threats to validity, and leakage risks.
Evaluation uses an issue-level backtesting protocol to avoid contamination.
The corpus includes only papers published after the training cutoffs of all automated sources.
A meta-judge labels issue-source pairs as Caught, Partial, or Missed.
Tested on 100 ICLR 2026 papers with 4,598 judged issue rows.
Compared against ICLR human reviews and LLM baselines (gpt-5.4, claude-opus-4-6).
The paper is available on arXiv with ID 2605.27072.
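The Caught/Partial/Missed labelling listed above lends itself to a simple per-source summary. The sketch below is one plausible way to tally meta-judge labels, not the paper's actual scoring code. The row layout `(issue_id, source, label)`, the function names, and the sample rows are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

LABELS = ("Caught", "Partial", "Missed")


def tally(rows):
    """Count meta-judge labels per source from (issue_id, source, label) rows."""
    counts = defaultdict(Counter)
    for issue_id, source, label in rows:
        if label not in LABELS:
            raise ValueError(f"unknown label {label!r} for issue {issue_id!r}")
        counts[source][label] += 1
    return counts


def rates(counts):
    """Convert per-source label counts into fractions of that source's rows."""
    result = {}
    for source, counter in counts.items():
        total = sum(counter.values())
        result[source] = {label: counter[label] / total for label in LABELS}
    return result


# Hypothetical judged rows, for illustration only.
rows = [
    ("i1", "E3", "Caught"),
    ("i2", "E3", "Caught"),
    ("i3", "E3", "Partial"),
    ("i4", "E3", "Missed"),
    ("i1", "human", "Missed"),
    ("i2", "human", "Caught"),
]
summary = rates(tally(rows))
print(summary["E3"])
```

Keeping Partial as its own bucket, rather than folding it into Caught or Missed, leaves the choice of how to weight partial catches to whoever reads the summary.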
