ARTFEED — Contemporary Art Intelligence

AI Models Learn to Predict Research Success via Comparative Evaluation

ai-technology · 2026-05-23

The authors of arXiv preprint 2605.21491 investigate whether language models can forecast the empirical success of research ideas before any experiments are run. They introduce comparative empirical forecasting: given a benchmark goal and two candidate ideas, the model predicts which idea yields better performance. The dataset comprises 11,488 idea pairs constructed from PapersWithCode outcomes. Off-the-shelf 8B-parameter models achieved only 30% accuracy, but supervised fine-tuning (SFT) raised this to 77.1%, surpassing GPT-5's 61.1%. Models trained with reinforcement learning with verifiable rewards (RLVR) reached 71.35% accuracy while producing interpretable justifications. The study addresses a bottleneck in AI-driven research: efficiently evaluating the large number of ideas that such systems generate.
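The comparative setup can be illustrated with a minimal Python sketch. It is not the preprint's code: the `IdeaPair` fields, the "A"/"B" label format, and the toy examples are all assumptions made for illustration. It shows how a pairwise prediction can be scored against a recorded outcome, which is the kind of automatically checkable signal that a verifiable reward relies on, and how accuracy is the mean of those scores.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class IdeaPair:
    """One comparison item (hypothetical schema, not the paper's format)."""
    goal: str    # benchmark goal both ideas target
    idea_a: str  # first candidate idea
    idea_b: str  # second candidate idea
    winner: str  # "A" or "B", taken from recorded benchmark results


def verifiable_reward(prediction: str, pair: IdeaPair) -> float:
    """Return 1.0 if the predicted winner matches the recorded one, else 0.0."""
    return 1.0 if prediction == pair.winner else 0.0


def accuracy(predict: Callable[[IdeaPair], str], pairs: Sequence[IdeaPair]) -> float:
    """Mean reward of a predictor over a set of idea pairs."""
    if not pairs:
        raise ValueError("need at least one idea pair")
    return sum(verifiable_reward(predict(p), p) for p in pairs) / len(pairs)


# Toy data, invented purely for illustration.
toy_pairs = [
    IdeaPair("image classification", "idea one", "idea two", winner="A"),
    IdeaPair("machine translation", "idea three", "idea four", winner="B"),
]


def always_a(pair: IdeaPair) -> str:
    """Trivial baseline that always picks the first idea."""
    return "A"


baseline_accuracy = accuracy(always_a, toy_pairs)  # 0.5 on this toy set
```

In practice `predict` would wrap a language model call that reads the goal and both ideas and returns a label; the scoring logic stays the same.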

Key facts

  • Study focuses on comparative empirical forecasting of research ideas.
  • Dataset includes 11,488 idea pairs from PapersWithCode.
  • Off-the-shelf 8B-parameter models achieve 30% accuracy.
  • SFT improves accuracy to 77.1%.
  • GPT-5 achieves 61.1% accuracy.
  • RLVR yields 71.35% accuracy with interpretable justifications.
  • Research addresses bottleneck in evaluating AI-generated ideas.
  • Preprint is arXiv:2605.21491.

Entities

Institutions

  • arXiv
  • PapersWithCode

Sources