ARTFEED — Contemporary Art Intelligence

RPC-Bench: Benchmarking Research Paper Comprehension in AI

ai-technology · 2026-05-01

Researchers have unveiled RPC-Bench, a comprehensive question-answering benchmark for assessing how well foundation models understand research papers. Built from high-quality review-rebuttal exchanges on computer science papers, it contains 15,000 human-verified QA pairs. The benchmark uses a fine-grained taxonomy aligned with the scientific research process to evaluate models on why, what, and how questions. An LLM-human interaction annotation framework enables large-scale labeling with quality control. Evaluation follows the LLM-as-a-Judge approach, scoring models on correctness, completeness, and conciseness, and correlates strongly with human assessments. Experiments show that even the strongest models struggle with this task.

Key facts

  • RPC-Bench is a benchmark for research paper comprehension
  • Built from review-rebuttal exchanges of computer science papers
  • Contains 15,000 human-verified QA pairs
  • Uses a fine-grained taxonomy aligned with scientific research flow
  • Assesses why, what, and how questions
  • Employs LLM-human interaction annotation framework
  • Evaluates on correctness, completeness, and conciseness
  • Even strong models perform poorly on this benchmark
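The LLM-as-a-Judge scoring mentioned above could be aggregated along these lines. This is a minimal sketch, not the benchmark's actual implementation: the three dimension names (correctness, completeness, conciseness) come from the article, while the 0–10 scale, the weights, and the aggregation scheme are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class JudgeScores:
    """Per-answer scores returned by a hypothetical LLM judge.

    The three dimensions are those named in the article; the 0-10
    scale is an assumption for illustration.
    """
    correctness: float   # factual agreement with the reference answer
    completeness: float  # coverage of all points in the reference
    conciseness: float   # absence of irrelevant or redundant content


def aggregate(scores: JudgeScores,
              weights: tuple[float, float, float] = (0.5, 0.3, 0.2)) -> float:
    """Weighted average of the three judge dimensions.

    The weights are illustrative assumptions, not values from the paper.
    """
    dims = (scores.correctness, scores.completeness, scores.conciseness)
    for d in dims:
        if not 0.0 <= d <= 10.0:
            raise ValueError("each dimension must be scored on a 0-10 scale")
    return sum(w * d for w, d in zip(weights, dims))


# Example: a mostly correct but verbose answer is penalized on conciseness.
# 9*0.5 + 8*0.3 + 5*0.2 = 4.5 + 2.4 + 1.0 = 7.9
score = aggregate(JudgeScores(correctness=9.0, completeness=8.0, conciseness=5.0))
```

In practice the judge model would emit these dimension scores (e.g. as structured output) for each candidate answer against the human-verified reference, and the aggregate would be averaged over the 15,000 QA pairs.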

Entities

Institutions

  • arXiv

Sources