RPC-Bench: Benchmarking Research Paper Comprehension in AI
Researchers have unveiled RPC-Bench, a comprehensive question-answering benchmark for assessing how well foundation models comprehend research papers. Built from high-quality review-rebuttal exchanges on computer science papers, it contains 15,000 human-verified QA pairs. A fine-grained taxonomy aligned with the scientific research process organizes the questions into why, what, and how types. An LLM-human collaborative annotation framework enables large-scale labeling with quality control. Evaluation follows the LLM-as-a-Judge paradigm, scoring answers on correctness, completeness, and conciseness, and these scores correlate strongly with human assessments. Experiments show that even the strongest models struggle with the task.
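This summary does not reproduce the paper's actual judge prompts, but an LLM-as-a-Judge setup along these lines would score each answer on the three reported criteria. The rubric wording, the 1-5 scale, and the `llm_call` interface below are illustrative assumptions, not RPC-Bench's implementation.

```python
# Minimal sketch of an LLM-as-a-Judge scorer for the three criteria the
# benchmark reports. Rubric text, scale, and interface are assumptions;
# the paper's actual prompts and aggregation may differ.
import json

RUBRIC = """You are grading an answer to a question about a research paper.
Rate each criterion from 1 (poor) to 5 (excellent) and reply as JSON:
{{"correctness": 0, "completeness": 0, "conciseness": 0}}

Question: {question}
Reference answer: {reference}
Candidate answer: {candidate}
"""

def judge(llm_call, question: str, reference: str, candidate: str) -> dict:
    """Score one QA pair; `llm_call` is any prompt -> text callable."""
    prompt = RUBRIC.format(question=question, reference=reference,
                           candidate=candidate)
    raw = llm_call(prompt)        # judge model's text completion
    scores = json.loads(raw)      # assumes the judge replies with valid JSON
    return {k: int(scores[k])
            for k in ("correctness", "completeness", "conciseness")}
```

Reported benchmark scores would then come from averaging these per-pair criterion scores across the dataset and checking their correlation with human ratings.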
Key facts
- RPC-Bench is a benchmark for research paper comprehension
- Built from review-rebuttal exchanges of computer science papers
- Contains 15,000 human-verified QA pairs
- Uses a fine-grained taxonomy aligned with scientific research flow
- Assesses why, what, and how questions (illustrated in the sketch after this list)
- Employs an LLM-human collaborative annotation framework
- Evaluates answers on correctness, completeness, and conciseness via LLM-as-a-Judge
- Even strong models perform poorly on this benchmark
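For a concrete picture of the data behind these facts, one human-verified QA pair with its taxonomy label might be represented like the sketch below. Every field name and value here is a hypothetical illustration; the paper's actual schema is not reproduced in this summary.

```python
# Hypothetical record layout for one RPC-Bench QA pair; field names
# are illustrative, not the benchmark's published schema.
from dataclasses import dataclass

@dataclass
class QAPair:
    paper_id: str         # e.g. an arXiv identifier
    question: str         # drawn from a reviewer's comment
    answer: str           # grounded in the authors' rebuttal
    question_type: str    # taxonomy label: "why", "what", or "how"
    human_verified: bool  # True once an annotator has checked the pair

example = QAPair(
    paper_id="2401.00000",  # placeholder identifier
    question="Why was ablation X omitted?",
    answer="The authors state that X is subsumed by experiment Y.",
    question_type="why",
    human_verified=True,
)
```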