EditPropBench: Benchmarking LLM Fact Propagation in Scientific Manuscripts
A new benchmark called EditPropBench measures whether LLM editors can propagate local factual edits through dependent claims in scientific manuscripts. An audit of recent arXiv cs.CL papers found fact-dependent qualitative claims in 37.2% of papers, indicating this pattern is common. The benchmark uses synthetic ML/NLP manuscripts with targeted edits and controlled fact graphs, tracking cascade success via Edit-Ripple Adherence (ERA).
Key facts
- EditPropBench is introduced as a benchmark for measuring factual edit propagation by LLMs.
- An audit of arXiv cs.CL papers found 37.2% contain fact-dependent qualitative claims.
- Each benchmark item includes a synthetic manuscript, a targeted edit, and a fact graph.
- The fact graph carries sentence-level labels for direct targets, required downstream updates, and unrelated text (see the data-structure sketch after this list).
- Cascade success is summarized with Edit-Ripple Adherence (ERA); a scoring sketch also follows the list.
- Local factual edits often create non-local revision obligations.
- Example: changing a dataset size from 215 to 80 documents can leave claims like 'medium-scale' stale.
- The benchmark focuses on ML/NLP-style manuscripts.
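To make the item structure concrete, here is a minimal sketch of how one benchmark item could be represented; the class names, label set, and example sentences are illustrative assumptions, not the benchmark's actual schema:

```python
from dataclasses import dataclass
from enum import Enum


class SentenceLabel(Enum):
    """Sentence-level roles in the fact graph (label names are assumed)."""
    DIRECT_TARGET = "direct_target"      # sentence the edit instruction names
    REQUIRED_UPDATE = "required_update"  # dependent claim that must also change
    UNRELATED = "unrelated"              # text the editor should leave alone


@dataclass
class BenchmarkItem:
    """One EditPropBench item: manuscript, targeted edit, fact graph."""
    sentences: list[str]      # synthetic manuscript, sentence-split
    edit_instruction: str     # the targeted local edit to apply
    labels: list[SentenceLabel]  # one label per sentence (the fact graph)


# Hypothetical item mirroring the dataset-size example above.
item = BenchmarkItem(
    sentences=[
        "Our corpus contains 215 documents.",           # direct target
        "This medium-scale corpus enables analysis.",   # dependent claim
        "We tokenize with a standard BPE vocabulary.",  # unrelated
    ],
    edit_instruction="Change the dataset size from 215 to 80 documents.",
    labels=[
        SentenceLabel.DIRECT_TARGET,
        SentenceLabel.REQUIRED_UPDATE,
        SentenceLabel.UNRELATED,
    ],
)
```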
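The source does not spell out how ERA is computed. A plausible minimal scoring sketch, reusing the types above and assuming ERA is the fraction of sentences whose edited/unedited status matches the fact-graph labels (the scoring rule is an assumption, not the paper's definition):

```python
def edit_ripple_adherence(item: BenchmarkItem, revised: list[str]) -> float:
    """Score one item: did the model edit exactly the sentences it should?

    `revised` is the model's sentence-aligned rewrite of item.sentences.
    A sentence counts as correct if it changed when its label requires a
    change (direct target or required downstream update) and stayed
    untouched when labeled unrelated. (Assumed rule.)
    """
    correct = 0
    for original, new, label in zip(item.sentences, revised, item.labels):
        changed = original.strip() != new.strip()
        must_change = label in (SentenceLabel.DIRECT_TARGET,
                                SentenceLabel.REQUIRED_UPDATE)
        correct += int(changed == must_change)
    return correct / len(item.sentences)
```

Under this rule, a model that rewrites "215" to "80" but leaves the 'medium-scale' claim untouched would score 2/3 on the example item above, capturing the cascade failure the benchmark is designed to expose.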
Entities
Institutions
- arXiv