EditPropBench: Benchmarking LLM Fact Propagation in Scientific Manuscripts
A new benchmark called EditPropBench measures whether LLM editors can propagate local factual edits through dependent claims in scientific manuscripts. An audit of recent arXiv cs.CL papers found fact-dependent qualitative claims in 37.2% of papers, indicating this pattern is common. The benchmark uses synthetic ML/NLP manuscripts with targeted edits and controlled fact graphs, tracking cascade success via Edit-Ripple Adherence (ERA).
Key facts
- EditPropBench is introduced as a benchmark for measuring factual edit propagation by LLMs.
- An audit of arXiv cs.CL papers found 37.2% contain fact-dependent qualitative claims.
- Each benchmark item includes a synthetic manuscript, a targeted edit, and a fact graph.
- The fact graph carries sentence-level labels for direct targets, required downstream updates, and unrelated text (see the data-structure sketch after this list).
- Cascade success is summarized with Edit-Ripple Adherence (ERA); a scoring sketch also follows the list.
- Local factual edits often create non-local revision obligations.
- Example: changing a dataset size from 215 to 80 documents can leave claims like 'medium-scale' stale.
- The benchmark focuses on ML/NLP-style manuscripts.
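To make the item structure concrete, here is a minimal sketch of how one benchmark item could be represented; the class names, label set, and example sentences are illustrative assumptions, not the benchmark's actual schema:

```python
from dataclasses import dataclass
from enum import Enum


class SentenceLabel(Enum):
    """Sentence-level roles in the fact graph (label names are assumed)."""
    DIRECT_TARGET = "direct_target"      # sentence the edit instruction names
    REQUIRED_UPDATE = "required_update"  # dependent claim that must also change
    UNRELATED = "unrelated"              # text the editor should leave alone


@dataclass
class BenchmarkItem:
    """One EditPropBench item: manuscript, targeted edit, fact graph."""
    sentences: list[str]      # synthetic manuscript, sentence-split
    edit_instruction: str     # the targeted local edit to apply
    labels: list[SentenceLabel]  # one label per sentence (the fact graph)


# Hypothetical item mirroring the dataset-size example above.
item = BenchmarkItem(
    sentences=[
        "Our corpus contains 215 documents.",           # direct target
        "This medium-scale corpus enables analysis.",   # dependent claim
        "We tokenize with a standard BPE vocabulary.",  # unrelated
    ],
    edit_instruction="Change the dataset size from 215 to 80 documents.",
    labels=[
        SentenceLabel.DIRECT_TARGET,
        SentenceLabel.REQUIRED_UPDATE,
        SentenceLabel.UNRELATED,
    ],
)
```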
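The source does not spell out how ERA is computed. A plausible minimal scoring sketch, reusing the types above and assuming ERA is the fraction of sentences whose edited/unedited status matches the fact-graph labels (the scoring rule is an assumption, not the paper's definition):

```python
def edit_ripple_adherence(item: BenchmarkItem, revised: list[str]) -> float:
    """Score one item: did the model edit exactly the sentences it should?

    `revised` is the model's sentence-aligned rewrite of item.sentences.
    A sentence counts as correct if it changed when its label requires a
    change (direct target or required downstream update) and stayed
    untouched when labeled unrelated. (Assumed rule.)
    """
    correct = 0
    for original, new, label in zip(item.sentences, revised, item.labels):
        changed = original.strip() != new.strip()
        must_change = label in (SentenceLabel.DIRECT_TARGET,
                                SentenceLabel.REQUIRED_UPDATE)
        correct += int(changed == must_change)
    return correct / len(item.sentences)
```

Under this rule, a model that rewrites "215" to "80" but leaves the 'medium-scale' claim untouched would score 2/3 on the example item above, capturing the cascade failure the benchmark is designed to expose.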
Entities
Institutions
- arXiv