ARTFEED — Contemporary Art Intelligence

EditPropBench: Benchmarking LLM Fact Propagation in Scientific Manuscripts

other · 2026-05-07

A new benchmark called EditPropBench measures whether LLM editors can propagate local factual edits through dependent claims in scientific manuscripts. An audit of recent arXiv cs.CL papers found fact-dependent qualitative claims in 37.2% of them, indicating the pattern is common. The benchmark uses synthetic ML/NLP manuscripts with targeted edits and controlled fact graphs, tracking cascade success via Edit-Ripple Adherence (ERA).

Key facts

  • EditPropBench is introduced as a benchmark for measuring factual edit propagation by LLMs.
  • An audit of arXiv cs.CL papers found 37.2% contain fact-dependent qualitative claims.
  • Each benchmark item includes a synthetic manuscript, a targeted edit, and a fact graph.
  • The fact graph has sentence-level labels for direct targets, required downstream updates, and unrelated text.
  • Cascade success is summarized with Edit-Ripple Adherence (ERA).
  • Local factual edits often create non-local revision obligations.
  • Example: changing a dataset size from 215 to 80 documents may render claims like 'medium-scale' stale.
  • The benchmark focuses on ML/NLP-style manuscripts.
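The item structure and metric above can be sketched in code. This is a minimal, hypothetical encoding — the paper's exact ERA formula and field names are not given here, so this assumes a per-item pass criterion (all direct-target and required-downstream sentences revised, no unrelated sentence touched) and takes corpus-level ERA as the fraction of items that pass:

```python
from dataclasses import dataclass

# Hypothetical sentence roles mirroring the fact graph's sentence-level labels.
DIRECT = "direct_target"        # sentence the edit directly targets
DOWNSTREAM = "required_update"  # dependent claim that must be revised
UNRELATED = "unrelated"         # text the edit must leave alone

@dataclass
class BenchmarkItem:
    """One EditPropBench-style item: per-sentence labels from the fact graph,
    plus which sentences the model actually revised (assumed fields)."""
    labels: list[str]     # one label per manuscript sentence
    revised: list[bool]   # did the model change this sentence?

def item_adheres(item: BenchmarkItem) -> bool:
    """Assumed pass criterion: the edit fully rippled, with no over-editing."""
    for label, was_revised in zip(item.labels, item.revised):
        if label in (DIRECT, DOWNSTREAM) and not was_revised:
            return False  # a stale dependent claim survived
        if label == UNRELATED and was_revised:
            return False  # unrelated text was needlessly edited
    return True

def edit_ripple_adherence(items: list[BenchmarkItem]) -> float:
    """Assumed corpus-level ERA: fraction of items where the cascade succeeds."""
    return sum(item_adheres(it) for it in items) / len(items)
```

Under this sketch, the dataset-size example above would label the "215 documents" sentence as a direct target and the "medium-scale" claim as a required downstream update; a model that changes only the number fails that item.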

Entities

Institutions

  • arXiv
