ARTFEED — Contemporary Art Intelligence

Factual Consistency Metrics Fail for Long-Document Summarization

other · 2026-04-30

An analysis of six reference-free factuality metrics indicates that they are unreliable for evaluating summaries of long documents. The research, available on arXiv (2511.07689v2), tests metrics designed for short summaries against seven types of factuality-preserving perturbation: paraphrasing, simplification, synonym substitution, logically equivalent negations, vocabulary reduction, compression, and insertion of source text. Because these perturbations leave the facts unchanged, a robust metric should assign roughly the same score before and after each one. On three long-form benchmark datasets (science fiction, legal, scientific), the scores vary instead, highlighting the difficulties posed by input length constraints and long-range dependencies. The study also examines how sensitive the metrics are to retrieval context and claim information density, and finds that metrics designed for short-form summaries yield inconsistent results on longer texts.
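The perturbation test described above can be illustrated with a minimal sketch. It is not the study's code: the overlap metric is a toy stand-in for a real reference-free factuality metric, and the synonym table, function names, and example texts are all hypothetical. It shows only the general idea of scoring a summary before and after a meaning-preserving edit and measuring how far the score moves.

```python
import re

# Hypothetical synonym table, for illustration only.
SYNONYMS = {"big": "large", "quick": "fast", "shows": "demonstrates"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def toy_overlap_metric(source, summary):
    """Toy stand-in for a factuality metric (an assumption, not a real metric):
    the fraction of summary tokens that also appear in the source."""
    source_tokens = set(tokenize(source))
    summary_tokens = tokenize(summary)
    if not summary_tokens:
        return 0.0
    hits = sum(token in source_tokens for token in summary_tokens)
    return hits / len(summary_tokens)


def synonym_substitution(summary):
    """Factuality-preserving perturbation: swap words for listed synonyms."""
    return " ".join(SYNONYMS.get(word.lower(), word) for word in summary.split())


def score_spread(metric, source, summary, perturbations):
    """Score the original and each perturbed summary, then return all scores
    together with the gap between the highest and lowest score."""
    scores = {"original": metric(source, summary)}
    for name, perturb in perturbations.items():
        scores[name] = metric(source, perturb(summary))
    spread = max(scores.values()) - min(scores.values())
    return scores, spread


source = "The quick study shows a big gap between metrics."
summary = "The study shows a big gap"
scores, spread = score_spread(
    toy_overlap_metric,
    source,
    summary,
    {"synonym_substitution": synonym_substitution},
)
print(scores, spread)
```

A spread near zero would suggest the metric is robust to that perturbation. Here the toy metric drops sharply after synonym substitution even though the meaning is unchanged, which is the kind of inconsistency the study reports for real metrics on long documents.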

Key facts

  • Six reference-free factuality metrics were evaluated.
  • Seven factuality-preserving perturbations were applied.
  • Three long-form benchmark datasets were used: science fiction, legal, scientific.
  • The metrics were originally proposed for short-form summarization.
  • Perturbations include paraphrasing, simplification, synonym substitution, logically equivalent negations, vocabulary reduction, compression, and source text insertion.
  • Results show inconsistent scores for long documents.
  • Study probes sensitivity to retrieval context and claim information density.
  • Published on arXiv with ID 2511.07689v2.

Entities

Institutions

  • arXiv