ARTFEED — Contemporary Art Intelligence

Study Compares 14 Representations of Retrieved Content in RAG Pipelines

other · 2026-06-01

A new study posted to arXiv (2605.30790) systematically compares how different representations of retrieved documents affect large language model (LLM) performance in retrieval-augmented generation (RAG) pipelines. The researchers held retrieval fixed and varied only how the retrieved documents were represented, testing 14 transformations including selection, summarisation, and reformulation, in both query-dependent and query-independent variants. They measured question-answering accuracy across these representations to address an open question: which features of a document's representation matter most when the consumer is an LLM rather than a human. Prior research examined single transformations in isolation; this work provides a controlled comparison intended to identify the most impactful representation strategies.
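The controlled design described above (same retrieved documents, different representations, one accuracy score per representation) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the three toy transformations, the function names, the stub retriever and reader, and exact-match scoring as the accuracy measure.

```python
from typing import Callable, Dict, List

# A representation turns (query, retrieved documents) into the text shown to the model.
Representation = Callable[[str, List[str]], str]

def full_text(query: str, docs: List[str]) -> str:
    # Baseline: pass the retrieved documents through unchanged.
    return "\n\n".join(docs)

def first_sentence(query: str, docs: List[str]) -> str:
    # Query-independent selection (toy): keep the first sentence of each document.
    return "\n".join(d.split(". ")[0] for d in docs)

def query_overlap(query: str, docs: List[str]) -> str:
    # Query-dependent selection (toy): keep sentences sharing a word with the query.
    terms = set(query.lower().split())
    kept = [s for d in docs for s in d.split(". ")
            if terms & set(s.lower().split())]
    return "\n".join(kept)

def evaluate(representations: Dict[str, Representation],
             examples: List[dict],
             retrieve: Callable[[str], List[str]],
             answer: Callable[[str, str], str]) -> Dict[str, float]:
    # Retrieval is identical for every representation; only the context text varies.
    scores = {}
    for name, represent in representations.items():
        correct = 0
        for ex in examples:
            docs = retrieve(ex["question"])
            context = represent(ex["question"], docs)
            prediction = answer(ex["question"], context)
            # Exact match is an assumed stand-in for the study's accuracy metric.
            correct += int(prediction.strip().lower() == ex["answer"].strip().lower())
        scores[name] = correct / len(examples)
    return scores

# Hypothetical stand-ins for a real retriever and a real LLM call.
def toy_retrieve(question: str) -> List[str]:
    return ["The Eiffel Tower is a landmark. It is located in Paris.",
            "France is in Europe. Its capital is Paris."]

def toy_answer(question: str, context: str) -> str:
    return "Paris" if "Paris" in context else "unknown"

examples = [{"question": "Where is the Eiffel Tower located?", "answer": "Paris"}]
representations = {"full_text": full_text,
                   "first_sentence": first_sentence,
                   "query_overlap": query_overlap}
scores = evaluate(representations, examples, toy_retrieve, toy_answer)
print(scores)
```

Because `toy_retrieve` returns the same documents on every call, any difference between the scores comes from the representation step alone, which is the property the study's design relies on. In a real experiment the stub reader would be replaced by an actual model call and the toy transformations by the ones under comparison.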

Key facts

  • Study compares 14 representations of retrieved documents in RAG pipelines
  • Held retrieval fixed, varied only representation
  • Transformations include selection, summarisation, reformulation
  • Tested query-dependent and query-independent variants
  • Measured question-answering accuracy
  • Addresses gap in understanding LLM-specific content representation
  • Builds on prior isolated studies of single transformations
  • Published on arXiv with ID 2605.30790

Entities

Institutions

  • arXiv
