ARTFEED — Contemporary Art Intelligence

MARA Framework Introduces Query-Adaptive Mechanisms for Multimodal Document Question Answering

ai-technology · 2026-04-22

The Multimodal Adaptive Retrieval-Augmented (MARA) framework targets shortcomings in retrieval-based multimodal document question answering (QA). Existing methods use query-agnostic document representations that overlook salient content, and they rely on static top-k evidence selection, which fails to adapt to the uncertain distribution of relevant information. MARA introduces query-adaptive mechanisms for both retrieval and generation. Its Query-Aligned Region Encoder builds multi-level document representations and reweights them according to their relevance to the query, with the aim of improving retrieval precision. The framework also incorporates a Self-Re... (truncated in source). The study was announced on arXiv under the identifier 2604.16313v1 as a cross-listing. Retrieval-based multimodal document QA aims to extract and combine pertinent information from complex, visually rich documents. Retrieval-augmented generation (RAG) has performed strongly in text-based QA, but its extension to multimodal documents remains underexplored.
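To make the contrast between static and adaptive evidence selection concrete, here is a minimal Python sketch. It is not MARA's method, which the source does not describe in detail. The function names, the softmax-mass cutoff rule, and the toy scores are all assumptions chosen for illustration.

```python
import math

def static_top_k(scores, k):
    """Return the indices of the k highest scores, whatever the score distribution looks like."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]

def adaptive_select(scores, mass=0.9, temperature=1.0):
    """Hypothetical adaptive rule (an assumption, not taken from the paper):
    keep the smallest set of top-ranked candidates whose softmax probability
    mass reaches `mass`."""
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(weights)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += weights[i] / total
        if cumulative >= mass:
            break
    return kept

# Toy retrieval scores (made up): one dominant candidate vs. several near-ties.
peaked = [9.0, 1.0, 0.5, 0.2, 0.1]
flat = [2.0, 1.9, 1.8, 1.7, 0.1]

print(static_top_k(peaked, 3), adaptive_select(peaked))
print(static_top_k(flat, 3), adaptive_select(flat))
```

With a fixed k of 3, the static rule returns three candidates in both cases. The adaptive rule keeps a single candidate when one score dominates and four when the relevant evidence is spread across near-tied candidates.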

Key facts

  • The Multimodal Adaptive Retrieval-Augmented (MARA) framework is proposed for multimodal document question answering.
  • Current approaches rely on query-agnostic document representations that overlook salient content.
  • Static top-k evidence selection fails to adapt to the uncertain distribution of relevant information.
  • MARA introduces query-adaptive mechanisms to both retrieval and generation.
  • The framework includes a Query-Aligned Region Encoder that builds multi-level document representations.
  • Representations are reweighted based on query relevance to improve retrieval precision.
  • The research was announced on arXiv with identifier 2604.16313v1.
  • Retrieval-augmented generation (RAG) has shown strong performance in text-based QA, but extensions to multimodal documents are underexplored.
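The idea of reweighting document representations by query relevance can be illustrated with a small Python sketch. This is only a guess at the general technique under stated assumptions. The source does not specify how the Query-Aligned Region Encoder works, so the cosine-similarity weighting, the softmax temperature, the helper names, and the toy three-dimensional region embeddings below are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def pool(regions, weights):
    """Weighted sum of region embeddings into one document vector."""
    dim = len(regions[0])
    return [sum(w * r[d] for w, r in zip(weights, regions)) for d in range(dim)]

def query_weights(query, regions, temperature=0.1):
    """Hypothetical reweighting (an assumption): softmax over query-region similarity."""
    sims = [cosine(query, r) for r in regions]
    peak = max(sims)  # subtract the max for numerical stability
    exps = [math.exp((s - peak) / temperature) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]

# Toy embeddings (made up): three document regions, only the second matches the query.
regions = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
query = [0.0, 1.0, 0.0]

uniform = pool(regions, [1 / 3] * 3)                     # query-agnostic pooling
aligned = pool(regions, query_weights(query, regions))   # query-adaptive pooling

print(cosine(query, uniform), cosine(query, aligned))
```

In this toy case the query-agnostic vector dilutes the one relevant region with the two irrelevant ones, while the query-weighted vector stays close to the query, which is the intuition behind the reported aim of better retrieval precision.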

Entities

Institutions

  • arXiv

Sources