ARTFEED — Contemporary Art Intelligence

ZipRerank: Efficient Listwise Multimodal Reranking for Long Documents

ai-technology · 2026-05-13

A team of researchers has introduced ZipRerank, a listwise multimodal reranker designed to ease the computational bottlenecks of vision-centric retrieval and multimodal retrieval-augmented generation (M-RAG) over long documents. A lightweight early-interaction mechanism between queries and images shortens the reranker's input. Instead of decoding a ranking autoregressively, ZipRerank scores all candidates in a single forward pass. Training proceeds in two stages: listwise pretraining on large-scale text data rendered as images, followed by multimodal fine-tuning with soft-ranking supervision distilled from a vision-language model (VLM) teacher. On the MMDocIR benchmark, ZipRerank matches or exceeds state-of-the-art models while improving efficiency.
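The paragraph above describes two efficiency ideas: shrinking the input through early query-image interaction, and ranking every candidate in one pass rather than generating a permutation token by token. The article does not specify ZipRerank's architecture, so the following is only a toy sketch under stated assumptions: each page is a hypothetical list of patch embeddings, "early interaction" is modeled as keeping the `keep` patches most similar to the query, and relevance is a mean dot product. The helper names (`prune_patches`, `score_page`, `rank_pages`) are illustrative, not taken from the paper.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def prune_patches(query: Vector, patches: List[Vector], keep: int) -> List[Vector]:
    """Hypothetical early interaction: keep only the `keep` patch vectors
    most similar to the query, shortening the sequence a reranker would see."""
    ranked = sorted(patches, key=lambda p: dot(query, p), reverse=True)
    return ranked[:keep]


def score_page(query: Vector, patches: List[Vector]) -> float:
    """Toy relevance score: mean query-patch similarity over the kept patches."""
    return sum(dot(query, p) for p in patches) / len(patches)


def rank_pages(query: Vector, pages: List[List[Vector]], keep: int = 2) -> List[int]:
    """Score every candidate page in one sweep, then order page indices from
    most to least relevant with a single sort (no token-by-token decoding)."""
    scores = [score_page(query, prune_patches(query, page, keep)) for page in pages]
    return sorted(range(len(pages)), key=lambda i: scores[i], reverse=True)


# Usage with made-up 2-D embeddings (assumed data, for illustration only).
query = [1.0, 0.0]
pages = [
    [[0.1, 0.9], [0.2, 0.8], [0.0, 1.0]],  # page 0: mostly off-topic
    [[0.9, 0.1], [0.8, 0.0], [0.1, 0.5]],  # page 1: relevant
    [[0.5, 0.5], [0.4, 0.2], [0.6, 0.1]],  # page 2: partly relevant
]
print(rank_pages(query, pages))  # [1, 2, 0]
```

Unlike a true listwise model, which would attend across all candidates jointly, this toy scores pages independently; it only illustrates how replacing autoregressive decoding with one scoring-and-sort step removes the sequential cost of generating a ranking.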

Key facts

  • ZipRerank is a listwise multimodal reranker for long documents.
  • It reduces input length via query-image early interaction.
  • It eliminates autoregressive decoding by scoring all candidates in a single forward pass.
  • Training follows a two-stage strategy: listwise pretraining on text rendered as images, then multimodal fine-tuning with VLM-teacher distillation.
  • Evaluated on the MMDocIR benchmark.
  • Matches or surpasses state-of-the-art performance.
  • Addresses bottlenecks in vision-centric retrieval and M-RAG.
  • Proposed by researchers in a paper on arXiv.
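The second training stage distills soft-ranking supervision from a VLM teacher. The article does not give the loss, so the sketch below assumes one common choice: convert teacher and student scores for the same candidate list into softmax distributions and minimize the KL divergence from teacher to student (a ListNet-style objective). The function names and the `temperature` parameter are hypothetical, not ZipRerank's actual training code.

```python
import math
from typing import List


def log_softmax(scores: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable log-softmax over one candidate list."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_z for s in scaled]


def soft_rank_distill_loss(student: List[float], teacher: List[float],
                           temperature: float = 1.0) -> float:
    """Assumed objective: KL(teacher || student) between softmax distributions
    over the same candidates; it is zero when the two distributions match."""
    if len(student) != len(teacher):
        raise ValueError("student and teacher must score the same candidates")
    log_p = log_softmax(teacher, temperature)  # soft targets from the (hypothetical) teacher
    log_q = log_softmax(student, temperature)  # student's predicted distribution
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


# Usage with made-up scores for three candidate pages.
teacher = [2.0, 1.0, 0.1]
print(soft_rank_distill_loss(teacher, teacher))  # 0.0
print(soft_rank_distill_loss([0.1, 1.0, 2.0], teacher) > 0)  # True
```

Soft targets carry more information than a single correct index: they tell the student how strongly the teacher prefers each candidate over the others, which is what "soft-ranking supervision" suggests.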

Entities

Institutions

  • arXiv