ZipRerank: Efficient Listwise Multimodal Reranking for Long Documents

ai-technology · 2026-05-13

A team of researchers has introduced ZipRerank, a listwise multimodal reranker designed to ease the computational bottlenecks of vision-centric retrieval and multimodal retrieval-augmented generation (M-RAG) over long documents. A lightweight early-interaction mechanism between the query and the images shortens the model's input. Instead of decoding a ranking autoregressively, ZipRerank scores all candidates in a single forward pass. Training proceeds in two stages: listwise pretraining on large-scale text data rendered as images, followed by multimodal fine-tuning with soft-ranking supervision distilled from a vision-language model (VLM) teacher. On the MMDocIR benchmark, ZipRerank matches or surpasses state-of-the-art rerankers while improving efficiency.
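The toy Python sketch below shows how scoring differs from decoding, under stated assumptions. The source does not describe how the early interaction works, so the patch-pruning step (keep only the image patch embeddings most similar to the query) is a hypothetical stand-in. Every function name (`prune_patches`, `score_candidates`, `rerank`) and the `keep` parameter are illustrative. The point it shows is that each candidate receives a relevance score directly, and the ranking is just a sort over those scores, with no ranking tokens generated step by step.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length embedding vectors."""
    return sum(x * y for x, y in zip(a, b))


def prune_patches(query: Vector, patches: List[Vector], keep: int) -> List[Vector]:
    """Hypothetical early interaction: keep only the `keep` patch embeddings
    most similar to the query, so the reranker sees a shorter input."""
    return sorted(patches, key=lambda p: dot(query, p), reverse=True)[:keep]


def score_candidates(query: Vector, pages: List[List[Vector]], keep: int) -> List[float]:
    """Toy stand-in for the single forward pass: each candidate page gets a
    relevance score directly (mean similarity of its kept patches).
    No ranking tokens are decoded."""
    scores = []
    for patches in pages:
        kept = prune_patches(query, patches, keep)
        # Empty pages get a score of 0.0 instead of dividing by zero.
        scores.append(sum(dot(query, p) for p in kept) / max(len(kept), 1))
    return scores


def rerank(query: Vector, pages: List[List[Vector]], keep: int = 2) -> List[Tuple[int, float]]:
    """Return (page index, score) pairs sorted from most to least relevant."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    scores = score_candidates(query, pages, keep)
    order = sorted(range(len(pages)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order]


query = [1.0, 0.0]                         # hypothetical 2-d query embedding
pages = [
    [[0.1, 0.9], [0.2, 0.8], [0.0, 1.0]],  # page 0: patches point away from the query
    [[0.9, 0.1], [0.8, 0.3], [0.1, 0.5]],  # page 1: two patches align with the query
]
print([i for i, _ in rerank(query, pages)])  # [1, 0]
```

In an actual listwise model the candidates would share one input sequence, and a single forward pass would emit all scores at once. The per-candidate loop here only mimics the fact that no decoding step is involved. Lowering `keep` shortens each candidate's share of the input, which is the kind of efficiency lever early interaction is meant to provide.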

Key facts

ZipRerank is a listwise multimodal reranker for long documents.
It reduces input length via query-image early interaction.
It eliminates autoregressive decoding by scoring all candidates in a single forward pass.
Training uses a two-stage strategy: listwise pretraining on text rendered as images, then multimodal fine-tuning with VLM-teacher distillation.
Evaluated on the MMDocIR benchmark.
Matches or surpasses state-of-the-art performance.
Addresses computational bottlenecks in vision-centric retrieval and M-RAG.
Proposed by researchers in a paper on arXiv.
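The soft-ranking distillation stage can be illustrated with a common listwise objective, although the source does not state the exact loss ZipRerank uses. This sketch assumes the teacher's relevance scores over a candidate list are converted into a probability distribution with a temperature-scaled softmax. It also assumes the student is trained to minimize the KL divergence from that distribution to its own, which is a ListNet-style loss. The function names, the `temperature` parameter, and the example scores are hypothetical.

```python
import math
from typing import List


def log_softmax(scores: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable log-softmax over one candidate list."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_z for s in scaled]


def soft_ranking_kl(teacher_scores: List[float],
                    student_scores: List[float],
                    temperature: float = 1.0) -> float:
    """KL(p_teacher || p_student) over the same ordered candidate list.

    Hypothetical distillation loss: zero when the student's softmax
    distribution matches the teacher's, positive otherwise."""
    if len(teacher_scores) != len(student_scores):
        raise ValueError("teacher and student must score the same candidates")
    log_p = log_softmax(teacher_scores, temperature)
    log_q = log_softmax(student_scores, temperature)
    # Working in log space avoids dividing by probabilities that underflow to 0.
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


teacher = [3.0, 1.0, 0.5]        # hypothetical teacher scores for 3 pages
close_student = [2.5, 1.2, 0.4]  # roughly agrees with the teacher
far_student = [0.2, 1.0, 3.0]    # reverses the teacher's order
print(soft_ranking_kl(teacher, teacher))  # 0.0
print(soft_ranking_kl(teacher, close_student) < soft_ranking_kl(teacher, far_student))  # True
```

Hard labels mark only one page as relevant. A soft target also tells the student how the teacher orders the remaining candidates, which is the usual motivation for soft-ranking supervision.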