UnIte: Uncertainty-Based Document Sampling Boosts Domain Adaptation in IR

other · 2026-04-30

A new method called UnIte (Uncertainty-based Iterative Document Sampling) improves unsupervised domain adaptation for neural information retrievers. The approach filters out documents with high aleatoric uncertainty (noise inherent in the data) and prioritizes those with high epistemic uncertainty (gaps in what the model has learned) to maximize learning utility. Experiments on the BEIR benchmark with small and large models show significant gains of +2.45 and +3.49 nDCG@10 using only about 4k training samples on average. The method addresses a limitation of existing sampling approaches, which focus on diversity but do not capture model uncertainty.
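The filter-then-prioritize idea can be illustrated with a minimal sketch. It assumes each document already carries an aleatoric and an epistemic score; how UnIte estimates these scores is not described here, so the ScoredDoc class, the select_documents function, and the aleatoric_max threshold are hypothetical names for illustration. The sketch shows a single selection round, not the full iterative procedure.

```python
from dataclasses import dataclass


@dataclass
class ScoredDoc:
    doc_id: str
    aleatoric: float  # assumed precomputed data-noise score, higher = noisier
    epistemic: float  # assumed precomputed model-uncertainty score, higher = more to learn


def select_documents(docs, aleatoric_max, k):
    """One hypothetical selection round: drop documents whose aleatoric
    score exceeds aleatoric_max, then keep the k documents with the
    highest epistemic score among the rest."""
    kept = [d for d in docs if d.aleatoric <= aleatoric_max]
    kept.sort(key=lambda d: d.epistemic, reverse=True)
    return kept[:k]


# Toy scores, invented for illustration only.
docs = [
    ScoredDoc("d1", aleatoric=0.9, epistemic=0.8),  # too noisy, filtered out
    ScoredDoc("d2", aleatoric=0.2, epistemic=0.7),
    ScoredDoc("d3", aleatoric=0.1, epistemic=0.3),
    ScoredDoc("d4", aleatoric=0.3, epistemic=0.5),
]
selected = select_documents(docs, aleatoric_max=0.5, k=2)
selected_ids = [d.doc_id for d in selected]
print(selected_ids)
```

In this toy example d1 has the highest epistemic score but is excluded because of its aleatoric score, which is the behavior the summary describes.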

Key facts

UnIte stands for Uncertainty-based Iterative Document Sampling.
It addresses limitations of existing document sampling methods that focus on diversity but fail to capture model uncertainty.
The method filters documents with high aleatoric uncertainty and prioritizes those with high epistemic uncertainty.
Experiments were conducted on the BEIR benchmark with small and large models.
The method achieved gains of +2.45 and +3.49 nDCG@10 while using a smaller training set, about 4k samples on average.
The work is in the field of unsupervised domain adaptation for neural retrievers.
The method generates pseudo queries for target-domain documents.
The paper is available on arXiv under Computer Science > Information Retrieval.
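
For readers unfamiliar with the reported metric, nDCG@10 can be sketched as follows. This is the linear-gain variant of the standard formula, not code from the paper; the function names and the toy relevance labels are assumptions for illustration. The function returns a value between 0 and 1, and the gains quoted above are presumably points on a 0 to 100 scale.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k relevance labels."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, judged_relevances, k=10):
    """nDCG@k: DCG of the system ranking divided by the DCG of the ideal
    ranking built from all judged relevance labels for the query."""
    ideal_dcg = dcg_at_k(sorted(judged_relevances, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0
    return dcg_at_k(ranked_relevances, k) / ideal_dcg


# Toy query: the system puts relevant documents at ranks 1 and 3.
ranked = [1, 0, 1]
judged = [1, 1, 0]
score = ndcg_at_k(ranked, judged, k=10)
print(round(score, 4))
```

A ranking that places every relevant document first scores 1.0; pushing a relevant document further down lowers the score through the logarithmic discount.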
