ARTFEED — Contemporary Art Intelligence

Active Learning Improves LLM-Based Pairwise Ranking

other · 2026-05-16

Researchers propose reframing Pairwise Ranking Prompting (PRP) for large language models (LLMs) as an active learning problem over noisy pairwise comparisons. Conventional PRP feeds pairwise preferences into sorting algorithms, but LLM judgments can be noisy, order-sensitive, and intransitive, violating the assumptions sorting relies on. The proposed framework uses a randomized-direction oracle that issues a single LLM call per pair, turning systematic position bias into zero-mean noise and enabling unbiased score aggregation without bidirectional calls. Under a call budget, active rankers serve as drop-in replacements for sorting and improve NDCG@10 per call.
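A minimal sketch of the randomized-direction idea, assuming a pairwise judge callable `llm_judge` (a hypothetical stand-in for the actual LLM preference call, whose prompt and API the summary does not specify):

```python
import random

def randomized_direction_oracle(llm_judge, a, b, rng=random):
    """One call per pair: flip a fair coin for presentation order, then
    un-flip the verdict, so any systematic first-position bias in the
    judge averages out to zero-mean noise across queries."""
    # llm_judge(x, y) -> True if the judge prefers x when x is shown first.
    # (Hypothetical stand-in for a real LLM preference call.)
    if rng.random() < 0.5:
        return llm_judge(a, b)       # present as (a, b)
    return not llm_judge(b, a)       # present as (b, a), invert the verdict
```

Because each direction is chosen with probability 1/2, a judge that systematically favors the first slot inflates the win counts of both items equally in expectation, so aggregate scores stay unbiased without paying for a second, reversed call.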

Key facts

  • PRP elicits pairwise preference judgments from an LLM
  • Judgments are noisy, order-sensitive, and sometimes intransitive
  • Sorting aims to recover a full permutation
  • Truncating a sort to fit a call budget does not yield a reliable top-K
  • Active rankers are drop-in replacements for sorting
  • Active rankers improve NDCG@10 per call
  • Randomized-direction oracle uses a single LLM call per pair
  • Approach converts systematic position bias into zero-mean noise
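The summary does not specify which active ranker the authors use; as an illustrative sketch only, a toy uncertainty-driven ranker that spends a fixed call budget on the currently most ambiguous pair might look like:

```python
from itertools import combinations

def active_rank(items, oracle, budget):
    """Toy budget-constrained active ranker (not the paper's algorithm):
    each call goes to the pair whose empirical win rates are closest,
    then items are ranked by win rate."""
    wins = {x: 0 for x in items}
    counts = {x: 0 for x in items}

    def score(x):
        # Empirical win rate; an uncompared item sits at the neutral 0.5.
        return wins[x] / counts[x] if counts[x] else 0.5

    for _ in range(budget):
        # Most ambiguous pair = smallest gap in estimated scores.
        a, b = min(combinations(items, 2),
                   key=lambda p: abs(score(p[0]) - score(p[1])))
        if oracle(a, b):
            wins[a] += 1
        else:
            wins[b] += 1
        counts[a] += 1
        counts[b] += 1

    return sorted(items, key=score, reverse=True)
```

The `oracle` argument can be any single-call pairwise judge, e.g. a randomized-direction wrapper around an LLM, so the ranker consumes exactly one LLM call per iteration.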

Entities

Institutions

  • arXiv

Sources