New Framework Quantifies Self-Preference Bias in LLM Judges
A new automated framework quantifies and mitigates Self-Preference Bias (SPB) in LLM-as-a-Judge systems. SPB causes LLMs to favor their own outputs during evaluation, distorting alignment signals and leaderboard rankings. Existing measurement methods rely on costly human annotations and conflate a model's generative ability with its evaluative stance. The proposed framework constructs pairs of equal-quality responses, which lets it statistically separate discriminability (can the judge tell responses apart?) from bias propensity (does it systematically favor its own?) without human gold standards. An empirical analysis across 20 models validates the approach.
Key facts
- LLM-as-a-Judge systems are used for model alignment, leaderboard construction, and quality control.
- Self-Preference Bias (SPB) is a directional evaluative deviation where LLMs favor their own outputs.
- Existing SPB measurements rely on costly human annotations.
- The new framework is fully automated and does not require human gold standards.
- It constructs equal-quality response pairs to disentangle discriminability from bias propensity.
- Empirical analysis was conducted across 20 models.
- The framework aims to improve scalability and trustworthiness of automated evaluation.
- The paper is available on arXiv with ID 2604.22891.
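The core measurement idea implied above can be illustrated with a minimal sketch: on pairs of responses constructed to be of equal quality, an unbiased judge should pick its own output about half the time, so the deviation of the observed own-win rate from 0.5 serves as a bias score. This is an illustrative simplification, not the paper's actual framework; the `PairJudgment` and `self_preference_bias` names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class PairJudgment:
    """Judge's verdict on one equal-quality pair: True if it chose its own output."""
    chose_own: bool

def self_preference_bias(judgments: List[PairJudgment]) -> float:
    """Deviation of the judge's own-win rate from the unbiased baseline of 0.5.

    Because the paired responses are constructed to be of equal quality,
    any systematic deviation from 0.5 reflects bias, not discriminability.
    """
    own_rate = sum(j.chose_own for j in judgments) / len(judgments)
    return own_rate - 0.5

# Example: a judge picks its own response in 14 of 20 equal-quality pairs.
judgments = [PairJudgment(chose_own=(i < 14)) for i in range(20)]
print(round(self_preference_bias(judgments), 2))  # 0.2 -> positive self-preference
```

A positive score indicates self-preference, a negative score self-deprecation, and values near zero an unbiased judge on that pair set.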