New Research Proposes Personalized LLM Benchmarks Based on Individual User Preferences
A new research paper argues that current methods for evaluating large language models fail to account for individual user preferences. Published on arXiv under identifier 2604.18943v1, the study demonstrates that personalized model rankings diverge significantly from aggregate benchmarks.

The researchers analyzed 115 active Chatbot Arena users, employing both Elo ratings and Bradley-Terry coefficients to compute personalized rankings, and examined how user query characteristics, including topic and writing style, relate to variation in LLM performance rankings. The findings reveal that Bradley-Terry correlations between individual and aggregate rankings average only ρ = 0.04, with 57% of users showing near-zero or negative correlation.

This research arrives as LLM capabilities grow and models are deployed for real-world tasks, making alignment with human preferences a critical challenge. Current evaluation benchmarks typically average preferences across all users to establish model rankings, overlooking the diverse needs of individual users in different contexts. The paper therefore calls for the development of personalized LLM benchmarks that rank models according to specific individual requirements rather than generalized aggregate ratings.
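The methodology lends itself to a compact illustration. The sketch below is an assumption-laden reconstruction, not the authors' code: it fits Bradley-Terry coefficients by logistic regression over pairwise battle outcomes, once on pooled "aggregate" data and once on a single user's votes, then computes Spearman's ρ between the two rankings. The model names and battle data are invented for the example.

```python
# Sketch: fitting Bradley-Terry coefficients from pairwise battles and
# comparing a personalized ranking to the aggregate one.
# The data format and battle outcomes here are hypothetical.
import numpy as np
from scipy.stats import spearmanr
from sklearn.linear_model import LogisticRegression

MODELS = ["model-a", "model-b", "model-c", "model-d"]
IDX = {m: i for i, m in enumerate(MODELS)}

def fit_bradley_terry(battles):
    """battles: list of (winner, loser) model-name pairs.
    Returns one Bradley-Terry log-strength per model."""
    X, y = [], []
    for winner, loser in battles:
        row = np.zeros(len(MODELS))
        row[IDX[winner]], row[IDX[loser]] = 1.0, -1.0
        X.append(row)   # winner on the "positive" side
        y.append(1)
        X.append(-row)  # mirrored battle so both classes appear
        y.append(0)
    # L2 regularization keeps coefficients finite if a model never loses.
    clf = LogisticRegression(fit_intercept=False, C=1.0)
    clf.fit(np.array(X), np.array(y))
    return clf.coef_[0]

# Toy data; a real analysis would pool all Arena votes for the aggregate
# fit and use one user's votes for the personalized fit.
aggregate = fit_bradley_terry([("model-a", "model-b"), ("model-a", "model-c"),
                               ("model-b", "model-d"), ("model-c", "model-d")])
one_user = fit_bradley_terry([("model-d", "model-a"), ("model-c", "model-a"),
                              ("model-d", "model-b"), ("model-c", "model-b")])

# Rank correlation between personalized and aggregate coefficients,
# analogous to the per-user correlations reported in the paper.
rho, _ = spearmanr(one_user, aggregate)
print(f"rho = {rho:.2f}")
```

Here the toy user's preferences invert the aggregate ordering, so ρ comes out negative, the kind of divergence the paper reports for a majority of users.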
Key facts
- Research paper arXiv:2604.18943v1 proposes personalized LLM benchmarks
- Study analyzes 115 active Chatbot Arena users
- Uses Elo ratings and Bradley-Terry coefficients for personalized rankings (see the Elo sketch after this list)
- Finds average Bradley-Terry correlation of ρ = 0.04 between individual and aggregate rankings
- 57% of users show near-zero or negative correlation with aggregate rankings
- Examines how query topics and writing style affect LLM ranking variations
- Argues current benchmarks overlook individual user preferences
- Calls for benchmarks that rank models according to individual needs
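For concreteness, here is a minimal sketch of the Elo update referenced above, assuming the conventional logistic expected-score formula; the K-factor and seed ratings are illustrative assumptions, not values taken from the study.

```python
# Standard Elo update applied to one LLM head-to-head battle.
def elo_update(r_winner, r_loser, k=32.0):
    """Return updated (winner, loser) ratings after one battle."""
    expected_win = 1.0 / (1.0 + 10 ** ((r_loser - r_winner) / 400.0))
    delta = k * (1.0 - expected_win)
    return r_winner + delta, r_loser - delta

ratings = {"model-a": 1000.0, "model-b": 1000.0}
# One user prefers model-b in a head-to-head comparison:
ratings["model-b"], ratings["model-a"] = elo_update(ratings["model-b"],
                                                    ratings["model-a"])
print(ratings)  # model-b rises, model-a falls by the same amount
```

Running this update per user, rather than over the pooled vote stream, is one way to obtain the personalized ratings the paper contrasts with aggregate leaderboards.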
Entities
Platforms
- Chatbot Arena