ECC: Evidence-Calibrated Query Clustering for LLM Evaluation

ai-technology · 2026-05-20

ECC (Evidence-Calibrated Clustering) is a new algorithm for evaluating large language models (LLMs). It groups queries by the latent capabilities they demand rather than by their surface semantics. Conventional approaches cluster queries with semantic taxonomies or embeddings, but these groupings often fail to track how models actually perform. ECC starts from existing semantic embeddings and calibrates them with a sparse set of posterior model comparisons, bridging the gap between surface semantics and true capability requirements. Each cluster is characterized by a capability profile parameterized as a Bradley-Terry model. Trainable mixture weights let a single query draw on several clusters when it makes mixed capability demands. The method jointly learns a flexible, capability-aware clustering and supports query-specific inference of LLM capabilities. Extensive quantitative and qualitative analyses validate ECC's effectiveness.
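The summary does not give ECC's equations. The sketch below shows one plausible reading of "per-cluster Bradley-Terry profiles plus mixture weights," built on several assumptions. Each cluster k holds a hypothetical skill vector theta[k] over models. A query's win probability is a weighted average of the per-cluster Bradley-Terry probabilities. Sparse comparisons are turned into a posterior over clusters with simple Bayes' rule. That last step is a swapped-in illustration, not ECC's training procedure. All function names and numbers here are hypothetical.

```python
import math

def bt_prob(theta_k, i, j):
    """Bradley-Terry probability that model i beats model j under one
    cluster's capability profile (theta_k[m] = latent skill of model m)."""
    return 1.0 / (1.0 + math.exp(-(theta_k[i] - theta_k[j])))

def mixture_prob(theta, weights, i, j):
    """Assumed query-level win probability: a weighted mix of the
    per-cluster Bradley-Terry probabilities (weights sum to 1)."""
    return sum(w * bt_prob(t, i, j) for w, t in zip(weights, theta))

def cluster_posterior(theta, prior, comparisons):
    """Posterior over clusters for one query, given observed
    (winner, loser) pairs. Illustrative assumption: all of the query's
    comparisons come from a single cluster (not ECC's actual method)."""
    log_post = []
    for p, t in zip(prior, theta):
        lp = math.log(p)
        for winner, loser in comparisons:
            lp += math.log(bt_prob(t, winner, loser))
        log_post.append(lp)
    # Normalize in log space for numerical stability.
    m = max(log_post)
    unnorm = [math.exp(lp - m) for lp in log_post]
    z = sum(unnorm)
    return [u / z for u in unnorm]

if __name__ == "__main__":
    # Hypothetical profiles for 3 models: cluster 0 favors model 0,
    # cluster 1 favors model 2.
    theta = [[1.0, 0.0, -1.0], [-1.0, 0.0, 1.0]]
    print(mixture_prob(theta, [0.5, 0.5], 0, 2))  # 0.5: the two profiles cancel out
    print(cluster_posterior(theta, [0.5, 0.5], [(0, 2), (0, 1)]))  # cluster 0 near 0.95
```

With equal weights, the two opposing profiles cancel and the query looks like a coin flip. Two observed wins for model 0 shift nearly all posterior mass onto cluster 0. This is the intuition behind calibrating semantic clusters with a few comparisons.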

Key facts

ECC stands for Evidence-Calibrated Clustering
It addresses misalignment between surface-level semantics and actual model performance in query clustering
Uses a Bradley-Terry model for each cluster's capability profile
Incorporates trainable mixture weights for mixed capability demands
Supports query-specific inference of LLM capabilities
Evaluated through quantitative and qualitative methods
Published on arXiv with ID 2605.17110
Aims to enable capability-aware LLM evaluation
