ARTFEED — Contemporary Art Intelligence

ECC: Evidence-Calibrated Query Clustering for LLM Evaluation

ai-technology · 2026-05-20

ECC (Evidence-Calibrated Clustering) is a new algorithm for evaluating large language models (LLMs) that groups queries by the latent capabilities they require rather than by surface semantics alone. Conventional query clustering relies on semantic taxonomies or embeddings, which often fail to track how models actually perform. ECC refines existing semantic embeddings using a limited number of posterior model comparisons, linking surface semantics to genuine capability requirements. Each cluster is characterized by a capability profile modeled with a Bradley-Terry framework, and trainable mixture weights let a single query draw on several clusters when its demands are mixed. At the same time, the method learns a flexible, capability-aware clustering that supports query-specific inference of LLM abilities. The paper reports quantitative and qualitative analyses in support of ECC's effectiveness.
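To make the mixture idea concrete, here is a minimal sketch of how a per-query win probability could be computed from per-cluster Bradley-Terry profiles. It is an illustration only, not the paper's implementation: the function names, the model names, the two example clusters, and all numeric values are assumptions.

```python
import math

def bt_win_prob(score_a: float, score_b: float) -> float:
    """Bradley-Terry probability that A beats B, given log-strength scores."""
    return 1.0 / (1.0 + math.exp(score_b - score_a))

def mixture_win_prob(weights, profiles, model_a, model_b):
    """Weighted mix of per-cluster Bradley-Terry win probabilities for one query."""
    return sum(
        w * bt_win_prob(profile[model_a], profile[model_b])
        for w, profile in zip(weights, profiles)
    )

# Hypothetical capability profiles: one dict of log-strengths per cluster.
profiles = [
    {"model_x": 1.0, "model_y": 0.0},  # cluster where model_x is stronger
    {"model_x": 0.0, "model_y": 1.0},  # cluster where model_y is stronger
]

# Hypothetical mixture weights for two queries (each set sums to 1).
pure_query = [1.0, 0.0]   # belongs entirely to the first cluster
mixed_query = [0.5, 0.5]  # draws equally on both clusters

p_pure = mixture_win_prob(pure_query, profiles, "model_x", "model_y")
p_mixed = mixture_win_prob(mixed_query, profiles, "model_x", "model_y")
```

In this toy setup the pure query favors model_x, while the evenly mixed query makes the two models equally likely to win, which is the kind of query-specific difference that a single global ranking would hide.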

Key facts

  • ECC stands for Evidence-Calibrated Clustering
  • It addresses misalignment between surface-level semantics and actual model performance in query clustering
  • Uses a Bradley-Terry model for per-cluster capability profiles
  • Incorporates trainable mixture weights for mixed capability demands
  • Supports query-specific inference of LLM capabilities
  • Evaluated through quantitative and qualitative methods
  • Published on arXiv with ID 2605.17110
  • Aims to enable capability-aware LLM evaluation
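One way to picture how limited comparison evidence could shift a query's cluster weights is a plain Bayes update. It is shown here as a stand-in technique, not as ECC's actual training procedure, which this summary does not describe. The prior weights (imagined as coming from semantic embeddings), the profiles, and the observed outcome are all hypothetical.

```python
import math

def bt_win_prob(score_a: float, score_b: float) -> float:
    """Bradley-Terry probability that A beats B, given log-strength scores."""
    return 1.0 / (1.0 + math.exp(score_b - score_a))

def calibrate_weights(prior, profiles, outcomes):
    """Posterior cluster weights after observing (winner, loser) pairs.

    Assumes every prior weight is strictly positive.
    """
    log_post = []
    for w, profile in zip(prior, profiles):
        log_p = math.log(w)
        for winner, loser in outcomes:
            log_p += math.log(bt_win_prob(profile[winner], profile[loser]))
        log_post.append(log_p)
    # Normalize in log space for numerical stability.
    peak = max(log_post)
    unnorm = [math.exp(v - peak) for v in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical per-cluster log-strengths for two models.
profiles = [
    {"model_x": 1.0, "model_y": 0.0},
    {"model_x": 0.0, "model_y": 1.0},
]
semantic_prior = [0.5, 0.5]           # assumed embedding-based starting weights
evidence = [("model_x", "model_y")]   # one observed comparison: model_x won

posterior = calibrate_weights(semantic_prior, profiles, evidence)
```

After a single win for model_x, the weight moves toward the cluster whose profile ranks model_x higher; with no evidence the weights stay at the semantic prior.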

Entities

Institutions

  • arXiv
