ARTFEED — Contemporary Art Intelligence

Model-Level Alignment Evaluation Insufficient for Deployment Claims

publication · 2026-05-07

A recent study posted to arXiv argues that deployment-relevant alignment cannot be established through model-level assessment alone. The authors propose that alignment claims be indexed to the level of evidence supporting them: model, response, interaction, or deployment. A structured audit of eleven alignment benchmarks, extended to a sixteen-benchmark corpus and dual-coded against an eight-dimension rubric (Cohen's kappa = 0.87), found user-facing verification support absent across every benchmark examined, with process steerability nearly absent as well. The few interaction-level benchmarks identified include tau-be. The paper challenges the common practice of citing model-level metrics to substantiate alignment claims about deployed systems.
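The kappa figure quantifies how much the two coders' rubric judgments agreed beyond chance. As a rough illustration of the statistic (not the paper's data: the ratings below are hypothetical), Cohen's kappa can be computed like this:

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Chance-corrected agreement between two coders over the same items."""
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    # Observed agreement: fraction of items where the coders match.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Expected agreement if each coder labeled independently
    # according to their own marginal label frequencies.
    ca, cb = Counter(coder_a), Counter(coder_b)
    p_e = sum((ca[label] / n) * (cb[label] / n) for label in set(ca) | set(cb))
    return (p_o - p_e) / (1 - p_e)

# Hypothetical binary codings (e.g. "dimension present?") on ten items.
a = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
b = [1, 1, 0, 1, 1, 1, 1, 0, 1, 1]
print(round(cohens_kappa(a, b), 2))  # prints 0.74
```

Values near 1 indicate near-perfect agreement after discounting chance; the 0.87 reported for the rubric would conventionally be read as strong inter-coder reliability.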

Key facts

  • Paper argues deployment-relevant alignment cannot be inferred from model-level evaluation alone.
  • Alignment claims should be indexed to model-level, response-level, interaction-level, or deployment-level.
  • Structured audit of 11 alignment benchmarks extended to 16-benchmark corpus.
  • Dual-coded against eight-dimension rubric with Cohen's kappa = 0.87.
  • User-facing verification support absent across every benchmark examined.
  • Process steerability nearly absent in benchmarks.
  • Few interactional benchmarks identified, including tau-be.
  • Paper challenges use of model-level scores for deployment claims.

Entities

Institutions

  • arXiv

Sources