ARTFEED — Contemporary Art Intelligence

CriterAlign: New Framework for Code Preference Judging

other · 2026-05-20

CriterAlign is a framework that adapts rubric-based LLM evaluation to pairwise code preference judging. It makes direct pairwise judgments at the criterion level, refines criteria in response to ties, applies swap-consistency filtering, and ends with a final pairwise synthesis. The design targets the mismatch between pointwise scoring and pairwise preference prediction. The framework also introduces Human-Preference-Aligned Guidance (HPAG), which is synthesized offline from training examples by identifying recurring rationale patterns.
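One plausible reading of swap-consistency filtering and the final synthesis step can be sketched in Python. Everything here is an illustrative assumption rather than CriterAlign's actual implementation: the judge signature, the names `swap_consistent_votes` and `synthesize`, and the toy criteria are hypothetical, and a simple majority vote stands in for whatever synthesis the framework really performs.

```python
from typing import Callable, Dict, List

# Hypothetical judge signature (an assumption, not from the source):
# (criterion, first_response, second_response) -> "first" | "second" | "tie"
Judge = Callable[[str, str, str], str]


def swap_consistent_votes(judge: Judge, criteria: List[str],
                          resp_a: str, resp_b: str) -> Dict[str, str]:
    """Keep only criteria whose verdict survives swapping the response order."""
    kept: Dict[str, str] = {}
    for criterion in criteria:
        forward = judge(criterion, resp_a, resp_b)   # A shown first
        backward = judge(criterion, resp_b, resp_a)  # B shown first
        # Translate positional verdicts into labels for A and B.
        fwd = {"first": "A", "second": "B"}.get(forward, "tie")
        bwd = {"first": "B", "second": "A"}.get(backward, "tie")
        # Drop ties and verdicts that flip with presentation order.
        if fwd == bwd and fwd != "tie":
            kept[criterion] = fwd
    return kept


def synthesize(votes: Dict[str, str]) -> str:
    """Stand-in synthesis: majority vote over the surviving criteria."""
    a_votes = sum(1 for v in votes.values() if v == "A")
    b_votes = len(votes) - a_votes
    if a_votes > b_votes:
        return "A"
    if b_votes > a_votes:
        return "B"
    return "tie"


def toy_judge(criterion: str, first: str, second: str) -> str:
    """Toy judge: 'brevity' is order-independent, 'position_biased' is not."""
    if criterion == "position_biased":
        return "first"  # always favors whichever response is shown first
    if len(first) == len(second):
        return "tie"
    return "first" if len(first) < len(second) else "second"


resp_a = "def add(a, b): return a + b"
resp_b = "def add(a, b):\n    result = a + b\n    return result"

votes = swap_consistent_votes(toy_judge, ["brevity", "position_biased"],
                              resp_a, resp_b)
preference = synthesize(votes)
print(votes, preference)
```

In this toy run the position-biased criterion flips its verdict when the responses are swapped, so it is filtered out and only the brevity verdict reaches the synthesis step.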

Key facts

  • Pairwise human preference prediction is central to evaluating code-generation systems.
  • Existing rubric-based LLM judges are pointwise, scoring each response independently.
  • Pointwise design is poorly matched to pairwise code preference prediction.
  • CriterAlign uses direct criterion-level pairwise judgments.
  • CriterAlign includes tie-driven criterion refinement.
  • CriterAlign employs swap-consistency filtering.
  • CriterAlign performs final pairwise synthesis.
  • HPAG is synthesized offline from training examples.
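To illustrate one way a pointwise judge can fit pairwise prediction poorly, the sketch below converts independent scores into a preference. The 1-to-5 scale and the function name `pointwise_to_preference` are assumptions made for illustration; the source does not say which scale or conversion existing judges use.

```python
from itertools import product


def pointwise_to_preference(score_a: int, score_b: int) -> str:
    """Derive a pairwise label from two independently assigned scores."""
    if score_a > score_b:
        return "A"
    if score_b > score_a:
        return "B"
    return "tie"


# Assumed 1-to-5 rubric scale (hypothetical).
scale = range(1, 6)

# Count the score pairs that yield no preference at all.
all_pairs = list(product(scale, scale))
tie_pairs = sum(1 for a, b in all_pairs
                if pointwise_to_preference(a, b) == "tie")

print(pointwise_to_preference(4, 4))
print(tie_pairs, "of", len(all_pairs), "score pairs are ties")
```

Two responses that each earn the same score produce no preference signal, even when a human comparing them side by side would pick one. A judge that compares the pair directly on each criterion is not forced into that outcome.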

Entities

Institutions

  • arXiv
