ARTFEED — Contemporary Art Intelligence

Trajectory Proper Score for Agentic Uncertainty Quantification

other · 2026-05-26

The Trajectory Proper Score (TPS), a new family of scoring rules, has been introduced for evaluating uncertainty quantification in language-model agents. The authors argue that existing metrics such as AUROC, AUPRC, risk-coverage, Trajectory ECE, and scalarized trajectory scores conflate ranking usefulness with probabilistic truthfulness. TPS is a predictor-agnostic family of strictly proper trajectory-level scoring rules that elicit the full prefix-conditioned success-probability trace. The rules are proven to strictly elicit the success-probability process under complete observation, and the construction extends to administratively censored trajectories. The work builds on prequential proper scoring and is detailed in arXiv:2605.24756.
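The distinction between ranking usefulness and probabilistic truthfulness can be illustrated with a small sketch. It is not taken from the paper: the data, the function names, and the fourth-power distortion are all hypothetical, and the Brier score stands in here as a familiar proper scoring rule. A monotone distortion of the confidences leaves AUROC unchanged, because the ranking is preserved, while the proper score changes.

```python
def auroc(scores, labels):
    """Chance that a random success outranks a random failure (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier(scores, labels):
    """Mean squared error between stated probabilities and binary outcomes."""
    return sum((s - y) ** 2 for s, y in zip(scores, labels)) / len(labels)


# Hypothetical outcomes and confidences, for illustration only.
labels = [1, 1, 0, 1, 0, 0]
confidences = [0.9, 0.7, 0.6, 0.8, 0.3, 0.1]

# A strictly increasing transform keeps the ordering but changes the probabilities.
distorted = [c ** 4 for c in confidences]

auroc_before, auroc_after = auroc(confidences, labels), auroc(distorted, labels)
brier_before, brier_after = brier(confidences, labels), brier(distorted, labels)

print(auroc_before, auroc_after)   # identical: ranking is unchanged
print(brier_before, brier_after)   # different: the stated probabilities changed
```

A ranking metric alone therefore cannot tell these two sets of confidences apart, which is the kind of gap a proper scoring rule is meant to close.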

Key facts

  • TPS is a family of strictly proper trajectory-level scoring rules.
  • Existing methods like AUROC, AUPRC, risk-coverage, Trajectory ECE, and scalarized scores are criticized for conflating ranking usefulness with probabilistic truthfulness.
  • TPS elicits the prefix-conditioned success-probability trace q_t = P^π(Y=1 | H_t), the probability of eventual success given the trajectory prefix H_t.
  • TPS is predictor-agnostic.
  • TPS is proven strictly proper under complete observation.
  • Extension to administratively censored trajectories is provided.
  • The work is based on prequential proper scoring.
  • The paper is available on arXiv with ID 2605.24756.
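
To make the idea of a trajectory-level proper score concrete, the following is a minimal sketch under stated assumptions. It is a generic prequential-style construction (a positively weighted sum of per-step Brier losses against the final outcome), not the TPS definition from the paper, and it does not handle censored trajectories. The names trajectory_brier and expected_step_loss, the weights, and the example trace are hypothetical.

```python
def trajectory_brier(trace, outcome, weights=None):
    """Weighted sum of per-step Brier losses against the final outcome.

    trace   : reported success probabilities, one per step (estimates of q_t)
    outcome : 1 if the trajectory ended in success, else 0
    weights : positive per-step weights (assumed uniform if omitted)
    Lower is better.
    """
    if weights is None:
        weights = [1.0] * len(trace)
    return sum(w * (p - outcome) ** 2 for w, p in zip(weights, trace))


def expected_step_loss(report, q):
    """Expected Brier loss of reporting `report` when P(Y=1 | H_t) equals q."""
    return q * (report - 1.0) ** 2 + (1.0 - q) * report ** 2


# Hypothetical three-step trajectory that ends in success (Y = 1).
trace = [0.5, 0.7, 0.9]
score = trajectory_brier(trace, outcome=1)

# Given a prefix whose true success probability is q, search a grid of
# candidate reports for the one with the lowest expected loss.
q = 0.7
grid = [i / 100 for i in range(101)]
best_report = min(grid, key=lambda r: expected_step_loss(r, q))

print(score)        # realized loss for this trajectory
print(best_report)  # the report that minimizes expected loss at this step
```

Because the expected loss at each step equals (report - q)^2 + q(1 - q), it is uniquely minimized by reporting q. With positive weights, the summed score is then minimized in expectation only by reporting the true probability at every prefix, which is the sense in which such a sum is strictly proper for the whole trace.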

Entities

Institutions

  • arXiv