Trust Calibration for AI Tool Use Formalized as Preference Learning

ai-technology · 2026-05-20

A recent arXiv paper frames trust calibration for agentic tool use as a preference-learning problem. The authors introduce a policy gateway that maintains a Gaussian-process posterior over a latent human risk-tolerance function, which it observes only through binary approve/deny feedback modeled with a probit likelihood. The gateway escalates to a human where the approval outcome is most uncertain. Structurally, the approach is an instance of Preferential Bayesian Optimization: it inherits that framework's inference machinery and sample efficiency, but its objective differs. Rather than optimizing a design, it classifies the action space into allow, block, and ask regions. The paper is listed on arXiv under Computer Science > Artificial Intelligence.
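
To make the allow/block/ask split concrete, here is a minimal sketch of how a gateway might turn a Gaussian posterior over the latent risk-tolerance value into a decision. It is not the paper's implementation. It uses the standard closed-form probit predictive probability for a Gaussian latent, and the function names and the 0.9 / 0.1 thresholds are illustrative assumptions.

```python
import math


def approval_probability(mean: float, variance: float) -> float:
    """Predictive P(approve) for a probit likelihood when the latent
    value f is Gaussian with the given posterior mean and variance:
    Phi(mean / sqrt(1 + variance))."""
    z = mean / math.sqrt(1.0 + variance)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def gate(mean: float, variance: float,
         allow_at: float = 0.9, block_at: float = 0.1) -> str:
    """Map a posterior summary to 'allow', 'block', or 'ask'.
    The thresholds are hypothetical, not values from the paper."""
    p = approval_probability(mean, variance)
    if p >= allow_at:
        return "allow"
    if p <= block_at:
        return "block"
    return "ask"  # outcome too uncertain: escalate to a human


if __name__ == "__main__":
    print(gate(3.0, 0.5))   # confident approval
    print(gate(-3.0, 0.5))  # confident denial
    print(gate(0.2, 4.0))   # wide posterior near zero
```

Under these assumptions, a large posterior variance pulls the predictive probability toward 0.5, so poorly explored actions fall into the ask region until feedback narrows the posterior.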

Key facts

Formalizes trust calibration for agentic tool use as preference learning
Policy gateway uses Gaussian-process posterior over human risk-tolerance
Observes binary approve/deny feedback via probit likelihood
Escalates to human where approval outcome is most uncertain
Structurally an instance of Preferential Bayesian Optimization
Inherits approximate Gaussian-process classification and uncertainty-targeted querying
Objective is to classify action space into allow/block/ask regions
Paper is on arXiv under Computer Science > Artificial Intelligence
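
The uncertainty-targeted querying named in the list above can be illustrated with a small sketch. The summary does not state which acquisition rule the paper uses, so this example assumes one common choice: query the action whose predicted approval is closest to a coin flip, measured by Bernoulli entropy. The action names and probabilities are hypothetical.

```python
import math


def bernoulli_entropy(p: float) -> float:
    """Entropy (in nats) of an approve/deny outcome with P(approve) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def next_query(approval_probs: dict[str, float]) -> str:
    """Return the action whose approval outcome is most uncertain."""
    return max(approval_probs,
               key=lambda action: bernoulli_entropy(approval_probs[action]))


if __name__ == "__main__":
    # Hypothetical predictive approval probabilities per tool action.
    probs = {"read_file": 0.97, "send_email": 0.55, "delete_repo": 0.04}
    print(next_query(probs))
```

With these made-up numbers the sketch selects send_email, since 0.55 is the probability nearest 0.5. Each human answer on such an action is the most informative for deciding where the boundary between allow and block lies.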
