PrefixGuard: LLM Agent Failure-Warning Monitors

ai-technology · 2026-05-09

PrefixGuard is a framework for training lightweight monitors that warn of LLM agent failures mid-task, using offline StepView induction and supervised learning. It achieves up to 0.900 AUPRC on WebArena, outperforming raw-text controls by +0.137 AUPRC on average.
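AUPRC (area under the precision-recall curve) measures how well a monitor's risk scores rank runs that eventually fail above runs that succeed. The sketch below computes it as step-wise average precision, the estimator used by scikit-learn's `average_precision_score`. Whether PrefixGuard's evaluation uses this estimator or an interpolated variant is an assumption, and the scores and labels are made-up toy values, not results from the paper.

```python
from typing import Sequence

def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Step-wise AUPRC: sum over thresholds of (recall gain) * precision.

    scores: monitor risk scores, higher meaning "more likely to fail"
    labels: 1 if the run ultimately failed, 0 if it succeeded
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("AUPRC is undefined when no run failed")
    # Rank runs from highest to lowest predicted risk.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    ap, tp, fp, prev_recall, i = 0.0, 0, 0, 0.0, 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Runs tied at this score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            tp += ranked[i][1]
            fp += 1 - ranked[i][1]
            i += 1
        recall = tp / total_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap

# Toy example: five runs, three of which failed.
print(round(average_precision([0.9, 0.8, 0.7, 0.4, 0.2], [1, 0, 1, 0, 1]), 4))  # → 0.7556
```

On these five toy runs the result is 34/45 ≈ 0.756, while the failure base rate is 3/5 = 0.6, which is roughly what a random scorer achieves in expectation. AUPRC figures are therefore easiest to interpret relative to each benchmark's failure rate.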

Key facts

PrefixGuard is a trace-to-monitor framework for LLM agents.
It uses an offline StepView induction step followed by supervised monitor training.
StepView induces deterministic typed-step adapters from raw trace samples.
The monitor learns an event abstraction and a prefix-risk scorer from terminal task outcomes.
Tested on WebArena, τ²-Bench, SkillsBench, and TerminalBench.
The strongest PrefixGuard monitors reach 0.900 (WebArena), 0.710 (τ²-Bench), 0.533 (SkillsBench), and 0.557 (TerminalBench) AUPRC.
Improves over raw-text controls by an average of +0.137 AUPRC.
LLM judges are substantially weaker under the same prefix-warning protocol.
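The two stages listed above (deterministic typed-step adapters, then a supervised prefix-risk scorer trained from terminal outcomes) can be pictured with a toy sketch. Everything in it is hypothetical: the `Step` type and its three kinds, the regex `RULES` (hand-written here, whereas StepView induces its adapters from trace samples), the fixed step-kind fractions standing in for the learned event abstraction, and the logistic-regression scorer with a 0.5 warning threshold are illustrative choices, not PrefixGuard's actual design.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical typed step; PrefixGuard's real StepView schema is not shown here.
@dataclass(frozen=True)
class Step:
    kind: str  # "action", "observation", or "error" in this toy schema
    text: str

# Hypothetical deterministic adapter rules, checked in order (first match wins).
# StepView induces adapters offline from trace samples; these are hand-written.
RULES = [
    ("error", re.compile(r"\b(error|exception|timeout)\b", re.IGNORECASE)),
    ("action", re.compile(r"^(click|type|goto|run)\b")),
]
KINDS = ("action", "observation", "error")

def adapt(raw_line: str) -> Step:
    """Map one raw trace line to a typed step, deterministically."""
    for kind, pattern in RULES:
        if pattern.search(raw_line):
            return Step(kind, raw_line)
    return Step("observation", raw_line)

def features(prefix: list[Step]) -> list[float]:
    """Toy (fixed, not learned) event abstraction: fraction of each step kind."""
    counts = Counter(step.kind for step in prefix)
    n = max(len(prefix), 1)
    return [counts[k] / n for k in KINDS]

def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_prefix_scorer(traces, outcomes, epochs=500, lr=0.5):
    """Fit logistic regression on every prefix, labeled with its trace's terminal outcome (1 = failed)."""
    data = []
    for raw, failed in zip(traces, outcomes):
        steps = [adapt(line) for line in raw]
        for t in range(1, len(steps) + 1):
            data.append((features(steps[:t]), failed))
    w, b = [0.0] * len(KINDS), 0.0
    for _ in range(epochs):
        for x, y in data:  # plain stochastic gradient descent on log loss
            g = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def risk(model, prefix: list[Step]) -> float:
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, features(prefix))) + b)

def first_warning(model, raw_trace, threshold=0.5):
    """1-based index of the first step whose prefix risk exceeds threshold, else None."""
    steps = []
    for i, line in enumerate(raw_trace, start=1):
        steps.append(adapt(line))
        if risk(model, steps) > threshold:
            return i
    return None

traces = [
    ["goto shop", "page loaded", "click buy", "order placed"],        # succeeded
    ["goto shop", "Error: timeout", "click buy", "Exception raised"],  # failed
]
model = train_prefix_scorer(traces, outcomes=[0, 1])
print(first_warning(model, ["goto cart", "Timeout while loading"]))
```

Every prefix of a training trace inherits that trace's final label, which is what lets a scorer supervised only by terminal outcomes raise warnings mid-task.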

Sources

arXiv cs.AI — 2026-05-09