PrefixGuard: LLM Agent Failure-Warning Monitors
PrefixGuard is a framework for training lightweight monitors that warn of LLM agent failures mid-task. It combines an offline StepView induction step with supervised monitor training. Its strongest monitors reach up to 0.900 AUPRC (on WebArena) and improve over raw-text controls by +0.137 AUPRC on average.
Key facts
- PrefixGuard is a trace-to-monitor framework for LLM agents.
- It uses an offline StepView induction step followed by supervised monitor training.
- StepView induces deterministic typed-step adapters from raw trace samples.
- The monitor learns an event abstraction and prefix-risk scorer from terminal outcomes.
- Tested on WebArena, τ²-Bench, SkillsBench, and TerminalBench.
- Strongest PrefixGuard monitors reach 0.900/0.710/0.533/0.557 AUPRC on WebArena, τ²-Bench, SkillsBench, and TerminalBench, respectively.
- Improves over raw-text controls by an average of +0.137 AUPRC.
- LLM judges are substantially weaker under the same prefix-warning protocol.
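The pipeline described above (deterministic typed-step adapters, prefix-level training from terminal outcomes, AUPRC evaluation) can be illustrated with a toy sketch. This is not PrefixGuard's implementation. The `Step` schema, the `kind: detail` trace-line format, and all function names are hypothetical, and a smoothed per-kind failure rate stands in for the learned event abstraction and prefix-risk scorer. AUPRC is approximated here by average precision.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Step:
    """A typed step (hypothetical schema, not the paper's)."""
    kind: str


def adapt(raw_line: str) -> Step:
    """Deterministic adapter: assumes raw trace lines look like 'kind: detail'."""
    kind, _, _ = raw_line.partition(":")
    return Step(kind=kind.strip().lower())


def prefix_examples(trace: List[str], failed: bool) -> List[Tuple[List[Step], int]]:
    """Every prefix of a trace inherits the trace's terminal outcome as its label."""
    steps = [adapt(line) for line in trace]
    return [(steps[:k], int(failed)) for k in range(1, len(steps) + 1)]


def fit_kind_rates(examples: List[Tuple[List[Step], int]]) -> Dict[str, float]:
    """Toy scorer: Laplace-smoothed failure rate of prefixes ending in each step kind."""
    counts: Dict[str, Tuple[int, int]] = {}
    for prefix, label in examples:
        kind = prefix[-1].kind
        fails, total = counts.get(kind, (0, 0))
        counts[kind] = (fails + label, total + 1)
    return {k: (f + 1) / (t + 2) for k, (f, t) in counts.items()}


def risk(prefix: List[Step], rates: Dict[str, float]) -> float:
    """Prefix risk: highest rate among step kinds seen so far (0.5 if unseen)."""
    return max(rates.get(step.kind, 0.5) for step in prefix)


def average_precision(scores: List[float], labels: List[int]) -> float:
    """Mean of precision at each positive, ranking by descending score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Two made-up traces: one that succeeded and one that failed.
ok_trace = ["click: login", "type: user", "submit: form"]
bad_trace = ["click: login", "error: timeout", "click: retry"]

examples = prefix_examples(ok_trace, failed=False) + prefix_examples(bad_trace, failed=True)
rates = fit_kind_rates(examples)
scores = [risk(prefix, rates) for prefix, _ in examples]
labels = [label for _, label in examples]
print("toy average precision:", average_precision(scores, labels))
```

Labeling every prefix with the final outcome is what lets the scorer warn mid-task: at inference time it only sees the steps so far. The toy evaluates on its own training prefixes, so its printed score says nothing about real monitor quality.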
Entities
—