ARTFEED — Contemporary Art Intelligence

PrefixGuard: LLM Agent Failure-Warning Monitors

ai-technology · 2026-05-09

PrefixGuard is a framework for training lightweight monitors that warn of LLM agent failures mid-task. It combines an offline StepView induction step with supervised monitor training. Its strongest monitor reaches 0.900 AUPRC on WebArena, and PrefixGuard monitors improve on raw-text controls by an average of 0.137 AUPRC.
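
For readers unfamiliar with the metric, AUPRC (area under the precision-recall curve) is commonly estimated as average precision over a ranked list of risk scores. The sketch below is an illustration only: the function name `average_precision` and the toy labels and scores are assumptions, not taken from PrefixGuard or its benchmarks.

```python
def average_precision(labels, scores):
    """Step-wise estimate of AUPRC (average precision).

    labels: 1 if the trace eventually failed, 0 otherwise.
    scores: monitor risk scores, higher meaning "more likely to fail".
    Ties are broken by input order, which is a simplification.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one failed example")
    # Rank examples from highest to lowest risk score.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            ap += hits / rank  # precision at each recalled failure
    return ap / total_pos


# Toy data (hypothetical): two failures among four prefixes.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
toy_auprc = average_precision(toy_labels, toy_scores)
```

A monitor that ranks at random scores close to the failure rate in the data, so AUPRC values are best compared against a control on the same benchmark, as the reported gain over raw-text controls does.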

Key facts

  • PrefixGuard is a trace-to-monitor framework for LLM agents.
  • It uses an offline StepView induction step followed by supervised monitor training.
  • StepView induces deterministic typed-step adapters from raw trace samples.
  • The monitor learns an event abstraction and prefix-risk scorer from terminal outcomes.
  • Tested on WebArena, τ²-Bench, SkillsBench, and TerminalBench.
  • The strongest PrefixGuard monitors reach 0.900/0.710/0.533/0.557 AUPRC across the four benchmarks, with 0.900 on WebArena.
  • Improves over raw-text controls by an average of +0.137 AUPRC.
  • LLM judges are substantially weaker under the same prefix-warning protocol.
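
The trace-to-monitor idea in the list above can be sketched in a few lines. This is a simplified illustration, not PrefixGuard's implementation. The hand-written rules in `to_event` stand in for the typed-step adapters that StepView induces from trace samples, and the smoothed log-odds scorer stands in for the learned event abstraction and prefix-risk scorer. All function names and the trace format are assumptions.

```python
import math
import re
from collections import Counter


def to_event(raw_step: str) -> str:
    """Hypothetical deterministic adapter: raw trace line -> typed event."""
    text = raw_step.lower()
    if re.search(r"error|exception|timeout", text):
        return "error"
    if text.startswith("click"):
        return "click"
    if text.startswith("type"):
        return "type"
    return "other"


def prefix_examples(traces):
    """Yield (event_prefix, failed) pairs.

    Every prefix of a trace inherits that trace's terminal outcome,
    so no step-level labels are needed.
    """
    for steps, failed in traces:
        events = [to_event(s) for s in steps]
        for k in range(1, len(events) + 1):
            yield events[:k], failed


def fit_event_weights(traces, smoothing=1.0):
    """Per-event log-odds of failure with Laplace smoothing (an assumed scorer)."""
    fail, ok = Counter(), Counter()
    for prefix, failed in prefix_examples(traces):
        (fail if failed else ok).update(prefix)
    vocab = set(fail) | set(ok)
    n_fail = sum(fail.values()) + smoothing * len(vocab)
    n_ok = sum(ok.values()) + smoothing * len(vocab)
    return {
        e: math.log((fail[e] + smoothing) / n_fail)
        - math.log((ok[e] + smoothing) / n_ok)
        for e in vocab
    }


def risk(prefix_steps, weights):
    """Score a partial trace: sigmoid of the summed event log-odds."""
    z = sum(weights.get(to_event(s), 0.0) for s in prefix_steps)
    return 1.0 / (1.0 + math.exp(-z))
```

In use, `fit_event_weights` would be called once on a list of `(steps, failed)` pairs collected offline, and `risk` would be called after each new agent step, raising a warning when the score crosses a threshold chosen on held-out traces.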
