ARTFEED — Contemporary Art Intelligence

COMPASS: AI Framework for Safer LLM Search Agents

ai-technology · 2026-06-01

Researchers propose COMPASS (Cognitive MCTS-Guided Process Alignment for Safe Search Agents), a framework that addresses the safety degradation that arises in LLM-powered search agents when a harmful intent is decomposed into seemingly innocuous sub-queries. COMPASS combines cognitive tree exploration (CTE), which synthesizes stealthy attack trajectories, with introspective step-wise alignment (ISA), which provides fine-grained process supervision by isolating risky intermediate actions. The reported empirical results show a favorable safety-utility trade-off while requiring substantially less training data than existing methods. The framework aims for robust safety alignment throughout multi-step agent workflows.
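The decomposition problem can be made concrete with a toy example. The sketch below is not taken from the COMPASS paper: the `Step` type, the per-query `risk` scores, the 0.5 threshold, and the noisy-OR aggregation are all assumptions chosen only to show how a trajectory can cross a risk threshold even though no single sub-query does, and how a step-wise check can point at the step where that happens.

```python
from dataclasses import dataclass


@dataclass
class Step:
    query: str
    risk: float  # hypothetical per-query risk score in [0, 1]


def per_query_flags(steps, threshold=0.5):
    """Flag each sub-query in isolation, ignoring the rest of the trajectory."""
    return [s.risk >= threshold for s in steps]


def trajectory_risk(steps):
    """Noisy-OR aggregate (an assumption): 1 - product of (1 - risk) over steps."""
    safe = 1.0
    for s in steps:
        safe *= 1.0 - s.risk
    return 1.0 - safe


def first_risky_step(steps, threshold=0.5):
    """Index of the first step at which cumulative risk reaches the threshold."""
    safe = 1.0
    for i, s in enumerate(steps):
        safe *= 1.0 - s.risk
        if 1.0 - safe >= threshold:
            return i
    return None


# Placeholder sub-queries with made-up scores, each below the threshold.
trajectory = [
    Step("sub-query A", 0.20),
    Step("sub-query B", 0.30),
    Step("sub-query C", 0.25),
]

print(per_query_flags(trajectory))   # no single query is flagged
print(trajectory_risk(trajectory))   # aggregate is roughly 0.58
print(first_risky_step(trajectory))  # step index where the threshold is crossed
```

In this toy setting a filter that looks at one query at a time passes every step, while the cumulative view locates the third step as the point where the trajectory becomes risky. That is the kind of intermediate signal step-wise supervision is meant to exploit.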

Key facts

  • COMPASS stands for Cognitive MCTS-Guided Process Alignment for Safe Search Agents.
  • LLM-powered search agents enable multi-step reasoning and tool use.
  • Retrieval-induced safety degradation occurs when harmful intents decompose into seemingly innocuous sub-queries.
  • Existing alignment methods struggle with sparse safety signals and diverse violations.
  • COMPASS uses cognitive tree exploration (CTE) to synthesize stealthy attack trajectories.
  • COMPASS uses introspective step-wise alignment (ISA) to isolate risky intermediate actions.
  • Empirical results show a favorable safety-utility trade-off.
  • COMPASS requires substantially less training data than existing methods.
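The summary above does not say how COMPASS selects branches during its MCTS-guided tree exploration. For orientation only, the sketch below shows the generic UCT selection rule commonly used in Monte Carlo tree search, which balances a child's average value against an exploration bonus. The `Node` type, the exploration constant `c=1.4`, and the visit and value numbers are hypothetical and are not drawn from the paper.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    action: str                 # hypothetical label for a candidate agent action
    visits: int = 0
    total_value: float = 0.0
    children: list = field(default_factory=list)


def uct_score(parent_visits, child, c=1.4):
    """Standard UCT: mean value plus an exploration bonus for rarely visited children."""
    if child.visits == 0:
        return math.inf  # unvisited children are tried first
    exploit = child.total_value / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def select_child(parent, c=1.4):
    """Pick the child with the highest UCT score."""
    return max(parent.children, key=lambda ch: uct_score(parent.visits, ch, c))


# Made-up statistics for two candidate branches under one root.
root = Node("root", visits=10, children=[
    Node("a", visits=6, total_value=3.0),
    Node("b", visits=4, total_value=2.4),
])

print(select_child(root).action)
```

With these numbers branch "b" is selected: it has both the higher mean value and the larger exploration bonus because it has been visited less often. Any unvisited child would take priority over both.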

Entities

Institutions

  • arXiv