COMPASS: AI Framework for Safer LLM Search Agents

ai-technology · 2026-06-01

Researchers propose COMPASS (Cognitive MCTS-Guided Process Alignment for Safe Search Agents), a framework addressing retrieval-induced safety degradation in LLM-powered search agents, where a harmful intent is decomposed into sub-queries that each look innocuous. COMPASS combines cognitive tree exploration (CTE), which synthesizes stealthy attack trajectories, with introspective step-wise alignment (ISA), which provides fine-grained process supervision. Empirical results show a favorable safety-utility trade-off while using substantially less training data than existing methods. The framework targets robust safety alignment throughout multi-step agent workflows.

Key facts

COMPASS stands for Cognitive MCTS-Guided Process Alignment for Safe Search Agents.
LLM-powered search agents enable multi-step reasoning and tool use.
Retrieval-induced safety degradation occurs when harmful intents decompose into seemingly innocuous sub-queries.
Existing alignment methods struggle with sparse safety signals and diverse violations.
COMPASS uses cognitive tree exploration (CTE) to synthesize stealthy attack trajectories.
COMPASS uses introspective step-wise alignment (ISA) to isolate risky intermediate actions.
Empirical results show a favorable safety-utility trade-off.
COMPASS requires substantially less training data than existing methods.
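
For readers unfamiliar with the MCTS in the framework's name, the sketch below shows the selection and backpropagation steps of textbook Monte Carlo tree search (the UCT rule with the UCB1 formula). It is background only: this summary does not describe how COMPASS's cognitive tree exploration scores or expands nodes, so the `Node` class, the function names, and the exploration constant are all illustrative assumptions and not the authors' method.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node in a generic search tree (hypothetical structure, not from COMPASS)."""
    label: str
    visits: int = 0
    total_value: float = 0.0
    children: list["Node"] = field(default_factory=list)


def uct_score(parent: Node, child: Node, c: float = math.sqrt(2)) -> float:
    """Standard UCB1 score: mean value plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # unvisited children are always tried first
    mean = child.total_value / child.visits
    bonus = c * math.sqrt(math.log(parent.visits) / child.visits)
    return mean + bonus


def select_child(parent: Node) -> Node:
    """Pick the child with the highest UCT score."""
    return max(parent.children, key=lambda ch: uct_score(parent, ch))


def backpropagate(path: list[Node], value: float) -> None:
    """Add one rollout's value to every node on the path from root to leaf."""
    for node in path:
        node.visits += 1
        node.total_value += value
```

In a generic MCTS loop, `select_child` is applied repeatedly from the root to a leaf, the leaf is expanded and evaluated, and `backpropagate` updates the statistics along the visited path. How a search agent's sub-queries would map onto nodes and values is left open here because the source does not specify it.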
