COMPASS: AI Framework for Safer LLM Search Agents
Researchers propose COMPASS (Cognitive MCTS-Guided Process Alignment for Safe Search Agents, where MCTS stands for Monte Carlo Tree Search), a framework that addresses retrieval-induced safety degradation in LLM-powered search agents: a harmful intent is decomposed into a series of sub-queries, each of which looks innocuous on its own. COMPASS combines cognitive tree exploration (CTE), which synthesizes stealthy attack trajectories, with introspective step-wise alignment (ISA), which provides fine-grained, step-level process supervision. Empirical results show a favorable safety-utility trade-off while requiring less training data than existing methods. The framework aims to keep agents robustly aligned across every step of a multi-step workflow.
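The summary gives no implementation details for CTE; the framework's name only indicates that it is guided by Monte Carlo Tree Search (MCTS). As a generic, hypothetical illustration of MCTS-guided trajectory search, the sketch below runs UCT (the UCB1 bandit rule applied to trees) over fixed-length sequences of abstract actions that stand in for sub-queries. The names `Node`, `uct_score`, `mcts_search`, the action set, and the `reward` callable are all assumptions for illustration; in a safety setting the reward would presumably come from some judge of the completed trajectory, which is not something the summary describes.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Callable, Optional


# Hypothetical sketch of generic UCT search; NOT the COMPASS/CTE implementation.
@dataclass(eq=False)
class Node:
    """A partial trajectory: the sequence of actions chosen so far."""
    path: tuple
    parent: Optional["Node"] = field(default=None, repr=False)
    children: dict = field(default_factory=dict)  # action -> child Node
    visits: int = 0
    value: float = 0.0  # sum of rollout rewards backed up through this node


def uct_score(child: Node, parent_visits: int, c: float) -> float:
    # Mean reward (exploitation) plus an exploration bonus for rarely tried children.
    mean = child.value / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def mcts_search(actions: list, depth: int, reward: Callable[[tuple], float],
                iterations: int = 2000, c: float = math.sqrt(2),
                seed: int = 0) -> tuple:
    """Return (best complete trajectory seen, its reward)."""
    rng = random.Random(seed)
    root = Node(path=())
    best_path, best_reward = None, float("-inf")

    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is non-terminal and fully expanded.
        while len(node.path) < depth and len(node.children) == len(actions):
            parent = node
            node = max(parent.children.values(),
                       key=lambda ch: uct_score(ch, parent.visits, c))
        # 2. Expansion: add one untried action unless the node is terminal.
        if len(node.path) < depth:
            untried = [a for a in actions if a not in node.children]
            action = rng.choice(untried)
            child = Node(path=node.path + (action,), parent=node)
            node.children[action] = child
            node = child
        # 3. Simulation: complete the trajectory with uniformly random actions.
        rollout = list(node.path)
        while len(rollout) < depth:
            rollout.append(rng.choice(actions))
        r = reward(tuple(rollout))
        if r > best_reward:
            best_path, best_reward = tuple(rollout), r
        # 4. Backpropagation: update visit counts and values up to the root.
        while node is not None:
            node.visits += 1
            node.value += r
            node = node.parent
    return best_path, best_reward
```

Returning the best complete trajectory seen, rather than only the most-visited root child, is one simple choice when the goal is to collect high-reward trajectories as training data rather than to pick a single next move.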
Key facts
- COMPASS stands for Cognitive MCTS-Guided Process Alignment for Safe Search Agents.
- LLM-powered search agents enable multi-step reasoning and tool use.
- Retrieval-induced safety degradation occurs when harmful intents decompose into seemingly innocuous sub-queries.
- Existing alignment methods struggle because safety signals are sparse and violations take diverse forms.
- COMPASS uses cognitive tree exploration (CTE) to synthesize stealthy attack trajectories.
- COMPASS uses introspective step-wise alignment (ISA) to isolate risky intermediate actions.
- Empirical results show a favorable safety-utility trade-off.
- COMPASS requires substantially less training data than existing methods.
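The summary does not say how ISA scores or supervises individual steps. As a hypothetical illustration of the general idea behind step-level (process) supervision, the sketch below turns one trajectory into per-step labels by locating the first step to flag, either because that step is risky on its own or because the running total of risk crosses a threshold, which is the pattern where individually innocuous sub-queries add up to a harmful intent. The `Step` type, the per-step `risk` scores, both thresholds, and the label strings are all assumptions, not details from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical sketch of step-level labeling; NOT the COMPASS/ISA implementation.
@dataclass
class Step:
    action: str  # e.g. a search sub-query issued by the agent
    risk: float  # assumed per-step risk score in [0, 1] from some judge


def first_risky_step(steps: List[Step], step_threshold: float,
                     cumulative_threshold: float) -> Optional[int]:
    """Index of the first step to flag, or None if the trajectory stays safe."""
    total = 0.0
    for i, step in enumerate(steps):
        total += step.risk
        # Flag a step that is risky by itself, or one that pushes the
        # accumulated risk of the trajectory over the cumulative threshold.
        if step.risk >= step_threshold or total >= cumulative_threshold:
            return i
    return None


def step_labels(steps: List[Step], step_threshold: float = 0.8,
                cumulative_threshold: float = 1.0) -> List[str]:
    """Convert one trajectory into per-step supervision labels.

    Steps before the flagged one form a 'safe' prefix, the flagged step is
    the one to 'correct', and any later steps are marked 'drop'.
    """
    k = first_risky_step(steps, step_threshold, cumulative_threshold)
    if k is None:
        return ["safe"] * len(steps)
    return ["safe"] * k + ["correct"] + ["drop"] * (len(steps) - k - 1)
```

For example, sub-queries with risks 0.3, 0.4, 0.5, 0.1 each stay below the per-step threshold of 0.8, but the running total reaches 1.2 at the third step, so that step is the one labeled `correct`. A single trajectory-level label would instead mark the whole trajectory unsafe without saying which step to fix.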
Entities
Institutions
- arXiv