Backdoor Attack Exploits Ranking in World Model Planning
A new arXiv preprint (2605.01950) reveals that world models, which use internal imagination for long-horizon planning, are vulnerable to a novel backdoor attack called TRAP. Unlike traditional backdoor attacks that target local features or one-step predictions, TRAP exploits the long-tailed ranking structure of imagined trajectories: by disrupting the ordering of only a few decision-critical trajectories, the attack can systematically hijack planning. This vulnerability is distinct because a world model's learned dynamics and planning process tend to absorb shallow perturbations, which makes world models resistant to conventional backdoor methods. The research highlights a new security risk in AI agents that rely on world models for decision-making.
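The core idea can be illustrated with a toy sketch (this is not the paper's implementation; the planner, action names, and bias value below are invented for illustration). A planner that ranks imagined trajectories by predicted return only needs the top-ranked trajectory flipped to choose a different action, so a small, targeted bias on one decision-critical trajectory is enough to hijack the plan:

```python
def plan(trajectory_returns):
    """Toy planner: pick the action whose imagined trajectory ranks highest."""
    return max(trajectory_returns, key=trajectory_returns.get)

def triggered_returns(returns, target_action, bias=0.2):
    """Hypothetical backdoor effect: when a trigger is present, the poisoned
    world model inflates the imagined return of one decision-critical
    trajectory, leaving all other rankings untouched."""
    poisoned = dict(returns)
    poisoned[target_action] += bias
    return poisoned

# Clean imagination: "safe" narrowly outranks "risky", so planning is safe.
returns = {"safe": 1.00, "risky": 0.95, "idle": 0.10}
print(plan(returns))                              # -> safe
# Triggered imagination: a small bias flips the top-ranked trajectory.
print(plan(triggered_returns(returns, "risky")))  # -> risky
```

The sketch shows why such an attack is hard to detect with one-step checks: every individual prediction is only slightly perturbed, yet the ordering at the decision boundary, and therefore the chosen action, changes.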
Key facts
- arXiv paper 2605.01950 introduces TRAP backdoor attack
- TRAP targets world models used for long-horizon planning
- Attack exploits long-tailed ranking structure of imagined trajectories
- Disrupting ordering of decision-critical trajectories hijacks planning
- World models can absorb shallow perturbations, resisting traditional attacks
- Vulnerability is distinct from local feature or one-step prediction attacks
- Research highlights new security risk for generalist AI agents
- Study announced on arXiv as a cross-listing
Entities
Institutions
- arXiv