Backdoor Attack Exploits Ranking in World Model Planning
A new arXiv preprint (2605.01950) reveals that world models, which use internal imagination for long-horizon planning, are vulnerable to a novel backdoor attack called TRAP. Unlike traditional backdoor attacks that target local features or one-step predictions, TRAP exploits the long-tailed ranking structure of imagined trajectories: by disrupting the ordering of only a few decision-critical trajectories, the attack can systematically hijack planning. This vulnerability is distinct because a world model's learned dynamics and planning process tend to absorb shallow perturbations, which makes world models resistant to conventional backdoor methods. The research highlights a new security risk in AI agents that rely on world models for decision-making.
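The core idea can be illustrated with a toy sketch (this is not the paper's implementation; the planner, action names, and bias value below are invented for illustration). A planner that ranks imagined trajectories by predicted return only needs the top-ranked trajectory flipped to choose a different action, so a small, targeted bias on one decision-critical trajectory is enough to hijack the plan:

```python
def plan(trajectory_returns):
    """Toy planner: pick the action whose imagined trajectory ranks highest."""
    return max(trajectory_returns, key=trajectory_returns.get)

def triggered_returns(returns, target_action, bias=0.2):
    """Hypothetical backdoor effect: when a trigger is present, the poisoned
    world model inflates the imagined return of one decision-critical
    trajectory, leaving all other rankings untouched."""
    poisoned = dict(returns)
    poisoned[target_action] += bias
    return poisoned

# Clean imagination: "safe" narrowly outranks "risky", so planning is safe.
returns = {"safe": 1.00, "risky": 0.95, "idle": 0.10}
print(plan(returns))                              # -> safe
# Triggered imagination: a small bias flips the top-ranked trajectory.
print(plan(triggered_returns(returns, "risky")))  # -> risky
```

The sketch shows why such an attack is hard to detect with one-step checks: every individual prediction is only slightly perturbed, yet the ordering at the decision boundary, and therefore the chosen action, changes.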
Key facts
- arXiv paper 2605.01950 introduces TRAP backdoor attack
- TRAP targets world models used for long-horizon planning
- Attack exploits long-tailed ranking structure of imagined trajectories
- Disrupting ordering of decision-critical trajectories hijacks planning
- World models can absorb shallow perturbations, resisting traditional attacks
- Vulnerability is distinct from local feature or one-step prediction attacks
- Research highlights new security risk for generalist AI agents
- Study announced on arXiv as a cross-listing
Entities
Institutions
- arXiv