DeepTrap Framework Exposes Contextual Vulnerabilities in OpenClaw Agent Systems
DeepTrap is a new automated framework for discovering contextual vulnerabilities in OpenClaw agent systems. Described in the arXiv paper 2605.11047, the research targets security risks that arise from mutable execution contexts such as files, memory, and tools, rather than from explicit user prompts alone. DeepTrap treats adversarial context manipulation as a black-box, trajectory-level optimization problem that balances three objectives: risk realization, benign-task preservation, and stealth. It combines risk-conditioned evaluation and multi-objective trajectory scoring with reward-guided beam search and reflection-based deep probing to find compromised contexts. The authors built a 42-case benchmark spanning six vulnerability classes and seven operational scenarios, and evaluated nine target models using attack and utility grading scores. The results indicate that a contextual compromise can induce substantial unsafe behavior while preserving user-facing functionality.
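To make the three-way trade-off concrete, here is a minimal sketch of how one trajectory might be scored against risk, task preservation, and stealth. It is an illustration only, not the paper's scoring function: the `TrajectoryScores` type, the `combined_score` function, the [0, 1] score range, and the equal default weights are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrajectoryScores:
    """Hypothetical per-trajectory grades, each assumed to lie in [0, 1]."""
    risk: float      # how fully the unsafe behavior was realized
    utility: float   # how well the benign user task was still completed
    stealth: float   # how unnoticeable the compromise is to the user


def combined_score(s, w_risk=1.0, w_utility=1.0, w_stealth=1.0):
    """Weighted mean of the three objectives (weights are illustrative)."""
    total = w_risk + w_utility + w_stealth
    return (w_risk * s.risk + w_utility * s.utility + w_stealth * s.stealth) / total


# A blatant attack that breaks the user's task scores lower than a
# partial attack that keeps the task working and stays hidden.
loud = TrajectoryScores(risk=1.0, utility=0.0, stealth=0.0)
quiet = TrajectoryScores(risk=0.5, utility=1.0, stealth=1.0)
assert combined_score(quiet) > combined_score(loud)
```

A weighted mean is only one way to combine objectives. It is used here because it shows why a compromise that still looks functional to the user can outrank a more damaging but obvious one.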
Key facts
- DeepTrap is an automated framework for discovering contextual vulnerabilities in OpenClaw.
- The framework addresses security risks in agentic language-model systems with mutable execution contexts.
- Adversarial context manipulation is treated as a black-box trajectory-level optimization problem.
- Three objectives are balanced: risk realization, benign-task preservation, and stealth.
- Techniques include risk-conditioned evaluation, multi-objective trajectory scoring, reward-guided beam search, and reflection-based deep probing.
- A 42-case benchmark covers six vulnerability classes and seven operational scenarios.
- Nine target models were evaluated using attack and utility grading scores.
- Contextual compromise can induce substantial unsafe behavior while preserving user-facing functionality.
- The paper is published on arXiv with ID 2605.11047.
- The research highlights security risks that originate in the execution context (files, memory, tools) rather than in explicit user prompts.
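The reward-guided beam search named in the list above can be sketched generically as follows. This is a toy illustration under stated assumptions, not DeepTrap's implementation: the `Context` representation, the edit names, and `toy_reward` (a stand-in for a black-box agent rollout followed by grading) are all hypothetical, and the numeric values are made up.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: a context is the tuple of edits applied so far.
Context = Tuple[str, ...]


def beam_search(
    initial: Context,
    candidate_edits: Sequence[str],
    reward: Callable[[Context], float],
    beam_width: int = 2,
    depth: int = 3,
) -> Tuple[Context, float]:
    """Keep the `beam_width` highest-reward contexts at each depth."""
    beam: List[Tuple[float, Context]] = [(reward(initial), initial)]
    best = beam[0]
    for _ in range(depth):
        # Extend every context in the beam by one edit it does not yet contain.
        expanded = [
            (reward(ctx + (edit,)), ctx + (edit,))
            for _, ctx in beam
            for edit in candidate_edits
            if edit not in ctx
        ]
        if not expanded:
            break
        expanded.sort(key=lambda item: item[0], reverse=True)
        beam = expanded[:beam_width]
        if beam[0][0] > best[0]:
            best = beam[0]
    return best[1], best[0]


def toy_reward(ctx: Context) -> float:
    """Stand-in for a black-box rollout plus grading; values are made up."""
    gains = {"edit_memory": 0.4, "edit_file": 0.3, "edit_tool_doc": 0.2}
    # Assumed stealth cost: every edit beyond the second is penalized.
    penalty = 0.25 * max(0, len(ctx) - 2)
    return sum(gains[e] for e in ctx) - penalty


best_ctx, best_score = beam_search((), list(toy_reward.__annotations__) and
                                   ["edit_memory", "edit_file", "edit_tool_doc"],
                                   toy_reward)
```

In this toy setup the search settles on a two-edit context, because the assumed stealth penalty makes a third edit cost more than it gains. In a real black-box setting each `reward` call would require running the target agent, so `beam_width` and `depth` directly control the query budget.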
Entities
Institutions
- arXiv