Plasticity Interventions Reduce Backdoor Threats in DRL, Except SAM
A new arXiv preprint (2605.14587) systematically investigates how plasticity interventions—built-in components of modern deep reinforcement learning (DRL) agents—affect vulnerability to backdoor attacks. Across 14,664 empirical cases, the researchers found that only the Sharpness-Aware Minimization (SAM) intervention exacerbates backdoor threats, attributing this to backdoor gradient amplification. All other interventions studied mitigate the threat by disrupting the activation pathways backdoors rely on and compressing the representation space. The work highlights a gap in prior research, which evaluated backdoor attacks only in vanilla training setups without plasticity interventions, leaving practical DRL deployments exposed to unassessed risk.
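For readers unfamiliar with SAM, the following minimal sketch shows its two-step update: ascend to a worst-case neighbor of the current weights, then apply the gradient computed there at the original weights. This second-step gradient is the quantity that, per the study's finding, can amplify backdoor-associated gradients rather than average them away. The toy loss and hyperparameters below are illustrative, not taken from the paper.

```python
import numpy as np

def loss_grad(w):
    # Toy quadratic loss L(w) = ||w||^2 / 2, so the gradient is simply w.
    return w

def sam_step(w, rho=0.05, lr=0.1):
    g = loss_grad(w)
    # Step 1: perturb toward the worst-case neighbor within radius rho.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Step 2: compute the gradient at the perturbed ("sharp") point...
    g_sharp = loss_grad(w + eps)
    # ...and apply it as the update at the original weights.
    return w - lr * g_sharp

w = np.array([1.0, -2.0])
for _ in range(50):
    w = sam_step(w)
```

On this convex toy loss the iterates shrink toward the origin; the point of the sketch is only the two-step structure of the update, not convergence behavior.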
Key facts
- arXiv paper 2605.14587 investigates plasticity interventions in DRL backdoor attacks.
- 14,664 cases were empirically studied combining interventions and attack scenarios.
- Only SAM intervention exacerbates backdoor threats.
- Other interventions mitigate backdoor threats.
- Exacerbation is attributed to backdoor gradient amplification.
- Mitigation stems from activation pathway disruption and representation space compression.
- Plasticity interventions are built-in components of modern DRL agents.
- Prior studies focused on vanilla scenarios without plasticity interventions.
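To make the threat model in the facts above concrete, here is a hedged illustration of the kind of backdoor typically studied in DRL: a small trigger pattern stamped onto the agent's observation causes a compromised policy to emit an attacker-chosen action. The trigger shape, location, and hard-coded policy below are invented for illustration and are not the paper's construction.

```python
import numpy as np

def apply_trigger(obs, value=1.0, size=3):
    # Stamp a bright square into the top-left corner of an image observation.
    patched = obs.copy()
    patched[:size, :size] = value
    return patched

def backdoored_policy(obs):
    # Stand-in for a learned policy: behaves normally on clean observations,
    # but returns the attacker-chosen action when the trigger is present.
    if obs[:3, :3].mean() > 0.9:  # trigger "detector" (hypothetical)
        return 0                  # attacker-chosen action
    return 1 if obs.mean() > 0.5 else 2

obs = np.zeros((8, 8))
clean_action = backdoored_policy(obs)
trigger_action = backdoored_policy(apply_trigger(obs))
```

The mitigation mechanisms listed above (activation pathway disruption, representation compression) work by degrading exactly the kind of trigger-conditioned internal behavior this sketch caricatures.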
Entities
Institutions
- arXiv (preprint repository hosting the paper)