Adversarial Action Removal in Self-Play Reinforcement Learning

other · 2026-05-20

A recent arXiv study introduces adversarial action masking in self-play reinforcement learning: an attacker strategically removes selected legal actions from a victim's action set. Unlike traditional perturbation attacks, masking removes options before the agent makes its decision. Experiments on poker games ranging from 6 to 5,531 information states, plus two non-poker settings, show that learned masking does significantly more harm than random masking and learned perturbation baselines. The attack works against Q-learning, PPO, NFSP, neural NFSP, and DQN victims; it transfers between agents, is amplified by self-play, and victims show no sign of recovery under extended masked training. The adversary concentrates on high-value decision points, as measured by reach-weighted contingent action capacity (CAC_w) and a value-weighted refinement (CAC_v). The study presents action availability as a distinct robustness surface in self-play RL.
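The mechanics of action masking can be illustrated with a minimal sketch. It is not the study's method: the adversary in the paper is learned, whereas the hypothetical `targeted_mask` below is a simple greedy stand-in that removes the victim's highest-valued actions. The function names, the `budget` parameter, and the toy Q-values are all assumptions made for illustration.

```python
import random

def masked_greedy_action(q_values, legal_actions, masked):
    """Victim picks its best action among those the attacker left available."""
    available = [a for a in legal_actions if a not in masked]
    if not available:
        raise ValueError("mask must leave at least one legal action")
    return max(available, key=lambda a: q_values[a])

def targeted_mask(q_values, legal_actions, budget):
    """Hypothetical greedy attacker: remove the victim's top-valued actions.

    Stand-in for a learned masking policy; always leaves one action legal.
    """
    budget = max(0, min(budget, len(legal_actions) - 1))
    ranked = sorted(legal_actions, key=lambda a: q_values[a], reverse=True)
    return set(ranked[:budget])

def random_mask(legal_actions, budget, rng):
    """Random-masking baseline: remove `budget` legal actions uniformly."""
    budget = max(0, min(budget, len(legal_actions) - 1))
    return set(rng.sample(legal_actions, budget))

# Toy decision point with assumed Q-values (not taken from the study).
q = {"fold": -1.0, "call": 0.5, "raise": 2.0}
legal = ["fold", "call", "raise"]

mask = targeted_mask(q, legal, budget=1)          # removes "raise"
action = masked_greedy_action(q, legal, mask)     # victim falls back to "call"
baseline = random_mask(legal, 1, random.Random(0))
```

In this toy case the targeted mask always costs the victim its best option, while the random baseline does so only one time in three. That gap loosely mirrors the reported difference between learned and random masking, though the sketch makes no claim about its size.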

Key facts

Adversarial action masking removes legal actions before the agent acts.
Experiments used poker games with 6 to 5,531 information states.
Learned masking outperforms random masking and perturbation baselines.
Attack persists across Q-learning, PPO, NFSP, neural NFSP, and DQN.
Attack transfers across agents and is amplified by self-play.
No recovery observed under extended masked training.
Adversary targets high-value decision points measured by CAC_w and CAC_v.
Action availability is identified as a distinct robustness surface.
