AI Self-Play Collapse Threshold Identified in Decision Capacity Study
A new study on arXiv identifies a sharp threshold in decision capacity that determines whether self-play reinforcement learning collapses. The researchers found that when every positive-reach contingent decision is eliminated, the system converges rapidly to a deterministic exploitation attractor, resulting in almost complete failure. Preserving even one such decision point averts the collapse. The effect was observed across poker variants, matrix games, and a dice game, and under multiple learning algorithms. Controls with a frozen baseline and a fixed opponent indicate that the effect stems from co-adaptation under constraint, not from disturbances. The threshold sits at zero reach-weighted contingent action capacity, a result with implications for AI safety and for understanding behavior in multi-agent systems.
Key facts
- Threshold in decision capacity determines collapse in self-play reinforcement learning
- Eliminating all positive-reach contingent decisions causes rapid convergence to deterministic exploitation attractor
- Preserving even one positive-reach contingent decision point prevents collapse
- Tested across poker variants, matrix games, dice game, and multiple learning algorithms
- Frozen baseline and fixed-opponent control confirm mechanism is co-adaptation under constraint
- Phenomenon is timing-invariant and fully reversible upon action restoration
- Intensifies under function approximation
- Establishes sharp threshold at zero reach-weighted contingent action capacity
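The threshold described in the key facts can be illustrated with a toy sketch. The study's formal definition is not given here, so the code below rests on an assumed reading: a decision point counts as "contingent" if more than one action is still available there, and "reach-weighted capacity" is the sum of reach probabilities over such points. The names DecisionPoint, contingent_capacity and predicts_collapse are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionPoint:
    name: str        # hypothetical label for the decision point
    reach: float     # probability that play reaches this point (assumed given)
    n_actions: int   # number of actions still available at this point


def contingent_capacity(points):
    """Assumed definition: total reach over points that still offer a choice."""
    return sum(p.reach for p in points if p.reach > 0 and p.n_actions > 1)


def predicts_collapse(points):
    """Under the assumed definition, collapse is predicted only at zero capacity."""
    return contingent_capacity(points) == 0


# One positive-reach choice remains: no collapse predicted.
one_left = [DecisionPoint("root", 1.0, 1), DecisionPoint("after_bet", 0.5, 2)]

# Every positive-reach choice removed: collapse predicted.
none_left = [DecisionPoint("root", 1.0, 1), DecisionPoint("after_bet", 0.5, 1)]

# A choice that is never reached does not count toward capacity.
unreached_only = [DecisionPoint("root", 1.0, 1), DecisionPoint("after_bet", 0.0, 2)]

print(predicts_collapse(one_left))        # False
print(predicts_collapse(none_left))       # True
print(predicts_collapse(unreached_only))  # True
```

The sketch only encodes the reported threshold as a yes/no check. It does not simulate self-play and makes no claim about how quickly a system converges once capacity reaches zero.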
Entities
Institutions
- arXiv