Global PSRO: A New Algorithm for Equilibrium Computation in Large Zero-Sum Games
Researchers propose Global Policy-Space Response Oracles (Global PSRO), a new algorithm for computing equilibria in large zero-sum games more efficiently. The standard PSRO framework iteratively expands a restricted strategy set, using deep reinforcement learning to train each new strategy. Existing variants, however, often expand inefficiently: they add best responses to meta-strategies computed only from restricted-game payoffs, a criterion that does not directly target how well the expanded set represents the full game. Global PSRO instead introduces a two-phase exploration-selection framework that directly minimizes Population Exploitability (PE), a measure of how well the restricted strategy set represents the full game. By evaluating the quality of the population after each candidate expansion, the approach yields more efficient strategy sets under limited computational budgets, according to the authors. The paper is available as a preprint on arXiv (arXiv:2605.28273).
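To make the baseline concrete, here is a minimal Python sketch of standard PSRO on a small two-player zero-sum matrix game. Exact pure-strategy best responses stand in for the deep reinforcement learning oracle, and fictitious play serves as the meta-solver. The PE formula used here is an assumed formalization for illustration and may differ from the paper's definition: the column population's best worst-case loss against all rows, minus the row population's best worst-case gain against all columns. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def solve_zero_sum(A: Matrix, iters: int = 10000) -> Tuple[List[float], List[float]]:
    """Approximate Nash equilibrium of a zero-sum matrix game (row player maximizes A)
    via simultaneous fictitious play; returns the empirical mixtures (x, y)."""
    m, n = len(A), len(A[0])
    row_counts, col_counts = [0] * m, [0] * n
    row_counts[0] = col_counts[0] = 1  # both players open with their first strategy
    # Cumulative payoff of each pure strategy against the opponent's play so far.
    row_vals = [A[i][0] for i in range(m)]
    col_vals = [A[0][j] for j in range(n)]
    for _ in range(iters):
        i_star = max(range(m), key=row_vals.__getitem__)  # row maximizes payoff
        j_star = min(range(n), key=col_vals.__getitem__)  # column minimizes it
        row_counts[i_star] += 1
        col_counts[j_star] += 1
        for i in range(m):
            row_vals[i] += A[i][j_star]
        for j in range(n):
            col_vals[j] += A[i_star][j]
    total = iters + 1
    return [c / total for c in row_counts], [c / total for c in col_counts]


def submatrix(A: Matrix, rows: Sequence[int], cols: Sequence[int]) -> Matrix:
    return [[A[i][j] for j in cols] for i in rows]


def population_exploitability(A: Matrix, pop_row: List[int], pop_col: List[int],
                              iters: int = 10000) -> float:
    """Assumed PE: min over y in Delta(pop_col) of max_i (A y)_i
    minus max over x in Delta(pop_row) of min_j (x^T A)_j.
    Each term plays one population against the opponent's FULL strategy set."""
    all_rows, all_cols = range(len(A)), range(len(A[0]))
    _, y = solve_zero_sum(submatrix(A, all_rows, pop_col), iters)
    upper = max(sum(A[i][c] * w for c, w in zip(pop_col, y)) for i in all_rows)
    x, _ = solve_zero_sum(submatrix(A, pop_row, all_cols), iters)
    lower = min(sum(A[r][j] * w for r, w in zip(pop_row, x)) for j in all_cols)
    # Approximate mixtures make this an upper bound on the assumed PE (itself >= 0).
    return upper - lower


def psro(A: Matrix, max_iters: int = 10, iters: int = 10000):
    """Standard PSRO: add full-game best responses to the restricted-game meta-Nash."""
    pop_row, pop_col = [0], [0]
    history = [population_exploitability(A, pop_row, pop_col, iters)]
    for _ in range(max_iters):
        x, y = solve_zero_sum(submatrix(A, pop_row, pop_col), iters)
        # Exact best responses stand in for the deep RL oracle.
        br_row = max(range(len(A)),
                     key=lambda i: sum(A[i][c] * w for c, w in zip(pop_col, y)))
        br_col = min(range(len(A[0])),
                     key=lambda j: sum(A[r][j] * w for r, w in zip(pop_row, x)))
        if br_row in pop_row and br_col in pop_col:
            break  # neither best response is new: meta-Nash is ~ a full-game equilibrium
        if br_row not in pop_row:
            pop_row.append(br_row)
        if br_col not in pop_col:
            pop_col.append(br_col)
        history.append(population_exploitability(A, pop_row, pop_col, iters))
    return pop_row, pop_col, history


# Rock-paper-scissors; A[i][j] is the row payoff, 0=rock, 1=paper, 2=scissors.
RPS = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]

if __name__ == "__main__":
    print(psro(RPS))
```

On rock-paper-scissors, starting from rock for both players, this sketch adds paper and then scissors, and the estimated PE drops from 2 to about 2/3 and then to near 0.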
Key facts
- arXiv:2605.28273v1
- Announce Type: new
- PSRO framework scales equilibrium computation to large zero-sum games
- PSRO iteratively expands a restricted strategy set using deep reinforcement learning
- Existing PSRO variants expand using best responses to meta-strategies
- Global PSRO uses Population Exploitability (PE) to measure restricted set quality
- Global PSRO introduces a two-phase exploration-selection framework
- Global PSRO explicitly minimizes PE during expansion
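The exploration-selection idea in the list above can be illustrated on a toy two-player zero-sum matrix game. This Python sketch is a hypothetical instantiation, not the paper's algorithm. The exploration phase here proposes candidates by best-responding to several meta-strategies (the restricted-game equilibrium, the uniform mixture, and each pure member of the opponent population). The selection phase then keeps the candidate with the lowest post-expansion PE. The candidate generator, the function names, and the PE formula (the column population's best worst-case loss against all rows minus the row population's best worst-case gain against all columns) are all assumptions.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def solve_zero_sum(A: Matrix, iters: int = 10000) -> Tuple[List[float], List[float]]:
    """Approximate Nash of a zero-sum matrix game (row maximizes A) via fictitious play."""
    m, n = len(A), len(A[0])
    row_counts, col_counts = [0] * m, [0] * n
    row_counts[0] = col_counts[0] = 1
    row_vals = [A[i][0] for i in range(m)]  # cumulative payoffs vs opponent history
    col_vals = [A[0][j] for j in range(n)]
    for _ in range(iters):
        i_star = max(range(m), key=row_vals.__getitem__)
        j_star = min(range(n), key=col_vals.__getitem__)
        row_counts[i_star] += 1
        col_counts[j_star] += 1
        for i in range(m):
            row_vals[i] += A[i][j_star]
        for j in range(n):
            col_vals[j] += A[i_star][j]
    total = iters + 1
    return [c / total for c in row_counts], [c / total for c in col_counts]


def submatrix(A: Matrix, rows: Sequence[int], cols: Sequence[int]) -> Matrix:
    return [[A[i][j] for j in cols] for i in rows]


def row_guarantee(A: Matrix, pop_row: List[int], iters: int) -> float:
    """Best worst-case payoff the row population can secure against ALL columns."""
    x, _ = solve_zero_sum(submatrix(A, pop_row, range(len(A[0]))), iters)
    return min(sum(A[r][j] * w for r, w in zip(pop_row, x)) for j in range(len(A[0])))


def col_guarantee(A: Matrix, pop_col: List[int], iters: int) -> float:
    """Best worst-case loss the column population can hold ALL rows to."""
    _, y = solve_zero_sum(submatrix(A, range(len(A)), pop_col), iters)
    return max(sum(A[i][c] * w for c, w in zip(pop_col, y)) for i in range(len(A)))


def population_exploitability(A: Matrix, pop_row, pop_col, iters: int = 10000) -> float:
    # Assumed PE definition; the paper's may differ.
    return col_guarantee(A, pop_col, iters) - row_guarantee(A, pop_row, iters)


def best_response_row(A: Matrix, pop_col: List[int], y: List[float]) -> int:
    return max(range(len(A)), key=lambda i: sum(A[i][c] * w for c, w in zip(pop_col, y)))


def best_response_col(A: Matrix, pop_row: List[int], x: List[float]) -> int:
    return min(range(len(A[0])), key=lambda j: sum(A[r][j] * w for r, w in zip(pop_row, x)))


def mixtures(k: int, nash: List[float]) -> List[List[float]]:
    """Hypothetical meta-strategies to explore from: Nash, uniform, each pure member."""
    pure = [[1.0 if a == b else 0.0 for b in range(k)] for a in range(k)]
    return [nash, [1.0 / k] * k] + pure


def explore(A: Matrix, pop_row, pop_col, iters: int = 10000):
    """Phase 1: propose new strategies (best responses to several meta-strategies)."""
    x_nash, y_nash = solve_zero_sum(submatrix(A, pop_row, pop_col), iters)
    row_cands = {best_response_row(A, pop_col, y) for y in mixtures(len(pop_col), y_nash)}
    col_cands = {best_response_col(A, pop_row, x) for x in mixtures(len(pop_row), x_nash)}
    return sorted(row_cands - set(pop_row)), sorted(col_cands - set(pop_col))


def global_psro(A: Matrix, max_iters: int = 10, iters: int = 10000):
    pop_row, pop_col = [0], [0]
    history = [population_exploitability(A, pop_row, pop_col, iters)]
    for _ in range(max_iters):
        row_cands, col_cands = explore(A, pop_row, pop_col, iters)
        if not row_cands and not col_cands:
            break
        # Phase 2: keep the candidate with the lowest post-expansion PE. Under the
        # assumed PE, a row addition only changes row_guarantee and a column addition
        # only changes col_guarantee, so each player's choice can be made separately.
        if row_cands:
            pop_row.append(max(row_cands, key=lambda c: row_guarantee(A, pop_row + [c], iters)))
        if col_cands:
            pop_col.append(min(col_cands, key=lambda c: col_guarantee(A, pop_col + [c], iters)))
        history.append(population_exploitability(A, pop_row, pop_col, iters))
    return pop_row, pop_col, history


# Rock-paper-scissors; A[i][j] is the row payoff, 0=rock, 1=paper, 2=scissors.
RPS = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]

if __name__ == "__main__":
    print(global_psro(RPS))
```

On rock-paper-scissors this toy version ends with all three strategies in each population after two expansions, so it only shows the mechanics of proposing several candidates and scoring each by post-expansion PE rather than demonstrating an efficiency gain.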
Entities
Institutions
- arXiv