Multi-Agent RL Method Retains Suboptimal Actions for Shifting Optima

other · 2026-05-22

Researchers have introduced Successive Sub-value Q-learning (S2Q), an approach to cooperative multi-agent reinforcement learning (MARL). Unlike conventional value decomposition methods, which track only a single best action, S2Q learns several sub-value functions and so keeps alternative high-value actions within reach. This helps the algorithm adjust when the underlying value function shifts during training, rather than settling on a suboptimal policy. S2Q folds these sub-value functions into a Softmax-based behavior policy, which encourages continued exploration and quick adaptation to changing optima. Tests on demanding MARL benchmarks show S2Q consistently outperforming a range of MARL algorithms in adaptability and performance. The code is publicly available.
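
To make the idea of a Softmax-based behavior policy over several sub-value functions concrete, here is a minimal Python sketch. It is not the authors' implementation: it is a single-agent, tabular simplification, and the names `candidate_actions`, `behavior_action`, and `temperature` are hypothetical. It assumes each sub-value function nominates its own greedy action and that a softmax over those nominated values decides which candidate to follow. How S2Q actually trains its sub-value functions is not modeled here.

```python
import math
import random


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def candidate_actions(sub_q_values):
    """Return (greedy action, its value) for each sub-value function.

    sub_q_values: list of lists, one row of action values per
    sub-value function (an illustrative layout, assumed here).
    """
    candidates = []
    for q in sub_q_values:
        best = max(range(len(q)), key=lambda a: q[a])
        candidates.append((best, q[best]))
    return candidates


def behavior_action(sub_q_values, temperature=1.0, rng=random):
    """Sample one candidate action with softmax weights on its value."""
    candidates = candidate_actions(sub_q_values)
    probs = softmax([value for _, value in candidates], temperature)
    idx = rng.choices(range(len(candidates)), weights=probs, k=1)[0]
    return candidates[idx][0], probs


if __name__ == "__main__":
    # Two hypothetical sub-value functions over three actions.
    sub_q = [
        [1.0, 0.2, 0.1],  # prefers action 0
        [0.0, 0.8, 0.1],  # keeps action 1 as an alternative
    ]
    action, probs = behavior_action(sub_q, temperature=1.0)
    print("sampled action:", action, "candidate probabilities:", probs)
```

In this sketch the second-best candidate keeps a nonzero sampling probability, so if the value estimates shift and action 1 overtakes action 0, the behavior policy has already been visiting it. A lower `temperature` makes the choice greedier; a higher one spreads exploration more evenly across candidates.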

Key facts

S2Q stands for Successive Sub-value Q-learning
S2Q learns multiple sub-value functions
Sub-value functions retain alternative high-value actions
S2Q uses a Softmax-based behavior policy
S2Q addresses shifting value functions in MARL
S2Q outperforms various MARL algorithms on benchmarks
Code is publicly available
Research is in computer science and artificial intelligence
