ARTFEED — Contemporary Art Intelligence

Multi-Agent RL Method Retains Suboptimal Actions for Shifting Optima

other · 2026-05-22

Researchers have introduced Successive Sub-value Q-learning (S2Q), a new approach to cooperative multi-agent reinforcement learning (MARL). Traditional value decomposition methods focus on a single best action. S2Q instead learns several sub-value functions, so it keeps access to alternative high-value actions. Because the underlying value function can shift during training, these retained alternatives help the algorithm adapt rather than settle on a suboptimal policy. S2Q feeds the sub-value functions into a Softmax-based behavior policy, which sustains exploration and allows quick adaptation when the optimum moves. On challenging MARL benchmarks, S2Q consistently outperformed a range of MARL algorithms in both adaptability and performance. The code is publicly available.
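To make the idea of "retaining alternative actions and sampling them through a Softmax" concrete, here is a toy sketch for a single agent. It is not the S2Q implementation. The article does not say how the sub-value functions are trained or combined, so the "successive" rule below is an assumption: each sub-value function proposes its best action that an earlier function has not already claimed. The helper names (`successive_candidates`, `softmax_behavior_policy`, `sample_action`), the `temperature` parameter and the numbers are all hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


# Hypothetical rule (not taken from the paper): each sub-value function proposes
# its highest-valued action that no earlier sub-value function has claimed.
def successive_candidates(
    sub_values: Sequence[Sequence[float]],
) -> List[Tuple[int, float]]:
    """Return (action, value) pairs, one per sub-value function where possible."""
    claimed: List[Tuple[int, float]] = []
    taken = set()
    for q in sub_values:
        # Rank this function's actions from highest to lowest estimated value.
        ranked = sorted(range(len(q)), key=lambda a: q[a], reverse=True)
        for a in ranked:
            if a not in taken:
                taken.add(a)
                claimed.append((a, q[a]))
                break
    return claimed


def softmax_behavior_policy(
    sub_values: Sequence[Sequence[float]], temperature: float = 1.0
) -> Dict[int, float]:
    """Softmax distribution over the retained candidate actions (hypothetical)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    candidates = successive_candidates(sub_values)
    scores = [v / temperature for _, v in candidates]
    # Subtract the max score before exponentiating for numerical stability.
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return {a: w / total for (a, _), w in zip(candidates, weights)}


def sample_action(policy: Dict[int, float], rng: random.Random) -> int:
    """Draw one action according to the behavior policy's probabilities."""
    actions = list(policy)
    return rng.choices(actions, weights=[policy[a] for a in actions], k=1)[0]


# Toy example: three hypothetical sub-value functions over four actions.
sub_values = [
    [1.0, 5.0, 2.0, 0.5],
    [1.5, 4.0, 4.5, 0.0],
    [0.0, 3.0, 1.0, 2.5],
]
policy = softmax_behavior_policy(sub_values, temperature=1.0)
action = sample_action(policy, random.Random(0))
```

With these toy numbers the three functions propose actions 1, 2 and 3, so the lower-valued alternatives keep a nonzero chance of being explored while action 1 remains the most likely. Lowering `temperature` concentrates probability on the top candidate, and raising it spreads probability across the retained alternatives. This is the exploration trade-off a Softmax behavior policy exposes.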

Key facts

  • S2Q stands for Successive Sub-value Q-learning
  • S2Q learns multiple sub-value functions
  • Sub-value functions retain alternative high-value actions
  • S2Q uses a Softmax-based behavior policy
  • S2Q addresses shifting value functions in MARL
  • S2Q outperforms various MARL algorithms on benchmarks
  • Code is publicly available
  • Research is in computer science and artificial intelligence

Entities

Institutions

  • arXiv

Sources