Cooperative MARL Exploration Budget Allocation via Quality-Aware Scheduling

other · 2026-05-06

A novel framework for cooperative multi-agent reinforcement learning (MARL) addresses how much exploration to apply: too much leads to coordination failure, while too little leaves rare but effective strategies undiscovered. The method combines a return-conditioned sigmoid schedule (RCB), which controls global exploration intensity, with a per-agent Reward Signal Quality (RSQ) metric, which concentrates the exploration budget on agents whose intrinsic reward signals are reliable. The work is available on arXiv as 2605.01865.

Key facts

Cooperative MARL requires agents to discover joint strategies in a combinatorially large state-action space.
Effective coordination configurations are exceedingly rare.
Intrinsic motivation augments task rewards with novelty bonuses.
Exploration intensity β must be tuned carefully: too large a value overwhelms the task signal, and too small a value prevents discovery.
The framework addresses global β adaptation over training and per-agent budget allocation.
RCB (return-conditioned sigmoid schedule) controls global intensity.
RSQ (Reward Signal Quality) metric concentrates budget on agents with reliable signals.
Published on arXiv:2605.01865.
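
The facts above can be illustrated with a minimal Python sketch. It is not the paper's implementation: this summary gives no formulas, so the sigmoid form, the assumption that β falls as the team return rises, the proportional budget split, and every name and default value (global_beta, allocate_budget, shaped_reward, beta_max, midpoint, steepness) are hypothetical. RSQ scores are taken as given non-negative inputs, since their definition is not stated here.

```python
import math


def global_beta(mean_return, beta_max=1.0, midpoint=0.5, steepness=10.0):
    """Hypothetical return-conditioned sigmoid schedule.

    Assumption: exploration intensity decreases as the (normalized)
    team return increases. The source does not state the direction
    or the exact functional form.
    """
    z = steepness * (mean_return - midpoint)
    z = max(-60.0, min(60.0, z))  # clamp to keep math.exp from overflowing
    return beta_max / (1.0 + math.exp(z))


def allocate_budget(beta, quality_scores):
    """Split a total budget of n_agents * beta across agents.

    Assumption: each agent's share is proportional to its quality
    score (a stand-in for RSQ). If all scores are zero, fall back
    to a uniform split.
    """
    n = len(quality_scores)
    total = sum(quality_scores)
    if total <= 0:
        return [beta] * n
    return [n * beta * q / total for q in quality_scores]


def shaped_reward(task_reward, intrinsic_reward, beta_i):
    """Task reward augmented with a novelty bonus weighted by beta_i."""
    return task_reward + beta_i * intrinsic_reward


if __name__ == "__main__":
    beta = global_beta(mean_return=0.2)
    per_agent = allocate_budget(beta, quality_scores=[0.9, 0.1, 0.5])
    for i, b in enumerate(per_agent):
        print(f"agent {i}: beta_i = {b:.3f}, "
              f"shaped reward = {shaped_reward(1.0, 0.3, b):.3f}")
```

In this sketch the per-agent values always sum to n_agents * beta, so the global schedule fixes the total amount of exploration while the quality scores only decide how it is distributed. That separation is a design choice of the sketch, made to mirror the two roles described for RCB and RSQ.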
