CL-MARL: Adaptive Curriculum Learning for Multi-Agent Reinforcement Learning
A recent arXiv paper presents CL-MARL, a curriculum learning framework for multi-agent reinforcement learning (MARL) that targets what the authors call environmental meta-stationarity: the common practice of training agents at a fixed difficulty level. They argue that static-difficulty training limits policy generalization and traps agents in shallow local optima. CL-MARL adapts opponent strength online from win-rate feedback, raising task difficulty as agents improve. Its scheduler, FlexDiff, fuses momentum-based trend estimation with sliding-window dual-curve monitoring of training and evaluation win rates, producing smooth difficulty adjustments without manual calibration. To cope with the non-stationarity and sparse global rewards induced by a shifting curriculum, the paper also introduces Counterfactual Group Relative Policy Advantage (CGRPA), which extends standard advantage estimation. The paper is available on arXiv under ID 2506.07548.
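The summary does not include code, but the FlexDiff description (a momentum-smoothed trend estimate plus a sliding window over both the training and evaluation curves) maps naturally onto a small scheduler. The sketch below is a hedged illustration only: the class name, hyperparameters, and update rule are assumptions, not the authors' implementation.

```python
import numpy as np
from collections import deque

class FlexDiffScheduler:
    """Illustrative FlexDiff-style difficulty scheduler.

    Combines a momentum-smoothed win-rate trend with sliding-window
    dual-curve monitoring of training and evaluation win rates.
    All names and hyperparameter values are assumptions, not the paper's.
    """

    def __init__(self, window=20, beta=0.9, target=0.55, step=0.05):
        self.train_wins = deque(maxlen=window)  # sliding window of training win rates
        self.eval_wins = deque(maxlen=window)   # sliding window of evaluation win rates
        self.trend = 0.0                        # momentum-smoothed change in win rate
        self.prev_win = None
        self.beta = beta                        # momentum coefficient for the trend
        self.target = target                    # desired win-rate band center
        self.step = step                        # per-update difficulty increment
        self.difficulty = 0.0                   # opponent strength in [0, 1]

    def update(self, train_win_rate, eval_win_rate):
        self.train_wins.append(train_win_rate)
        self.eval_wins.append(eval_win_rate)
        # Momentum-based trend estimate over the training curve.
        if self.prev_win is not None:
            delta = train_win_rate - self.prev_win
            self.trend = self.beta * self.trend + (1 - self.beta) * delta
        self.prev_win = train_win_rate
        # Dual-curve check: adjust difficulty only when both curves agree,
        # so a lucky training streak cannot outrun evaluation performance.
        train_avg = np.mean(self.train_wins)
        eval_avg = np.mean(self.eval_wins)
        if min(train_avg, eval_avg) > self.target and self.trend >= 0:
            self.difficulty = min(1.0, self.difficulty + self.step)
        elif max(train_avg, eval_avg) < self.target and self.trend <= 0:
            self.difficulty = max(0.0, self.difficulty - self.step)
        return self.difficulty
```

In use, one would call `update(train_win_rate, eval_win_rate)` once per training iteration and feed the returned difficulty to the opponent pool; the dual-curve condition is what keeps the adjustments smooth without manual calibration.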
Key facts
- Paper introduces CL-MARL, a dynamic curriculum learning framework for MARL
- Addresses environmental meta-stationarity: static-difficulty training regime
- CL-MARL adapts opponent strength online from win-rate signals
- FlexDiff scheduler fuses momentum-based trend estimation with sliding-window dual-curve monitoring
- CGRPA extends advantage estimation to handle non-stationarity and sparse rewards (see the sketch after this list)
- Published on arXiv with ID 2506.07548
- Focus on cooperative tasks against scripted adversaries
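The summary names CGRPA but does not spell out its formula. As a rough sketch, assuming CGRPA combines a COMA-style counterfactual baseline (marginalizing out one agent's own action under its policy) with GRPO-style group-relative normalization of the sparse episode return, a per-agent advantage could look like the following; every name, shape, and the additive combination below are assumptions for illustration.

```python
import numpy as np

def cgrpa_advantage(q_values, policy_probs, action, returns_group, eps=1e-8):
    """Hypothetical CGRPA-style advantage for a single agent and state.

    q_values:      (A,) critic estimates Q(s, a_{-i}, a_i') over the agent's own actions
    policy_probs:  (A,) the agent's current policy pi(a_i' | s)
    action:        index of the action the agent actually took
    returns_group: (G,) sparse episode returns from a group of rollouts at the
                   current curriculum difficulty; the last entry is this rollout
    """
    # Counterfactual term: credit relative to marginalizing out the agent's
    # own action under its policy (a COMA-style baseline).
    counterfactual = q_values[action] - float(np.dot(policy_probs, q_values))
    # Group-relative term: normalize this rollout's sparse global return
    # against its peer group (GRPO-style).
    group_relative = (returns_group[-1] - returns_group.mean()) / (returns_group.std() + eps)
    return counterfactual + group_relative
```

One plausible reason to pair the two terms: computing the group-relative baseline over rollouts gathered at the same curriculum difficulty keeps the sparse-reward signal comparable even as the scheduler shifts opponent strength.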