CL-MARL: Adaptive Curriculum Learning for Multi-Agent Reinforcement Learning
A recent arXiv paper presents CL-MARL, a curriculum learning framework for multi-agent reinforcement learning (MARL) that targets what the authors call environmental meta-stationarity: the common practice of training agents at a fixed difficulty level. They argue that static-difficulty training limits policy generalization and traps agents in shallow local optima. CL-MARL adapts opponent strength online from win-rate feedback, raising task difficulty as agents improve. Its scheduler, FlexDiff, fuses momentum-based trend estimation with sliding-window dual-curve monitoring of training and evaluation win rates, producing smooth difficulty adjustments without manual calibration. To cope with the non-stationarity and sparse global rewards induced by a shifting curriculum, the paper also introduces Counterfactual Group Relative Policy Advantage (CGRPA), which extends standard advantage estimation. The paper is available on arXiv under ID 2506.07548.
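The summary does not include code, but the FlexDiff description (a momentum-smoothed trend estimate plus a sliding window over both the training and evaluation curves) maps naturally onto a small scheduler. The sketch below is a hedged illustration only: the class name, hyperparameters, and update rule are assumptions, not the authors' implementation.

```python
import numpy as np
from collections import deque

class FlexDiffScheduler:
    """Illustrative FlexDiff-style difficulty scheduler.

    Combines a momentum-smoothed win-rate trend with sliding-window
    dual-curve monitoring of training and evaluation win rates.
    All names and hyperparameter values are assumptions, not the paper's.
    """

    def __init__(self, window=20, beta=0.9, target=0.55, step=0.05):
        self.train_wins = deque(maxlen=window)  # sliding window of training win rates
        self.eval_wins = deque(maxlen=window)   # sliding window of evaluation win rates
        self.trend = 0.0                        # momentum-smoothed change in win rate
        self.prev_win = None
        self.beta = beta                        # momentum coefficient for the trend
        self.target = target                    # desired win-rate band center
        self.step = step                        # per-update difficulty increment
        self.difficulty = 0.0                   # opponent strength in [0, 1]

    def update(self, train_win_rate, eval_win_rate):
        self.train_wins.append(train_win_rate)
        self.eval_wins.append(eval_win_rate)
        # Momentum-based trend estimate over the training curve.
        if self.prev_win is not None:
            delta = train_win_rate - self.prev_win
            self.trend = self.beta * self.trend + (1 - self.beta) * delta
        self.prev_win = train_win_rate
        # Dual-curve check: adjust difficulty only when both curves agree,
        # so a lucky training streak cannot outrun evaluation performance.
        train_avg = np.mean(self.train_wins)
        eval_avg = np.mean(self.eval_wins)
        if min(train_avg, eval_avg) > self.target and self.trend >= 0:
            self.difficulty = min(1.0, self.difficulty + self.step)
        elif max(train_avg, eval_avg) < self.target and self.trend <= 0:
            self.difficulty = max(0.0, self.difficulty - self.step)
        return self.difficulty
```

In use, one would call `update(train_win_rate, eval_win_rate)` once per training iteration and feed the returned difficulty to the opponent pool; the dual-curve condition is what keeps the adjustments smooth without manual calibration.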
Key facts
- Paper introduces CL-MARL, a dynamic curriculum learning framework for MARL
- Addresses environmental meta-stationarity: static-difficulty training regime
- CL-MARL adapts opponent strength online from win-rate signals
- FlexDiff scheduler fuses momentum-based trend estimation with sliding-window dual-curve monitoring
- CGRPA extends advantage estimation to handle non-stationarity and sparse rewards (see the sketch after this list)
- Published on arXiv with ID 2506.07548
- Focus on cooperative tasks against scripted adversaries
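The summary names CGRPA but does not spell out its formula. As a rough sketch, assuming CGRPA combines a COMA-style counterfactual baseline (marginalizing out one agent's own action under its policy) with GRPO-style group-relative normalization of the sparse episode return, a per-agent advantage could look like the following; every name, shape, and the additive combination below are assumptions for illustration.

```python
import numpy as np

def cgrpa_advantage(q_values, policy_probs, action, returns_group, eps=1e-8):
    """Hypothetical CGRPA-style advantage for a single agent and state.

    q_values:      (A,) critic estimates Q(s, a_{-i}, a_i') over the agent's own actions
    policy_probs:  (A,) the agent's current policy pi(a_i' | s)
    action:        index of the action the agent actually took
    returns_group: (G,) sparse episode returns from a group of rollouts at the
                   current curriculum difficulty; the last entry is this rollout
    """
    # Counterfactual term: credit relative to marginalizing out the agent's
    # own action under its policy (a COMA-style baseline).
    counterfactual = q_values[action] - float(np.dot(policy_probs, q_values))
    # Group-relative term: normalize this rollout's sparse global return
    # against its peer group (GRPO-style).
    group_relative = (returns_group[-1] - returns_group.mean()) / (returns_group.std() + eps)
    return counterfactual + group_relative
```

One plausible reason to pair the two terms: computing the group-relative baseline over rollouts gathered at the same curriculum difficulty keeps the sparse-reward signal comparable even as the scheduler shifts opponent strength.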