ARTFEED — Contemporary Art Intelligence

ARMS: Automatic Reward Shaping for Sparse-Reward Multi-Agent RL

ai-technology · 2026-05-25

Researchers propose ARMS (Automatic Reward-shaping in Multi-agent Systems), a self-supervised reward-shaping framework for multi-agent reinforcement learning (MARL) that tackles sparse rewards by learning dense shaping signals from trajectory ranking. Because single-agent policy-invariance guarantees do not transfer directly to MARL, the method reformulates policy invariance through conditional best-response reasoning: with opponent policies held fixed, the shaping rewards preserve each agent's best-response set, and under certain conditions they also preserve the set of Nash equilibria. The shaped problem therefore keeps the strategic structure of the original game, whereas standard reward shaping may only improve short-term optimization. The work is presented in arXiv paper 2605.23562.
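
To make "learning dense shaping signals from trajectory ranking" concrete, here is a minimal sketch of one common way to turn ranked trajectory pairs into a per-step reward. It is not taken from the paper: the linear reward model, the Bradley-Terry ranking loss, the function names, and the toy features are all assumptions made for illustration.

```python
import math


def trajectory_return(weights, trajectory):
    """Sum the learned per-step reward w . phi(step) over one trajectory."""
    return sum(
        sum(w * x for w, x in zip(weights, features))
        for features in trajectory
    )


def fit_shaping_reward(ranked_pairs, n_features, lr=0.1, epochs=200):
    """Fit linear reward weights from (better, worse) trajectory pairs.

    Assumed model (Bradley-Terry):
    P(better beats worse) = sigmoid(R(better) - R(worse)).
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        for better, worse in ranked_pairs:
            diff = trajectory_return(weights, better) - trajectory_return(weights, worse)
            p_correct = 1.0 / (1.0 + math.exp(-diff))
            for k in range(n_features):
                feature_gap = sum(s[k] for s in better) - sum(s[k] for s in worse)
                # Gradient ascent on log P(better beats worse).
                weights[k] += lr * (1.0 - p_correct) * feature_gap
    return weights


# Hypothetical toy data: feature 0 tracks task progress, feature 1 is a distractor.
ranked_pairs = [
    ([[1.0, 0.0], [1.0, 1.0]], [[0.0, 1.0], [0.0, 0.0]]),
    ([[1.0, 0.0]], [[0.0, 1.0]]),
]
weights = fit_shaping_reward(ranked_pairs, n_features=2)
print(weights)
```

Once fitted, the per-step score w . phi(step) can serve as a dense signal even when the environment reward arrives only at the end of an episode. In a self-supervised setting the rankings would presumably be derived from the sparse returns themselves rather than from human labels, though the sketch leaves that step out.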

Key facts

  • ARMS stands for Automatic Reward-shaping in Multi-agent Systems.
  • It is a self-supervised framework for MARL.
  • It learns dense shaping signals from sparse environmental rewards.
  • Trajectory ranking is used to generate shaping signals.
  • Single-agent policy-invariance guarantees for reward shaping do not directly transfer to MARL.
  • The framework uses conditional best-response reasoning.
  • Shaping rewards preserve each agent's best-response set under fixed opponent policies.
  • The set of Nash equilibria is preserved under certain conditions.
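
What it means for shaping to preserve best-response sets and Nash equilibria can be checked by hand on a small matrix game. The sketch below uses a textbook sufficient condition, a shaping term that depends only on the other agent's action, which is an assumption chosen for illustration and not necessarily the condition ARMS proves. The game, the bonus values, and the function names are hypothetical.

```python
def best_responses(own_payoff, opponent_action, player):
    """Actions maximizing `player`'s payoff against a fixed opponent action.

    own_payoff[a0][a1] is the player's payoff when the row agent (player 0)
    plays a0 and the column agent (player 1) plays a1.
    """
    if player == 0:
        values = [row[opponent_action] for row in own_payoff]
    else:
        values = list(own_payoff[opponent_action])
    best = max(values)
    return {a for a, v in enumerate(values) if v == best}


def pure_nash(row_payoff, col_payoff):
    """Pure-strategy Nash equilibria: joint actions where both agents best-respond."""
    n_rows, n_cols = len(row_payoff), len(row_payoff[0])
    return {
        (a0, a1)
        for a0 in range(n_rows)
        for a1 in range(n_cols)
        if a0 in best_responses(row_payoff, a1, 0)
        and a1 in best_responses(col_payoff, a0, 1)
    }


# Stag-hunt style game with two pure equilibria.
row_payoff = [[4, 0], [3, 3]]
col_payoff = [[4, 3], [0, 3]]

# Shaping that depends only on the OTHER agent's action (illustrative assumption).
row_bonus = [2, -1]  # indexed by the column agent's action
col_bonus = [1, 5]   # indexed by the row agent's action
shaped_row = [[row_payoff[a0][a1] + row_bonus[a1] for a1 in range(2)] for a0 in range(2)]
shaped_col = [[col_payoff[a0][a1] + col_bonus[a0] for a1 in range(2)] for a0 in range(2)]

# Naive shaping: pay the row agent a bonus for its OWN action 1.
naive_row = [[row_payoff[a0][a1] + (2 if a0 == 1 else 0) for a1 in range(2)] for a0 in range(2)]

original_nash = pure_nash(row_payoff, col_payoff)
shaped_nash = pure_nash(shaped_row, shaped_col)
naive_nash = pure_nash(naive_row, col_payoff)
print(original_nash, shaped_nash, naive_nash)
```

With the opponent-dependent bonus, each agent's ranking of its own actions against a fixed opponent is unchanged, so both equilibria, (0, 0) and (1, 1), survive. The naive own-action bonus makes action 1 dominant for the row agent and leaves only (1, 1), which is the kind of strategic distortion an invariance guarantee is meant to rule out.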
