MTG-Causal-RL: A Benchmark for Causal Reinforcement Learning in Complex Card Games
MTG-Causal-RL is a new benchmark for causal reinforcement learning built on Magic: The Gathering. The environment is partially observable, with a 3,077-dimensional observation space and a masked discrete action space of 478 actions. It ships with five competitive Standard archetypes, three reward schemes, and a hand-crafted Structural Causal Model (SCM) over strategic variables. Each episode exposes the causal variables, SCM-predicted intervention effects, and per-factor credit traces, enabling causal credit assignment, cross-archetype transfer, and policy auditability. The authors evaluate random, heuristic, and masked-PPO baselines and introduce Causal Graph-Factored Advantage PPO (CGFA-PPO) as a reference causal algorithm. The benchmark fills a gap: few existing environments combine sequential decision-making, hidden information, large masked action spaces, and explicit causal structure.
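To make the interaction model concrete, here is a minimal sketch of stepping through the environment with a Gymnasium-style API. The environment id and the `info` keys (`action_mask`, `causal_vars`, `intervention_fx`, `credit_trace`) are illustrative assumptions, not names confirmed by the benchmark.

```python
# A minimal interaction sketch, assuming a Gymnasium-style API. The env id
# "MTGCausalRL-v0" and the info keys below are illustrative guesses, not
# the benchmark's actual names.
import numpy as np
import gymnasium as gym

env = gym.make("MTGCausalRL-v0")           # hypothetical registered id
obs, info = env.reset(seed=0)
assert obs.shape == (3077,)                # partial observation vector

done = False
while not done:
    mask = info["action_mask"]             # boolean vector of length 478
    legal = np.flatnonzero(mask)           # indices of currently legal actions
    action = int(np.random.choice(legal))  # uniform random over legal actions
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated

# Per-step causal annotations (key names assumed for illustration):
# info["causal_vars"]      -- current values of the SCM's strategic variables
# info["intervention_fx"]  -- SCM-predicted effects of candidate interventions
# info["credit_trace"]     -- per-factor credit assignment for the last step
```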
Key facts
- MTG-Causal-RL is a Gymnasium benchmark built on Magic: The Gathering.
- Observation space is 3,077-dimensional.
- Action space is a masked discrete space with 478 actions.
- Includes five competitive Standard archetypes.
- Three reward schemes are provided.
- Hand-specified Structural Causal Model (SCM) over strategic variables.
- Exposes causal variables, SCM-predicted intervention effects, and per-factor credit traces.
- Proposes Causal Graph-Factored Advantage PPO (CGFA-PPO) as a reference algorithm.
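The paper's exact CGFA-PPO update is not reproduced here; the sketch below illustrates one plausible reading of a graph-factored advantage, assuming the environment's per-factor credit traces split each step's TD residual across causal factors. All function and variable names are hypothetical.

```python
# A sketch of the "graph-factored advantage" idea behind CGFA-PPO, under the
# assumption that per-factor credit traces weight how much of each step's TD
# residual is attributed to each causal factor. The published algorithm's
# details may differ.
import torch

def factored_advantages(rewards, values, credit, gamma=0.99, lam=0.95):
    """GAE computed per causal factor.

    rewards: (T,) step rewards
    values:  (T+1,) value estimates (bootstrap value last)
    credit:  (T, K) per-factor credit traces, rows summing to 1
    returns: (T, K) factor-wise advantages; summing over K recovers plain GAE
    """
    T, K = credit.shape
    deltas = rewards + gamma * values[1:] - values[:-1]   # TD residuals
    adv = torch.zeros(T, K)
    gae = torch.zeros(K)
    for t in reversed(range(T)):
        # attribute this step's residual to factors via the credit trace
        gae = credit[t] * deltas[t] + gamma * lam * gae
        adv[t] = gae
    return adv

# PPO would then consume the summed (or reweighted) factor advantages:
# total_adv = factored_advantages(r, v, c).sum(dim=-1)
```

Because each row of the credit trace sums to 1, summing the factor columns recovers standard GAE, so under this reading the factorization adds per-factor attribution without changing the total advantage.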