ReplaySCM Benchmark Tests Causal Mechanism Induction from Interventions
ReplaySCM is a 1,300-item benchmark for assessing the induction of causal mechanisms from a limited set of interventional evidence. Each item presents binary worlds generated by a latent acyclic Boolean structural causal model (SCM) whose variables are all observed. Systems must output a mechanism map in a restricted Boolean DSL; the map is parsed, checked for legality and acyclicity, and replayed on both training and held-out intervention worlds. Scoring depends on replay behavior rather than on formula strings, so syntactically different mechanisms receive credit when they behave correctly. The benchmark varies how much structural information is disclosed across Ordered, Block-order, Hidden-order, and Hidden-roots settings. It also includes Alternative-SCM tasks, which supply a valid reference SCM and ask for a semantically distinct alternative that remains consistent with the training worlds.
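The check-then-replay pipeline can be sketched in a few lines of Python. This is an illustration only, not the benchmark's actual harness: the real DSL is not shown here, so the mechanism map is represented as a hypothetical dictionary of parent lists and Python callables, root variables are given constant mechanisms for simplicity, and the function names `topological_order`, `replay`, and `replay_accuracy` are assumptions.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical mechanism map: variable -> (parents, Boolean function of parents).
# Python callables stand in for the benchmark's restricted Boolean DSL.
mechanisms = {
    "A": ((), lambda: True),                 # root, constant for simplicity
    "B": (("A",), lambda a: not a),
    "C": (("A", "B"), lambda a, b: a or b),
}

def topological_order(mech):
    """Return a parents-first evaluation order, or raise ValueError if cyclic."""
    graph = {var: set(parents) for var, (parents, _) in mech.items()}
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        raise ValueError("mechanism map is cyclic") from exc

def replay(mech, intervention):
    """Compute the world under do(intervention); clamped variables skip their mechanism."""
    world = {}
    for var in topological_order(mech):
        if var in intervention:
            world[var] = intervention[var]
        else:
            parents, fn = mech[var]
            world[var] = fn(*(world[p] for p in parents))
    return world

def replay_accuracy(mech, cases):
    """Fraction of (intervention, expected_world) pairs reproduced exactly."""
    hits = sum(replay(mech, iv) == expected for iv, expected in cases)
    return hits / len(cases)
```

Because only the replayed worlds are compared, a submission that writes `C` as `not (not a and not b)` would score the same as one that writes `a or b`.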
Key facts
- ReplaySCM contains 1,300 items.
- Each item presents binary worlds generated by a latent acyclic Boolean SCM.
- Output must be a mechanism map in a restricted Boolean DSL.
- Each submission is parsed, checked for legality and acyclicity, and replayed on training and held-out intervention worlds.
- Scoring is based on replay behavior, not formula strings.
- Settings include Ordered, Block-order, Hidden-order, and Hidden-roots.
- Alternative-SCM tasks ask for a semantically distinct alternative SCM.
- The benchmark is introduced on arXiv (2605.08197).
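The Alternative-SCM idea can be illustrated with a toy example: two mechanism maps that agree on every training intervention yet differ somewhere else. This is a minimal sketch under stated assumptions, not the benchmark's procedure. The dictionary representation, the fixed evaluation order `ORDER`, the helper names, and the working definition of "semantically distinct" (the two maps replay differently under at least one intervention) are all assumptions made for illustration.

```python
from itertools import product

ORDER = ["A", "B", "C"]  # assumed parents-first evaluation order

def replay(mech, intervention):
    """Compute the world under do(intervention) for a map evaluated in ORDER."""
    world = {}
    for var in ORDER:
        if var in intervention:
            world[var] = intervention[var]
        else:
            parents, fn = mech[var]
            world[var] = fn(*(world[p] for p in parents))
    return world

# Hypothetical reference SCM and a candidate alternative.
reference = {
    "A": ((), lambda: True),
    "B": (("A",), lambda a: a),
    "C": (("A", "B"), lambda a, b: a and b),
}
alternative = {
    "A": ((), lambda: True),
    "B": (("A",), lambda a: a),
    "C": (("B",), lambda b: b),  # C ignores A
}

def all_interventions(variables):
    """Enumerate every intervention: each variable is left free or clamped to False/True."""
    for choice in product((None, False, True), repeat=len(variables)):
        yield {v: val for v, val in zip(variables, choice) if val is not None}

def agrees_on(m1, m2, interventions):
    """True if both maps replay to identical worlds under every listed intervention."""
    return all(replay(m1, iv) == replay(m2, iv) for iv in interventions)

def distinguishing_interventions(m1, m2):
    """Interventions under which the two maps produce different worlds."""
    return [iv for iv in all_interventions(ORDER) if replay(m1, iv) != replay(m2, iv)]

# Toy training set: the observational world plus two single-variable interventions.
training = [{}, {"A": False}, {"B": False}]
```

In this toy case the two maps agree on all three training interventions, and the only intervention that separates them is clamping `A` to false and `B` to true together, which the training set never shows.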
Entities
Institutions
- arXiv