ReplaySCM Benchmark Tests Causal Mechanism Induction from Interventions

ai-technology · 2026-05-12

ReplaySCM is a benchmark of 1,300 items that tests whether a system can induce causal mechanisms from a limited set of interventional evidence. Each item presents fully observed binary worlds generated by a latent acyclic Boolean structural causal model (SCM). A system must output a mechanism map in a constrained Boolean DSL; the submission is parsed, checked for legality and acyclicity, and then replayed on both the training and the held-out intervention worlds. Scoring depends on replay behavior rather than on formula strings, so syntactically different mechanisms earn credit when they behave correctly. The benchmark varies how much structural information is disclosed, with settings including Ordered, Block-order, Hidden-order, and Hidden-roots, and it features Alternative-SCM tasks, which supply a valid reference SCM and ask for a semantically distinct alternative that still fits the training worlds.
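The check-then-replay pipeline described above can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration, not the benchmark's actual format: the tuple-based expression encoding, the operator set in `OPS`, the convention that root variables carry constant mechanisms, and the fraction-of-worlds score returned by the hypothetical `score` function.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical DSL (an assumption): an expression is a bool constant,
# a variable name, or a tuple such as ("and", e1, e2) or ("not", e).
OPS = {"not": 1, "and": 2, "or": 2, "xor": 2}

def check_legal(expr, variables):
    """Legality: only known operators, correct arity, known variables."""
    if isinstance(expr, bool):
        return True
    if isinstance(expr, str):
        return expr in variables
    if not isinstance(expr, tuple) or not expr or expr[0] not in OPS:
        return False
    if len(expr) - 1 != OPS[expr[0]]:
        return False
    return all(check_legal(arg, variables) for arg in expr[1:])

def parents(expr):
    """Collect the variable names an expression reads."""
    if isinstance(expr, bool):
        return set()
    if isinstance(expr, str):
        return {expr}
    return set().union(*(parents(arg) for arg in expr[1:]))

def evaluate(expr, world):
    """Evaluate an expression against already-computed variable values."""
    if isinstance(expr, bool):
        return expr
    if isinstance(expr, str):
        return world[expr]
    op, *args = expr
    vals = [evaluate(arg, world) for arg in args]
    if op == "not":
        return not vals[0]
    if op == "and":
        return vals[0] and vals[1]
    if op == "or":
        return vals[0] or vals[1]
    return vals[0] != vals[1]  # xor

def replay(mechanisms, order, do):
    """Compute a full world; intervened variables take their forced value."""
    world = {}
    for var in order:
        world[var] = do[var] if var in do else evaluate(mechanisms[var], world)
    return world

def score(mechanisms, worlds):
    """Fraction of (intervention, observed world) pairs reproduced by replay."""
    variables = set(mechanisms)
    if not all(check_legal(expr, variables) for expr in mechanisms.values()):
        return 0.0
    graph = {var: parents(expr) for var, expr in mechanisms.items()}
    try:
        order = list(TopologicalSorter(graph).static_order())
    except CycleError:  # cyclic submissions are rejected before replay
        return 0.0
    hits = sum(replay(mechanisms, order, do) == seen for do, seen in worlds)
    return hits / len(worlds)

worlds = [
    ({}, {"A": True, "B": False, "C": True}),
    ({"B": True}, {"A": True, "B": True, "C": False}),
]
direct = {"A": True, "B": False, "C": ("and", "A", ("not", "B"))}
rephrased = {"A": True, "B": False, "C": ("not", ("or", ("not", "A"), "B"))}
print(score(direct, worlds), score(rephrased, worlds))
```

The two submissions use different formulas for `C` but reproduce the same worlds, so this sketch scores them identically, which is the point of scoring by replay rather than by string comparison.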

Key facts

ReplaySCM contains 1,300 items.
Each item uses binary worlds from a latent acyclic Boolean SCM.
Output must be a mechanism map in a restricted Boolean DSL.
Submission is parsed, checked for legality and acyclicity, and replayed.
Scoring is based on replay behavior, not formula strings.
Settings include Ordered, Block-order, Hidden-order, and Hidden-roots.
Alternative-SCM tasks ask for a semantically distinct alternative SCM.
The benchmark is introduced on arXiv (2605.08197).
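
One way to picture an Alternative-SCM task is sketched below in Python. It rests on an assumed reading of "semantically distinct": the candidate agrees with the reference on every training intervention but disagrees on at least one other intervention. The benchmark's exact criterion may differ. The function names, the use of Python callables as mechanisms, and the shared variable order (as if taken from an Ordered-style setting) are all hypothetical.

```python
from itertools import product

def replay(order, mechanisms, do):
    """Compute a full world; `order` is an assumed shared topological order."""
    world = {}
    for var in order:
        world[var] = do[var] if var in do else mechanisms[var](world)
    return world

def all_interventions(variables):
    """Yield every do-assignment: each variable is untouched or forced to False/True."""
    for choice in product((None, False, True), repeat=len(variables)):
        yield {v: c for v, c in zip(variables, choice) if c is not None}

def is_distinct_alternative(order, reference, candidate, training):
    """True if the candidate fits the training worlds yet behaves differently somewhere."""
    fits = all(
        replay(order, candidate, do) == replay(order, reference, do)
        for do in training
    )
    differs = any(
        replay(order, candidate, do) != replay(order, reference, do)
        for do in all_interventions(order)
    )
    return fits and differs

order = ["A", "B", "C"]
reference = {"A": lambda w: True, "B": lambda w: True,
             "C": lambda w: w["A"] and w["B"]}
# Fits both training interventions but diverges under do(A=False).
alternative = dict(reference, C=lambda w: w["A"] or w["B"])
# Same behavior as the reference, only written differently.
rephrased = dict(reference, C=lambda w: not (not w["A"] or not w["B"]))
training = [{}, {"A": False, "B": False}]

print(is_distinct_alternative(order, reference, alternative, training))
print(is_distinct_alternative(order, reference, rephrased, training))
```

Enumerating all interventions grows as 3^n in the number of variables, so this exhaustive comparison is only practical for small illustrative models.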
