ARTFEED — Contemporary Art Intelligence

ReplaySCM Benchmark Tests Causal Mechanism Induction from Interventions

ai-technology · 2026-05-12

ReplaySCM is a 1,300-item benchmark that assesses whether a system can induce causal mechanisms from a limited set of interventional evidence. Each item presents fully observed binary worlds generated by a latent acyclic Boolean structural causal model (SCM). A system must output a mechanism map in a restricted Boolean DSL; the submission is parsed, checked for legality and acyclicity, and replayed on both training and held-out intervention worlds. Scoring is based on replay behavior rather than formula strings, so syntactically different mechanisms are accepted as long as they behave correctly. The benchmark varies how much structural information is disclosed across Ordered, Block-order, Hidden-order, and Hidden-roots settings. It also includes Alternative-SCM tasks, which supply a valid reference SCM and ask for a semantically distinct alternative that still fits the training worlds.
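The parse, check, and replay steps can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the tuple-based mini-DSL, the world format (a `do` intervention plus fully observed `values`), the function names, and the exact-match accuracy are hypothetical and are not taken from the benchmark itself.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical mini-DSL (not the benchmark's syntax): nested tuples such as
# ("var", "A"), ("const", 1), ("not", e), ("and", e1, e2), ("or", e1, e2).
ARITY = {"var": 1, "const": 1, "not": 1, "and": 2, "or": 2}

def variables(expr):
    """Return the variables an expression reads; reject illegal expressions."""
    op, args = expr[0], expr[1:]
    if ARITY.get(op) != len(args):
        raise ValueError(f"illegal expression: {expr!r}")
    if op == "var":
        return {args[0]}
    if op == "const":
        return set()
    return set().union(*(variables(a) for a in args))

def evaluate(expr, env):
    """Evaluate a legal expression to 0 or 1 given variable values in env."""
    op, args = expr[0], expr[1:]
    if op == "var":
        return env[args[0]]
    if op == "const":
        return args[0]
    vals = [evaluate(a, env) for a in args]
    if op == "not":
        return 1 - vals[0]
    return vals[0] & vals[1] if op == "and" else vals[0] | vals[1]

def evaluation_order(mechanisms):
    """Order mechanism targets parents-first; raises CycleError if cyclic."""
    graph = {target: variables(expr) for target, expr in mechanisms.items()}
    return [v for v in TopologicalSorter(graph).static_order() if v in mechanisms]

def replay(mechanisms, roots, do):
    """Recompute every variable under intervention `do`, given root values."""
    env = {**roots, **do}
    for target in evaluation_order(mechanisms):
        if target not in do:  # an intervened variable ignores its mechanism
            env[target] = evaluate(mechanisms[target], env)
    return env

def replay_accuracy(mechanisms, worlds):
    """Fraction of worlds whose observed values are reproduced exactly."""
    hits = 0
    for world in worlds:
        observed, do = world["values"], world["do"]
        roots = {v: x for v, x in observed.items() if v not in mechanisms}
        hits += replay(mechanisms, roots, do) == observed
    return hits / len(worlds)

# Toy worlds generated by C = A and B, D = not C.
worlds = [
    {"do": {}, "values": {"A": 1, "B": 1, "C": 1, "D": 0}},
    {"do": {"C": 0}, "values": {"A": 1, "B": 1, "C": 0, "D": 1}},
    {"do": {"A": 0}, "values": {"A": 0, "B": 1, "C": 0, "D": 1}},
]
A, B, C = ("var", "A"), ("var", "B"), ("var", "C")
direct = {"C": ("and", A, B), "D": ("not", C)}
de_morgan = {"C": ("not", ("or", ("not", A), ("not", B))), "D": ("not", C)}
wrong = {"C": ("or", A, B), "D": ("not", C)}
```

Here `direct` and `de_morgan` use different formula strings yet replay identically on all three worlds, while `wrong` fails the world with `do(A=0)`. That is the sense in which behavior-based scoring can accept different syntactic mechanisms.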

Key facts

  • ReplaySCM contains 1,300 items.
  • Each item uses binary worlds from a latent acyclic Boolean SCM.
  • Output must be a mechanism map in a restricted Boolean DSL.
  • Submission is parsed, checked for legality and acyclicity, and replayed.
  • Scoring is based on replay behavior, not formula strings.
  • Settings include Ordered, Block-order, Hidden-order, and Hidden-roots.
  • Alternative-SCM tasks ask for a semantically distinct alternative SCM.
  • The benchmark is introduced in an arXiv preprint (2605.08197).
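One way to read the Alternative-SCM requirement is that the alternative must reproduce every training world while disagreeing with the reference on at least one other world. The Python sketch below illustrates that reading only. It assumes mechanisms are plain callables listed in a valid causal order, that both maps define the same targets, and that searching the empty intervention plus single-variable interventions is enough; the helper names are hypothetical and the benchmark's own distinctness criterion is not specified here.

```python
from itertools import product

def replay(mechanisms, roots, do):
    """Run mechanisms in listed order; intervened variables keep the do-value."""
    env = {**roots, **do}
    for target, fn in mechanisms.items():
        if target not in do:
            env[target] = fn(env)
    return env

def distinguishing_world(ref, alt, root_names):
    """Return a (roots, do) pair where the two maps disagree, else None.

    Tries every root assignment under the empty intervention and under
    each single-variable intervention on a mechanism target.
    """
    interventions = [{}] + [{t: x} for t in ref for x in (0, 1)]
    for bits in product((0, 1), repeat=len(root_names)):
        roots = dict(zip(root_names, bits))
        for do in interventions:
            if replay(ref, roots, do) != replay(alt, roots, do):
                return roots, do
    return None

# Reference SCM: C = A and B. The training worlds only ever show A = 1.
reference = {"C": lambda e: e["A"] & e["B"]}
training = [
    ({"A": 1, "B": 1}, {}, {"A": 1, "B": 1, "C": 1}),
    ({"A": 1, "B": 0}, {}, {"A": 1, "B": 0, "C": 0}),
]

# Candidate alternative: C = B. It fits the training worlds but is not
# equivalent to the reference.
alternative = {"C": lambda e: e["B"]}
fits_training = all(
    replay(alternative, roots, do) == observed for roots, do, observed in training
)
witness = distinguishing_world(reference, alternative, ["A", "B"])

# A De Morgan rewrite of the reference is only syntactically different.
rewrite = {"C": lambda e: 1 - ((1 - e["A"]) | (1 - e["B"]))}
no_witness = distinguishing_world(reference, rewrite, ["A", "B"])
```

In this toy case `fits_training` is true and `witness` is the world with A = 0 and B = 1 under no intervention, so `alternative` would count as a semantically distinct answer. The rewrite yields no witness, so under this reading it would not qualify.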

Entities

Institutions

  • arXiv
