ARTFEED — Contemporary Art Intelligence

Mechanistic Interpretability Papers Lack Causal Identification Assumptions

other · 2026-05-11

A recent study posted on arXiv (2605.08012) finds that mechanistic interpretability research in AI increasingly uses causal vocabulary, including circuits, mediators, causal abstraction, and monosemanticity, without stating the identification assumptions needed to support causal claims. The authors conducted a targeted review of 10 papers spanning four methodological strands and found no dedicated identification-assumptions sections. Instead, validation metrics such as faithfulness, completeness, monosemanticity, alignment, and ablation effects are offered as causal evidence with the underlying assumptions left implicit. A secondary audit by two human coders on n=30 reproduced the primary findings: missing identification sections and validation metrics standing in for explicit assumptions. The authors propose a disclosure norm: state whether a claim is causal, name the identification strategy, list the assumptions, single out at least one, and explain how the conclusions would change if those assumptions fail.
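
For context, here is a minimal sketch (not from the paper) of what an ablation-effect measurement typically looks like in mechanistic interpretability work. The toy model, weights, and metric are illustrative assumptions; the point is that the resulting number is a validation metric, and treating it as causal evidence requires further assumptions of the kind the study says go unstated.

```python
# Minimal sketch (illustrative, not from the paper): an "ablation effect" metric.
# The toy model and weights below are assumptions made for demonstration only.
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer network: input (4) -> hidden (8) -> output (3)
W1 = rng.normal(size=(4, 8))
W2 = rng.normal(size=(8, 3))

def forward(x, ablate_unit=None):
    """Run the toy model, optionally zeroing one hidden unit (an ablation)."""
    h = np.maximum(x @ W1, 0.0)      # ReLU hidden activations
    if ablate_unit is not None:
        h = h.copy()
        h[:, ablate_unit] = 0.0      # intervention: set the chosen unit to zero
    return h @ W2

x = rng.normal(size=(16, 4))         # a small batch of inputs
baseline = forward(x)

# "Ablation effect" per hidden unit: mean change in output when the unit is zeroed.
# A large effect is often read as causal importance, but without stated identification
# assumptions (e.g., no compensating units, zero is a meaningful counterfactual value)
# it remains a validation metric rather than an identified causal claim.
effects = [np.abs(forward(x, ablate_unit=u) - baseline).mean() for u in range(8)]
for u, e in enumerate(effects):
    print(f"hidden unit {u}: mean |output change| = {e:.3f}")
```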

Key facts

  • Paper on arXiv with ID 2605.08012
  • Audit of 10 papers across four methodological strands
  • No dedicated identification-assumptions section found
  • Validation metrics presented as causal support without stated assumptions
  • Two-human-coder audit on n=30 reproduced findings
  • Proposes disclosure norm for causal claims
  • Causal vocabulary includes circuits, mediators, causal abstraction, monosemanticity
  • Metrics include faithfulness, completeness, monosemanticity, alignment, ablation effects

Entities

Institutions

  • arXiv
