Inductive Logic for Mechanistic Interpretability of Neural Networks

ai-technology · 2026-05-22

A new study published on arXiv proposes a structured framework intended to strengthen mechanistic interpretability of neural networks as a science. The research treats circuit interpretation as inductive theory construction and characterizes each circuit at two levels: a Causal Functional Signature (CFS), which ties the behavior of components to causal evidence, and an architectural signature, learned through inductive logic programming (ILP) from scale-invariant structural predicates. Together, the two signatures form a coherence layer that makes mechanistic claims explicit, comparable through θ-subsumption, and portable across model scales. The aim is to turn isolated circuit discoveries into a shared formal representation so that mechanistic knowledge can be compared and accumulated.
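To make the comparison step concrete, here is a minimal sketch of a θ-subsumption check in Python. A clause C θ-subsumes a clause D when some substitution θ of C's variables gives Cθ ⊆ D, meaning C is at least as general as D. The predicate names (`head`, `feeds`, `copies_token`) and the tuple encoding of literals are illustrative assumptions, not taken from the paper, and the sketch ignores literal signs and function symbols.

```python
from typing import Dict, FrozenSet, Optional, Tuple

Literal = Tuple[str, ...]      # (predicate, arg1, arg2, ...)
Clause = FrozenSet[Literal]
Substitution = Dict[str, str]


def is_variable(term: str) -> bool:
    # Prolog convention (assumed here): variables start with an uppercase letter.
    return term[:1].isupper()


def match_literal(general: Literal, specific: Literal,
                  theta: Substitution) -> Optional[Substitution]:
    """Extend theta so that general under theta equals specific, else None."""
    if general[0] != specific[0] or len(general) != len(specific):
        return None
    extended = dict(theta)
    for g, s in zip(general[1:], specific[1:]):
        if is_variable(g):
            # Bind the variable, or check consistency with an earlier binding.
            if extended.setdefault(g, s) != s:
                return None
        elif g != s:
            return None
    return extended


def theta_subsumes(general: Clause, specific: Clause) -> Optional[Substitution]:
    """Return a theta with general·theta ⊆ specific, or None if none exists."""
    literals = sorted(general)

    def search(i: int, theta: Substitution) -> Optional[Substitution]:
        if i == len(literals):
            return theta
        # Backtracking search: try every literal of the specific clause.
        for candidate in specific:
            extended = match_literal(literals[i], candidate, theta)
            if extended is not None:
                result = search(i + 1, extended)
                if result is not None:
                    return result
        return None

    return search(0, {})


# Hypothetical signatures: a general two-head pattern and one concrete circuit.
pattern: Clause = frozenset({("head", "H1"), ("head", "H2"), ("feeds", "H1", "H2")})
circuit: Clause = frozenset({("head", "a"), ("head", "b"),
                             ("feeds", "a", "b"), ("copies_token", "b")})

forward = theta_subsumes(pattern, circuit)    # pattern is more general
backward = theta_subsumes(circuit, pattern)   # circuit is not more general
```

Here `forward` is the substitution mapping `H1` to `a` and `H2` to `b`, while `backward` is `None`, so the general pattern covers the concrete circuit but not the reverse. The check is a plain backtracking search, which is adequate for small clauses; θ-subsumption is NP-complete in general.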

Key facts

Paper published on arXiv with ID 2605.21303
arXiv announce type: cross (cross-listed)
Proposes Causal Functional Signature (CFS) for circuit characterization
Uses inductive logic programming (ILP) for architectural signature
Architectural signature is learned from scale-invariant structural predicates
Claims are made comparable via θ-subsumption
Aims to enable portability across model scales
Treats circuit interpretation as inductive theory construction
