Inductive Logic for Mechanistic Interpretability of Neural Networks
A new study on arXiv presents a structured framework intended to make mechanistic interpretability of neural networks more rigorous as a science. The research treats circuit interpretation as inductive theory construction and characterizes each circuit at two levels: a Causal Functional Signature (CFS), which ties the claimed behavior of components to causal evidence, and an architectural signature learned with inductive logic programming (ILP) from scale-invariant structural predicates. Together, the two signatures form a coherence layer that makes mechanistic claims explicit, comparable via θ-subsumption (a standard generality ordering on logical clauses), and portable across model scales. The aim is to turn isolated circuit discoveries into a shared formal representation so that mechanistic knowledge can be compared and accumulated.
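To make the comparison step concrete, here is a minimal sketch of a θ-subsumption check in Python. It is not the paper's implementation: the predicate names (`attends_to`, `earlier_layer`, `writes_to`) and the component names are hypothetical, clauses are simplified to lists of function-free body literals, and variables follow the Prolog convention of starting with an uppercase letter. A clause C θ-subsumes a clause D if some substitution θ of C's variables makes every literal of C appear in D.

```python
from typing import Dict, List, Optional, Tuple

# A literal is (predicate, arg1, arg2, ...). All names here are hypothetical.
Literal = Tuple[str, ...]
Substitution = Dict[str, str]


def is_var(term: str) -> bool:
    """Prolog-style convention (an assumption): variables start uppercase."""
    return term[:1].isupper()


def match(lit_c: Literal, lit_d: Literal,
          theta: Substitution) -> Optional[Substitution]:
    """Try to extend theta so that lit_c under theta equals lit_d."""
    if lit_c[0] != lit_d[0] or len(lit_c) != len(lit_d):
        return None
    new = dict(theta)
    for s, t in zip(lit_c[1:], lit_d[1:]):
        if is_var(s):
            # Bind the variable, or check it against an existing binding.
            if new.setdefault(s, t) != t:
                return None
        elif s != t:
            return None
    return new


def theta_subsumes(c: List[Literal], d: List[Literal],
                   theta: Optional[Substitution] = None) -> bool:
    """True if a substitution maps every literal of c onto a literal of d."""
    theta = {} if theta is None else theta
    if not c:
        return True
    first, rest = c[0], c[1:]
    for lit in d:
        extended = match(first, lit, theta)
        if extended is not None and theta_subsumes(rest, d, extended):
            return True
    return False


# A general, scale-independent signature (variables H1, H2).
general = [("attends_to", "H1", "H2"), ("earlier_layer", "H1", "H2")]

# A more specific claim about one model's circuit (ground terms).
specific = [
    ("attends_to", "head_a", "head_b"),
    ("earlier_layer", "head_a", "head_b"),
    ("writes_to", "head_b", "residual"),
]

print(theta_subsumes(general, specific))  # True
print(theta_subsumes(specific, general))  # False
```

Under these assumptions, the general signature subsumes the specific one with θ = {H1 ↦ head_a, H2 ↦ head_b}, while the reverse fails. This is the sense in which two mechanistic claims can be ordered by generality. The backtracking search is exponential in the worst case, which is acceptable for a sketch but not for large clauses.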
Key facts
- Paper published on arXiv with ID 2605.21303
- arXiv announce type: cross (cross-listed)
- Proposes Causal Functional Signature (CFS) for circuit characterization
- Uses inductive logic programming (ILP) for architectural signature
- Architectural signature is learned from scale-invariant structural predicates
- Claims are made comparable via θ-subsumption
- Aims to enable portability across model scales
- Treats circuit interpretation as inductive theory construction
Entities
Institutions
- arXiv