Spectral Probe-Circuits: Identifying Attention-Head Circuits in Transformers
Spectral Probe-Circuits is a technique for identifying attention-head circuits in pretrained transformers without labels or attribution gradients. It proceeds in three steps. First, a per-head spectral signal, the time-integrated participation ratio, ranks heads to surface those engaged in sustained, content-dependent computation. Second, a task-pattern screen narrows the ranked heads to a task-specific candidate circuit. Third, group ablation against a matched-random control tests whether the candidate circuit is causally necessary. Across an 8x parameter range (51M to 1B-active / 7B-total), two architecture families (dense and mixture-of-experts), and four pretraining pipelines, the technique recovers a 2-6 head induction circuit that is causally necessary in every model examined: ablating it reduces synthetic-induction top-1 accuracy by 94-100%. The spectral signal is also predictive without supervision: across six independent seeds of a 51M-parameter probe model, it consistently identifies the seed-specific circuit.
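To make the spectral signal concrete, here is a minimal sketch of a participation ratio and a simple time integral of it. It is an illustration under stated assumptions, not the authors' implementation: the summary does not say which matrix the spectrum is taken from or what "time" indexes, so the sketch assumes the covariance of one head's output activations and an ordered list of activation snapshots. The function names and the trapezoidal rule are likewise hypothetical choices. The participation ratio uses the standard identity PR = (tr C)^2 / tr(C^2), which equals (sum of eigenvalues)^2 / (sum of squared eigenvalues) and so needs no eigendecomposition.

```python
def participation_ratio(rows):
    """PR = (tr C)^2 / tr(C^2) for the covariance C of `rows` (samples x features).

    Assumption: `rows` holds one attention head's output vectors; the source
    does not specify which matrix the spectrum comes from.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Covariance matrix (d x d). The normaliser cancels in the ratio below.
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)]
           for i in range(d)]
    trace = sum(cov[i][i] for i in range(d))
    # For symmetric C, tr(C^2) is the sum of squared entries.
    trace_sq = sum(cov[i][j] ** 2 for i in range(d) for j in range(d))
    return 0.0 if trace_sq == 0 else trace ** 2 / trace_sq


def time_integrated_pr(snapshots, dt=1.0):
    """Trapezoidal integral of PR over ordered snapshots (hypothetical 'time' axis)."""
    prs = [participation_ratio(s) for s in snapshots]
    return sum(0.5 * (a + b) * dt for a, b in zip(prs, prs[1:]))


# Toy activations: variance spread over two directions vs. a single direction.
iso = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
rank1 = [[1.0, 1.0], [-1.0, -1.0]]

print(participation_ratio(iso))    # 2.0: two equally used directions
print(participation_ratio(rank1))  # 1.0: one direction carries all variance
print(time_integrated_pr([iso, iso, rank1]))  # 3.5
```

Under these assumptions, a head whose activations stay spread over many directions across snapshots accumulates a larger integrated value than one that collapses to a single direction, which is the kind of quantity a per-head ranking could sort on.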
Key facts
- Method identifies attention-head circuits in pretrained transformers
- Uses per-head spectral signal without labels or attribution gradients
- Three-step recipe: spectral ranking, task-pattern screen, group ablation
- Validated across 51M to 1B-active / 7B-total parameters
- Tested on dense and mixture-of-experts architectures
- Four pretraining pipelines used for validation
- 2-6 head induction circuit causally necessary in all models
- 94-100% drop in synthetic-induction top-1 after ablation
- Spectral signal predictive without supervision on six seeds
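The group-ablation step of the recipe above can be sketched as a comparison between ablating the candidate circuit and ablating random head sets of the same size. This is a hedged illustration only: the summary does not say what the random control is matched on, so the sketch assumes one random non-circuit head per circuit head, drawn from the same layer. The `evaluate` callable, the `(layer, head)` tuples, and the toy accuracy values are hypothetical stand-ins, not results from the work.

```python
import random


def relative_drop(baseline, ablated):
    """Fractional loss in accuracy relative to the unablated baseline."""
    return 0.0 if baseline == 0 else (baseline - ablated) / baseline


def ablation_vs_matched_random(evaluate, circuit, all_heads, n_controls=20, seed=0):
    """Compare the circuit's ablation drop with same-size, same-layer random sets.

    `evaluate(ablated_heads)` is a hypothetical callable returning task accuracy
    with the given heads ablated. Each layer is assumed to have enough
    non-circuit heads to draw a control from.
    """
    rng = random.Random(seed)
    circuit = frozenset(circuit)
    baseline = evaluate(frozenset())
    circuit_drop = relative_drop(baseline, evaluate(circuit))
    control_drops = []
    for _ in range(n_controls):
        control = set()
        for layer, _head in sorted(circuit):
            # Candidate replacements: same layer, not in the circuit, not reused.
            pool = [h for h in all_heads
                    if h[0] == layer and h not in circuit and h not in control]
            control.add(rng.choice(pool))
        control_drops.append(relative_drop(baseline, evaluate(frozenset(control))))
    return circuit_drop, control_drops


# Toy stand-in for a model: accuracy collapses only when both circuit heads are ablated.
TOY_CIRCUIT = {(1, 0), (2, 3)}


def toy_evaluate(ablated):
    return 0.02 if TOY_CIRCUIT <= set(ablated) else 0.90


heads = [(layer, head) for layer in range(4) for head in range(4)]
circuit_drop, control_drops = ablation_vs_matched_random(toy_evaluate, TOY_CIRCUIT, heads)
print(f"circuit drop: {circuit_drop:.2%}, max control drop: {max(control_drops):.2%}")
```

A large drop for the circuit combined with near-zero drops for the matched controls is the pattern that would support a causal-necessity claim; a comparable drop in the controls would suggest that ablating any heads of that size and placement hurts the task.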