Spectral Probe-Circuits: Identifying Attention-Head Circuits in Transformers

ai-technology · 2026-05-26

Spectral Probe-Circuits is a technique for identifying attention-head circuits in pretrained transformers without labels or attribution gradients. It proceeds in three steps. First, a per-head spectral signal, the time-integrated participation ratio, ranks heads by their engagement in sustained, content-dependent computation. Second, a task-pattern screen narrows that ranking to a candidate circuit for a specific task. Third, group ablation against a matched-random control establishes whether the candidate circuit is causally necessary. Across an 8x parameter range (51M to 1B-active / 7B-total), two architecture families (dense and mixture-of-experts), and four pretraining pipelines, the technique identifies a 2-6 head induction circuit that is causally necessary in every model examined: ablating it reduces synthetic-induction top-1 accuracy by 94-100%. The spectral signal is also predictive without supervision: across six independent seeds of a 51M-parameter probe model, it consistently identifies the seed-specific circuit.
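The ranking step can be illustrated with a minimal sketch. The participation ratio of a covariance spectrum, (sum of eigenvalues)^2 / (sum of squared eigenvalues), is a standard measure of effective dimensionality. Everything else here is an assumption made for illustration and is not taken from the source: the function names, the reading of "time-integrated" as a plain sum over steps, the choice of which activations feed the spectrum, and the convention that a higher score ranks first.

```python
# Illustrative sketch only. The source does not specify how the signal is
# computed; the names and the "sum over steps" integration are assumptions.

def participation_ratio(vectors):
    """Return (sum of eigenvalues)^2 / (sum of squared eigenvalues) for the
    covariance of `vectors`, a list of equal-length lists of floats."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(d)]
    centered = [[v[j] - mean[j] for j in range(d)] for v in vectors]
    cov = [[sum(r[a] * r[b] for r in centered) / n for b in range(d)]
           for a in range(d)]
    # For a symmetric matrix, the trace is the sum of eigenvalues and the
    # squared Frobenius norm is the sum of squared eigenvalues, so no
    # eigendecomposition is needed.
    trace = sum(cov[a][a] for a in range(d))
    frob_sq = sum(x * x for row in cov for x in row)
    return trace * trace / frob_sq if frob_sq > 0 else 0.0


def time_integrated_pr(steps):
    """Sum the participation ratio over a sequence of per-step activation sets
    (assumed reading of "time-integrated")."""
    return sum(participation_ratio(vectors) for vectors in steps)


def rank_heads(head_steps):
    """Sort head ids by time-integrated participation ratio, highest first
    (the ranking direction is an assumption)."""
    scores = {head: time_integrated_pr(steps)
              for head, steps in head_steps.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    spread = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]  # two directions
    collapsed = [[1.0, 1.0], [-1.0, -1.0]]                       # one direction
    print(rank_heads({"head_a": [collapsed, collapsed],
                      "head_b": [spread, spread]}))
```

In this toy input, the head whose activations spread over two directions scores higher than the head whose activations collapse onto one, so it is ranked first. No labels or gradients are involved, which is the property the summary emphasizes.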

Key facts

Method identifies attention-head circuits in pretrained transformers
Uses per-head spectral signal without labels or attribution gradients
Three-step recipe: spectral ranking, task-pattern screen, group ablation
Validated across 51M to 1B-active / 7B-total parameters
Tested on dense and mixture-of-experts architectures
Four pretraining pipelines used for validation
2-6 head induction circuit causally necessary in all models
94-100% drop in synthetic-induction top-1 after ablation
Spectral signal predictive without supervision on six seeds
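
The group-ablation step and the reported drop in top-1 can be sketched as follows. This is a hedged illustration, not the authors' procedure. The `evaluate` callback, the (layer, head) tuple encoding, and the interpretation of "matched" as "the same number of non-candidate heads drawn from the same layers" are all assumptions. The toy model and its accuracy values are invented solely to make the example runnable.

```python
import random

# Illustrative sketch only. `evaluate` is a hypothetical callback that returns
# top-1 accuracy with a given set of (layer, head) pairs ablated.

def relative_drop(baseline, ablated):
    """Fractional decrease in accuracy relative to the unablated baseline."""
    return (baseline - ablated) / baseline if baseline > 0 else 0.0


def matched_random_heads(candidate, heads_per_layer, rng):
    """For each candidate head, sample a distinct non-candidate head from the
    same layer (assumed meaning of "matched"). Raises IndexError if a layer
    has no unused non-candidate head left."""
    chosen = set()
    for layer, _ in sorted(candidate):
        pool = [(layer, h) for h in range(heads_per_layer)
                if (layer, h) not in candidate and (layer, h) not in chosen]
        chosen.add(rng.choice(pool))
    return chosen


def ablation_report(evaluate, candidate, heads_per_layer, n_controls=20, seed=0):
    """Return (drop from ablating the candidate, largest drop among controls)."""
    rng = random.Random(seed)
    baseline = evaluate(frozenset())
    circuit_drop = relative_drop(baseline, evaluate(frozenset(candidate)))
    control_drops = [
        relative_drop(
            baseline,
            evaluate(frozenset(matched_random_heads(candidate, heads_per_layer, rng))),
        )
        for _ in range(n_controls)
    ]
    return circuit_drop, max(control_drops)


# Toy stand-in for a real model: accuracy collapses if any circuit head is
# ablated. The numbers are made up for the example.
TOY_CIRCUIT = {(1, 3), (2, 5)}

def toy_evaluate(ablated):
    return 0.02 if ablated & TOY_CIRCUIT else 0.90


if __name__ == "__main__":
    drop, worst_control = ablation_report(toy_evaluate, TOY_CIRCUIT, heads_per_layer=8)
    print(f"circuit drop: {drop:.1%}, largest control drop: {worst_control:.1%}")
```

The comparison is the point of the design. A large drop from ablating the candidate heads supports causal necessity only if ablating equally sized random groups from the same layers leaves accuracy largely intact.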

Sources

arXiv cs.AI — 2026-05-26