ARTFEED — Contemporary Art Intelligence

Neural Tangent Kernel Reveals Interpretable Features in Neural Networks

ai-technology · 2026-05-07

A new study shows that eigenanalysis of the empirical neural tangent kernel (eNTK) can surface feature directions in trained neural networks. The research examines three settings: a one-layer MLP trained on modular addition, a one-layer Transformer trained on the same task, and the pretrained language model Gemma-3-270M. In the modular-arithmetic settings, the top eNTK eigenspaces align with the Fourier features the models use, including the seed-dependent frequencies the Transformer relies on. This alignment evolves over training, with its rate of change peaking near the onset of grokking. For Gemma-3-270M, the top eNTK eigendirections were computed on TinyStories context windows and compared against automatically generated features, supporting the claim that eNTK analysis uncovers interpretable features in neural networks.

Key facts

  • Eigenanalysis of the empirical neural tangent kernel (eNTK) can surface feature directions in trained neural networks.
  • The study includes a 1-layer MLP trained on modular addition, a 1-layer Transformer trained on modular addition, and the pretrained language model Gemma-3-270M.
  • Top eNTK eigenspaces align with ground-truth or interpretable features across all three settings.
  • In modular arithmetic, top eNTK eigenspaces align with Fourier features used by the MLP and Transformer.
  • The Transformer uses Fourier features at seed-dependent frequencies to implement known ground-truth algorithms.
  • Alignment of relevant subspaces evolves over training, with its first derivative peaking near the onset of grokking.
  • For Gemma-3-270M, top eNTK eigendirections were computed on a dataset of TinyStories context windows.
  • The alignment of eNTK eigendirections with automatically generated features was checked for Gemma-3-270M.
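The core procedure behind these findings can be illustrated with a minimal sketch: form the empirical NTK as the Gram matrix of per-sample parameter gradients, then eigendecompose it to obtain candidate feature directions. The toy one-hidden-layer MLP below is a hypothetical stand-in, not the models or data from the study; all names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-hidden-layer MLP with scalar output: f(x) = w2 . tanh(W1 @ x)
d_in, d_hid, n = 4, 8, 6
W1 = rng.normal(size=(d_hid, d_in)) / np.sqrt(d_in)
w2 = rng.normal(size=d_hid) / np.sqrt(d_hid)
X = rng.normal(size=(n, d_in))  # n sample inputs

def grad_params(x):
    """Gradient of f(x) w.r.t. all parameters, flattened into one vector."""
    h = np.tanh(W1 @ x)                 # hidden activations
    dW1 = np.outer(w2 * (1 - h**2), x)  # d f / d W1 (tanh' = 1 - h^2)
    dw2 = h                             # d f / d w2
    return np.concatenate([dW1.ravel(), dw2])

# Empirical NTK: K[i, j] = <grad f(x_i), grad f(x_j)> over parameters
J = np.stack([grad_params(x) for x in X])  # (n, n_params) Jacobian
K = J @ J.T                                # (n, n) Gram matrix, PSD

# Eigendecomposition; top eigenvectors are candidate feature directions
eigvals, eigvecs = np.linalg.eigh(K)
order = np.argsort(eigvals)[::-1]          # sort descending
top_vec = eigvecs[:, order[0]]             # leading direction in sample space
print(eigvals[order])
```

In the study's settings, directions like `top_vec` would then be compared against known structure (e.g. Fourier features for modular addition) to measure alignment.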
