PLOT: New Framework for Neural Causal Abstraction via Optimal Transport

ai-technology · 2026-05-11

A new framework called PLOT (Progressive Localization via Optimal Transport) has been introduced for neural causal abstraction, an approach to mechanistic interpretability that aligns the variables of a high-level causal model with sites inside a neural network. PLOT localizes causal variables from the geometry of output effects, comparing how outputs change under interventions on abstract variables with how they change under interventions on neural sites. It fits an optimal transport coupling between abstract variables and candidate neural sites, producing a global soft correspondence. This addresses the computational burden of existing methods such as distributed alignment search (DAS), which must search over candidate sites. In simple settings, a single coupling over individual neurons suffices; in larger models, PLOT is applied progressively.
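To make the coupling step concrete, here is a minimal sketch. It assumes each abstract variable and each candidate neural site is summarized by a vector of output effects measured over the same probe inputs, and it uses entropic optimal transport (Sinkhorn iterations) with uniform marginals. The function names, the squared-Euclidean cost, the cost rescaling, the regularization `eps`, and the toy data are illustrative assumptions, not details taken from the PLOT paper.

```python
import math


def effect_cost(abstract_effects, neural_effects):
    # Hypothetical cost: squared Euclidean distance between output-effect vectors.
    # abstract_effects[i]: output change from intervening on abstract variable i.
    # neural_effects[j]: output change from intervening on candidate neural site j.
    # Both are assumed to be measured over the same probe inputs.
    cost = [[sum((a - b) ** 2 for a, b in zip(u, v)) for v in neural_effects]
            for u in abstract_effects]
    # Rescale to [0, 1] so exp(-cost / eps) does not underflow to zero.
    scale = max(max(row) for row in cost) or 1.0
    return [[c / scale for c in row] for row in cost]


def sinkhorn(cost, row_mass, col_mass, eps=0.05, iters=500):
    # Entropic optimal transport via Sinkhorn scaling (a swapped-in solver,
    # not necessarily the one PLOT uses).
    n, m = len(cost), len(cost[0])
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Rescale rows to match row_mass, then columns to match col_mass.
        u = [row_mass[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [col_mass[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Coupling P[i][j] = u_i * K_ij * v_j: soft mass sent from variable i to site j.
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


# Made-up toy data: 2 abstract variables, 4 candidate sites, 3 probe inputs.
abstract = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
neural = [[0.9, 0.1, 0.0], [1.0, 0.0, 0.1], [0.0, 0.9, 1.0], [0.1, 1.0, 0.8]]

P = sinkhorn(effect_cost(abstract, neural), [0.5, 0.5], [0.25] * 4)
for i, row in enumerate(P):
    print(f"variable {i}:", [round(p, 3) for p in row])
```

In this toy example, variable 0 spreads its mass over sites 0 and 1 and variable 1 over sites 2 and 3, which is what a soft (rather than one-to-one) correspondence looks like. Because the marginals are balanced, every site must receive some mass; this sketch does not model sites that should remain unassigned.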

Key facts

PLOT stands for Progressive Localization via Optimal Transport.
It is a transport-based framework for neural causal abstraction.
It localizes causal variables from the output-effect geometry of abstract and neural interventions.
It fits an optimal transport coupling between abstract variables and candidate neural sites.
The coupling yields a global soft correspondence that can be calibrated into intervention handles.
In simple settings, a single coupling over individual neurons suffices.
In larger models, PLOT is applied progressively.
The framework addresses the computational burden of existing methods such as distributed alignment search (DAS), which search over candidate sites.
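The facts above mention calibrating the soft correspondence into intervention handles, but the source does not say how. The following is a hypothetical sketch of one possibility: for each abstract variable, keep the smallest set of sites that carries at least a fraction `tau` of that variable's coupling mass, then use that set as the target of an interchange intervention that copies those sites' activations from a source run into a base run. `calibrate_handles`, `interchange`, `tau`, and the example coupling are illustrative assumptions, not part of PLOT.

```python
def calibrate_handles(coupling, tau=0.9):
    # Hypothetical calibration rule (not from the PLOT paper): for each abstract
    # variable, keep the fewest sites whose share of its coupling mass reaches tau.
    handles = []
    for row in coupling:
        total = sum(row)
        weights = [p / total for p in row]  # distribution over sites for this variable
        order = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
        handle, mass = {}, 0.0
        for j in order:
            handle[j] = weights[j]
            mass += weights[j]
            if mass >= tau:
                break
        handles.append(handle)
    return handles


def interchange(base_acts, source_acts, handle):
    # Interchange intervention: copy the handle's sites from a source run into
    # a copy of the base run's activations; all other sites keep base values.
    patched = list(base_acts)
    for j in handle:
        patched[j] = source_acts[j]
    return patched


# Made-up soft coupling for 2 abstract variables over 4 candidate sites.
coupling = [[0.24, 0.23, 0.02, 0.01],
            [0.01, 0.02, 0.22, 0.25]]
handles = calibrate_handles(coupling)
print(handles)
print(interchange([0.0, 0.0, 0.0, 0.0], [1.0, 2.0, 3.0, 4.0], handles[0]))
```

A lower `tau` yields smaller, more selective handles at the cost of ignoring part of each variable's coupling mass.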

Sources

arXiv cs.AI — 2026-05-11