HDMI: Probe-Free Causal Probing for LLMs

ai-technology · 2026-05-11

A new method, Hidden-state Driven Margin Intervention (HDMI), performs causal probing of large language models without training auxiliary probe classifiers. Instead of steering through a probe, HDMI uses gradients of the model's own output to modify hidden states directly: a margin objective raises the probability of a target continuation while lowering the probability of the source continuation. A lookahead variant, LA-HDMI, extends the idea to text editing by backpropagating through softmax embeddings. Because the intervention is driven by the model's native output rather than a separately trained classifier, the approach avoids the misalignment issues common in probe-based methods. The paper is available on arXiv under ID 2605.07631.
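The margin objective described above can be illustrated with a toy sketch. This is not the paper's implementation: the hinge form of the loss, the margin value, the step size, the linear readout matrix `W`, and the function names `margin_loss` and `steer` are all assumptions made for illustration. In a real model the gradient would come from automatic differentiation through the remaining layers; with a linear readout it has a closed form, because the log-probability gap between two tokens equals their logit gap.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def readout(W, h):
    """Toy stand-in for the model's output head: logits = W @ h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]

def margin_loss(W, h, target, source, margin):
    """Hinge loss (assumed form): zero once log p(target) - log p(source) >= margin."""
    logp = log_softmax(readout(W, h))
    return max(0.0, margin - (logp[target] - logp[source]))

def steer(W, h, target, source, margin=2.0, lr=0.1, steps=200):
    """Gradient descent on the hidden state h until the margin is satisfied."""
    h = list(h)
    for _ in range(steps):
        if margin_loss(W, h, target, source, margin) == 0.0:
            break
        # While the hinge is active, d(loss)/dh = W[source] - W[target],
        # since the softmax normalizer cancels in the log-probability gap.
        grad = [ws - wt for wt, ws in zip(W[target], W[source])]
        h = [x - lr * g for x, g in zip(h, grad)]
    return h

# Hypothetical 3-token vocabulary and 2-dimensional hidden state.
W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
h0 = [2.0, 0.0]                          # initially favors token 0 (the source)
h1 = steer(W, h0, target=1, source=0)    # steered to favor token 1 (the target)
```

No probe classifier appears anywhere in the sketch: the only signal used to move `h` is the output distribution produced by the readout itself, which is the property the summary attributes to HDMI.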

Key facts

HDMI is a probe-free, gradient-based causal probing technique.
It uses a margin objective to steer hidden states.
The LA-HDMI variant allows text editing by backpropagating through softmax embeddings.
The method avoids auxiliary probe classifiers.
Paper available at arXiv:2605.07631.
Causal probing tests how internal representations influence model behavior.
Existing methods rely on trained probe classifiers.
HDMI directly uses the model's native output.
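As a rough illustration of what backpropagating through softmax embeddings could involve, the sketch below replaces a discrete token choice with a probability-weighted average of embedding vectors, a common way to keep a generation step differentiable. Whether LA-HDMI uses exactly this construction is an assumption, and the embedding table `E` and the function name `soft_embedding` are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_embedding(logits, E):
    """Expected embedding under softmax(logits): sum_i p_i * E[i].

    Unlike an argmax lookup, this is a smooth function of the logits,
    so gradients can flow from later positions back to earlier ones.
    """
    probs = softmax(logits)
    dim = len(E[0])
    return [sum(p * row[d] for p, row in zip(probs, E)) for d in range(dim)]

# Hypothetical embedding table: 3 tokens, 2 dimensions.
E = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
soft = soft_embedding([0.0, 0.0, 0.0], E)     # uniform mix of all three rows
sharp = soft_embedding([50.0, 0.0, 0.0], E)   # nearly the row for token 0
```

When the distribution is sharply peaked, the soft embedding approaches the ordinary embedding of the most likely token, so the relaxation stays close to what discrete decoding would feed into the next step.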
