ARTFEED — Contemporary Art Intelligence

HDMI: Probe-Free Causal Probing for LLMs

ai-technology · 2026-05-11

A new method, Hidden-state Driven Margin Intervention (HDMI), performs causal probing of large language models without training auxiliary probe classifiers. HDMI takes gradients of the model's native output with respect to its hidden states and steers those states directly, applying a margin objective that raises the probability of a target continuation while lowering the probability of the source continuation. A lookahead variant, LA-HDMI, enables text editing by backpropagating through softmax embeddings. The approach avoids the misalignment issues common in probe-based methods. The paper is available on arXiv under ID 2605.07631.
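To make the margin objective concrete, here is a minimal sketch, not the paper's implementation. It assumes a toy linear readout `W` standing in for everything between the hidden state and the logits, and a hinge-style margin on the target-versus-source logit gap; the names `margin_loss` and `steer`, the margin value, and the step size are all illustrative assumptions. With a softmax output, the log-probability gap between two tokens equals their logit gap, so widening the gap raises the target relative to the source.

```python
import numpy as np

def margin_loss(h, W, target, source, margin):
    """Hinge loss on the target-minus-source logit gap (assumed form)."""
    z = W @ h
    return max(0.0, margin - (z[target] - z[source]))

def steer(h, W, target, source, margin=2.0, lr=0.1, max_steps=200):
    """Gradient descent on the hidden state until the margin is satisfied."""
    h = h.copy()
    for _ in range(max_steps):
        if margin_loss(h, W, target, source, margin) == 0.0:
            break
        # For a linear readout, d(loss)/dh = -(W[target] - W[source]),
        # so a descent step moves h along W[target] - W[source].
        h += lr * (W[target] - W[source])
    return h

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 8))      # toy vocabulary of 5 tokens, hidden size 8
h0 = rng.normal(size=8)          # original hidden state
source = int(np.argmax(W @ h0))  # token the toy readout currently prefers
target = (source + 1) % 5        # arbitrary alternative token
h1 = steer(h0, W, target, source)
p0, p1 = softmax(W @ h0), softmax(W @ h1)
```

In a real model the gradient would come from automatic differentiation through the remaining layers rather than from a closed-form expression; the linear readout is used here only so the update can be checked by hand.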

Key facts

  • HDMI is a probe-free, gradient-based causal probing technique.
  • It uses a margin objective to steer hidden states.
  • The LA-HDMI variant enables text editing by backpropagating through softmax embeddings.
  • The method avoids auxiliary probe classifiers.
  • Paper available at arXiv:2605.07631.
  • Causal probing tests how internal representations influence model behavior.
  • Existing methods rely on trained probe classifiers.
  • HDMI directly uses the model's native output.
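
One common reading of "softmax embeddings" in the lookahead variant is a probability-weighted average of token embeddings, which keeps the next input differentiable with respect to the logits. The sketch below illustrates that reading only; whether LA-HDMI uses exactly this construction is an assumption, and `soft_embedding`, `soft_embedding_grad`, and the toy embedding table `E` are hypothetical names.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def soft_embedding(logits, E):
    """Probability-weighted average of embedding rows."""
    return softmax(logits) @ E

def soft_embedding_grad(logits, E, upstream):
    """Gradient of upstream . soft_embedding(logits, E) with respect to logits."""
    p = softmax(logits)
    scores = E @ upstream             # how much each token's embedding aligns with upstream
    return p * (scores - p @ scores)  # softmax Jacobian applied to scores

rng = np.random.default_rng(1)
E = rng.normal(size=(5, 4))    # toy table: 5 tokens, embedding size 4
logits = rng.normal(size=5)
upstream = rng.normal(size=4)  # stand-in for the gradient arriving from later steps
g = soft_embedding_grad(logits, E, upstream)
```

A hard argmax over tokens would block the gradient at this point; the soft average lets a loss computed on later positions flow back to the logits that produced the token.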

Entities

Institutions

  • arXiv
