ARTFEED — Contemporary Art Intelligence

PSR Models Outperform Existing Activation Steering Methods

ai-technology · 2026-05-07

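A new framework, Prompt Steering Replacement (PSR), formulates prompt steering as a form of activation steering and trains PSR models to imitate prompt-based interventions. The work starts from the observation that prompt steering applies strong interventions to some tokens while barely affecting others, which popular activation steering methods do not reproduce. PSR models therefore estimate token-specific steering coefficients from activations. Across multiple language models, they outperform existing activation steering methods on three steering benchmarks.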
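
Activation steering commonly adds a scaled steering vector to a layer's hidden states. The idea highlighted here is that the scale differs from token to token and is computed from the activations themselves. The sketch below illustrates that idea under assumptions that do not come from the paper: a single additive direction v, a hypothetical linear predictor (w, b) passed through softplus, and random stand-in activations. It is not the PSR architecture.

```python
import numpy as np

def steer_activations(h, v, w, b):
    """Add a token-specific multiple of steering direction v to each token.

    h: (T, d) activations of T tokens at one layer.
    v: (d,) steering direction.
    w: (d,) and b: scalar, parameters of a hypothetical linear coefficient predictor.
    Returns the steered activations (T, d) and the per-token coefficients (T,).
    """
    # Each token's coefficient is computed from that token's own activation.
    # softplus(x) = log(1 + e^x), written stably via logaddexp, keeps it >= 0.
    alpha = np.logaddexp(0.0, h @ w + b)
    steered = h + alpha[:, None] * v[None, :]
    return steered, alpha

rng = np.random.default_rng(0)
T, d = 5, 8
h = rng.normal(size=(T, d))
v = rng.normal(size=d)
v /= np.linalg.norm(v)  # unit-norm direction
w = rng.normal(size=d)
b = -1.0

steered, alpha = steer_activations(h, v, w, b)
# Because v has unit norm, projecting each token's shift onto v recovers its coefficient.
print(np.round((steered - h) @ v, 3))
print(np.round(alpha, 3))
```

A constant-coefficient method would use the same alpha for every token. Predicting alpha per token is what lets a model push hard on some tokens and leave others nearly untouched.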

Key facts

  • arXiv:2605.03907v1
  • PSR models estimate token-specific steering coefficients from activations
  • PSR models are trained to imitate prompt-based interventions
  • Experiments on three steering benchmarks
  • PSR models outperform existing activation steering methods
  • Popular activation steering methods are not faithful to prompt steering mechanics
  • Prompt steering applies strong interventions on some tokens while barely affecting others
  • Framework formulates prompt steering as a form of activation steering
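
The idea of training a model to imitate prompt-based interventions can be illustrated with a least-squares toy. This sketch makes two assumptions that are not details from the paper: the intervention acts along a fixed direction v, and activations of the same tokens are available both with and without the steering prompt. Under those assumptions, the coefficient that best reproduces the prompt's effect on token t has the closed form a_t = v . (h_prompted[t] - h_plain[t]) / (v . v). A predictor can then be fit to map each token's unprompted activation to that target. The helper names, the linear predictor, and the use of lstsq in place of training are hypothetical, and the data is synthetic.

```python
import numpy as np

def target_coefficients(h_plain, h_prompted, v):
    """Per-token coefficient whose shift along v best matches the prompt's effect.

    For each token t this solves min_a ||h_plain[t] + a * v - h_prompted[t]||^2,
    whose closed form is a = v . (h_prompted[t] - h_plain[t]) / (v . v).
    """
    return (h_prompted - h_plain) @ v / (v @ v)

def fit_coefficient_predictor(h_plain, targets):
    """Least-squares fit of a hypothetical linear predictor alpha_hat = h @ w + b."""
    X = np.hstack([h_plain, np.ones((h_plain.shape[0], 1))])  # append a bias column
    coef, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return coef[:-1], coef[-1]

# Synthetic data: the "prompt" shifts each token along v by a token-dependent
# amount, plus an extra component that is orthogonal to v.
rng = np.random.default_rng(1)
N, d = 200, 8
h_plain = rng.normal(size=(N, d))
v = rng.normal(size=d)
w_true = rng.normal(size=d)
b_true = 0.5
true_alpha = h_plain @ w_true + b_true
other = rng.normal(size=(N, d))
other -= np.outer(other @ v, v) / (v @ v)  # remove the part along v
h_prompted = h_plain + true_alpha[:, None] * v + other

targets = target_coefficients(h_plain, h_prompted, v)
w_hat, b_hat = fit_coefficient_predictor(h_plain, targets)
print(np.abs(w_hat - w_true).max(), abs(b_hat - b_true))
```

Because the synthetic shift is exactly linear in the activations, the fit recovers w_true and b_true up to floating-point error. Real prompted activations would not be this clean, and the paper's own model and training loss are not described in this summary.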
