PSR Models Outperform Existing Activation Steering Methods

ai-technology · 2026-05-07

A new framework, Prompt Steering Replacement (PSR), formulates prompt steering as a form of activation steering and trains models to imitate prompt-based interventions. PSR models estimate token-specific steering coefficients from activations, and they outperform existing activation steering methods on three steering benchmarks across multiple language models.

Key facts

arXiv:2605.03907v1
PSR models estimate token-specific steering coefficients from activations
PSR models are trained to imitate prompt-based interventions
Experiments cover three steering benchmarks across multiple language models
PSR models outperform existing activation steering methods
Popular activation steering methods are not faithful to prompt steering mechanics
Prompt steering applies strong interventions on some tokens while barely affecting others
Framework formulates prompt steering as a form of activation steering
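
To make the idea of token-specific steering coefficients concrete, here is a minimal sketch in plain Python. It is not the PSR architecture, which the summary does not describe. It assumes a hypothetical linear probe (`probe_w`, `probe_b`) that reads each token's activation and outputs a non-negative coefficient, which then scales a fixed steering direction added to that token. All names and the toy numbers are illustrative assumptions.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)); keeps coefficients non-negative.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def token_coefficients(activations, probe_w, probe_b):
    # Hypothetical estimator (an assumption, not the paper's model):
    # one coefficient per token, computed from that token's activation.
    return [softplus(dot(h, probe_w) + probe_b) for h in activations]


def steer(activations, direction, coefficients):
    # Activation steering: add coefficients[t] * direction to token t.
    return [
        [h_i + c * d_i for h_i, d_i in zip(h, direction)]
        for h, c in zip(activations, coefficients)
    ]


# Toy example: three tokens with two-dimensional activations.
activations = [[1.0, 0.0], [0.0, 1.0], [-4.0, 0.0]]
direction = [0.0, 2.0]   # illustrative steering direction
probe_w = [3.0, 0.0]     # illustrative probe weights
probe_b = 0.0

alphas = token_coefficients(activations, probe_w, probe_b)
steered = steer(activations, direction, alphas)

# Baseline for comparison: the same coefficient applied to every token.
uniform = steer(activations, direction, [1.0] * len(activations))
```

In this toy setup the first token receives a large coefficient and the last one a coefficient close to zero, which mirrors the observation that prompt steering intervenes strongly on some tokens while barely affecting others. The `uniform` baseline shifts every token by the same amount.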

Sources

arXiv cs.AI — 2026-05-06