ARTFEED — Contemporary Art Intelligence

Research Demonstrates Local Linearity in LLMs Enables Activation Steering via Linear Optimal Control

ai-technology · 2026-04-22

A recent study shows that large language models (LLMs) exhibit locally linear layer-wise dynamics, a property that enables more effective activation steering at inference time. The paper, available on arXiv under identifier 2604.19018v1, demonstrates that despite the nonlinearity of transformer blocks, the dynamics of several LLM architectures are well approximated by locally linear models. LLM inference can therefore be modeled as a linear time-varying dynamical system, allowing classical linear quadratic regulator (LQR) techniques to be adapted for computing feedback controllers. Using layer-wise Jacobians, the method steers activations toward specified semantic targets with minimal computational cost and no offline training. Unlike existing interventions, which often ignore how perturbations propagate through later layers and offer no real-time error feedback, the approach is backed by both theoretical bounds and empirical evidence for local linearity, marking a notable advance in inference-time alignment strategies for LLMs.
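The mechanism summarized above, linearizing each layer with its Jacobian and running a finite-horizon LQR-style backward pass to compute feedback gains, can be sketched in a few lines. The following is a minimal illustration on a toy stand-in for a transformer; the residual tanh layers, dimensions, cost weights, and target are all assumptions for demonstration, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 4, 6  # activation width and layer count (toy sizes)

# Toy stand-ins for transformer blocks: smooth residual maps (assumption).
Ws = [rng.normal(scale=0.5, size=(d, d)) for _ in range(T)]

def layer(t, x):
    return x + np.tanh(Ws[t] @ x)

def jacobian(t, x):
    # Layer-wise Jacobian A_t = d layer(t, x) / d x, taken analytically here.
    s = 1.0 - np.tanh(Ws[t] @ x) ** 2
    return np.eye(d) + s[:, None] * Ws[t]

# Nominal (unsteered) forward pass: the trajectory we linearize around.
nominal = [rng.normal(size=d)]
for t in range(T):
    nominal.append(layer(t, nominal[-1]))

# Hypothetical "semantic target" in final-layer activation space.
target = nominal[-1] + 0.5 * rng.normal(size=d)
d_star = target - nominal[-1]

# Backward Riccati pass for the LTV system delta_{t+1} = A_t delta_t + u_t,
# minimizing sum_t u_t' R u_t + (delta_T - d_star)' Qf (delta_T - d_star).
R, Qf = 0.1 * np.eye(d), 100.0 * np.eye(d)
V, v = Qf, -Qf @ d_star
gains = []
for t in reversed(range(T)):
    A = jacobian(t, nominal[t])
    Quu_inv = np.linalg.inv(R + V)
    gains.append((Quu_inv @ V @ A, Quu_inv @ v))  # feedback gain K_t, offset k_t
    V, v = A.T @ (V - V @ Quu_inv @ V) @ A, A.T @ (v - V @ Quu_inv @ v)
gains = gains[::-1]  # gains[t] now matches layer t

# Controlled forward pass on the true nonlinear layers, with state feedback.
x = nominal[0]
for t in range(T):
    K, k = gains[t]
    u = -K @ (x - nominal[t]) - k  # closed-loop correction at layer t
    x = layer(t, x) + u

err_steered = np.linalg.norm(x - target)
err_nominal = np.linalg.norm(nominal[-1] - target)
print(err_steered, err_nominal)  # steered endpoint should land much closer
```

Because the feedback term is recomputed from the actual activation at every layer, linearization error introduced at one layer is corrected at the next, which is the advantage over open-loop steering vectors applied without error feedback.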

Key facts

  • Research paper published on arXiv under identifier 2604.19018v1
  • Demonstrates local linearity in layer-wise dynamics of large language models
  • Enables activation steering via linear optimal control methods
  • Models LLM inference as a linear time-varying dynamical system
  • Uses layer-wise Jacobians to compute feedback controllers
  • Requires no offline training and minimal computational overhead
  • Addresses limitations of existing non-anticipative intervention methods
  • Provides theoretical bounds supporting local linearity observation
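The local-linearity observation itself is easy to probe numerically: if a layer map is smooth, the gap between f(x + εv) and the Jacobian-based prediction f(x) + εJ(x)v should shrink roughly quadratically as ε shrinks. A small sketch with a toy residual block; the block and all sizes are illustrative assumptions, not the paper's models or bounds:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8
W = rng.normal(scale=0.5, size=(d, d))

def block(x):
    # Toy residual block standing in for a transformer layer (assumption).
    return x + np.tanh(W @ x)

def jac(x):
    return np.eye(d) + (1.0 - np.tanh(W @ x) ** 2)[:, None] * W

x = rng.normal(size=d)
v = rng.normal(size=d)
v /= np.linalg.norm(v)

# Error of the linear model at shrinking perturbation radii.
errs = [np.linalg.norm(block(x + eps * v) - (block(x) + eps * (jac(x) @ v)))
        for eps in (1e-1, 1e-2, 1e-3)]
ratios = [errs[i] / errs[i + 1] for i in range(2)]
print(errs, ratios)  # ratios near 100 suggest a quadratic (second-order) remainder
```

A tenfold reduction in ε cutting the error roughly a hundredfold is the signature of a second-order Taylor remainder, i.e. local linearity in the sense the paper's bounds formalize.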

Entities

Institutions

  • arXiv

Sources