Geometry-Guided Search Reduces Cost of Rank-1 LLM Steering
A recent arXiv preprint argues that the inconsistent effectiveness of activation steering in large language models (LLMs) stems mainly from search difficulty, not from concepts lacking a single steering direction. The authors formalize rank-1 steering as a budget-constrained optimization over the intervention layer and the steering coefficient. They find that alignment with prompt-boundary directions predicts which interventions will be effective. A geometry-guided search built on this signal reduces the number of evaluations needed to reach 95% of the optimal utility by 39.8% on average across three model families. The paper also introduces "concept granularity" to explain why some concepts remain expensive to steer even with better search.
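To make the setup concrete, here is a minimal sketch, assuming rank-1 steering means adding a scaled, unit-normalized direction to one layer's activation, and assuming "budget" means the number of (layer, coefficient) candidates that can be evaluated. The names `steer`, `budgeted_search`, and `toy_utility`, the 32-layer grid, and the utility surface are all hypothetical illustrations, not the paper's code or data.

```python
import math
from typing import Callable, List, Sequence, Tuple

Candidate = Tuple[int, float]  # (intervention layer, steering coefficient)


def steer(hidden: Sequence[float], direction: Sequence[float], coeff: float) -> List[float]:
    """Rank-1 edit: shift one activation vector along a unit-normalized direction."""
    norm = math.sqrt(sum(d * d for d in direction))
    return [h + coeff * d / norm for h, d in zip(hidden, direction)]


def budgeted_search(
    candidates: Sequence[Candidate],
    utility: Callable[[int, float], float],
    budget: int,
) -> Tuple[Candidate, float]:
    """Evaluate candidates in order, stopping after `budget` evaluations (budget >= 1)."""
    best, best_u = candidates[0], float("-inf")
    for layer, coeff in candidates[:budget]:
        u = utility(layer, coeff)  # in practice: steered generations plus scoring
        if u > best_u:
            best, best_u = (layer, coeff), u
    return best, best_u


def toy_utility(layer: int, coeff: float) -> float:
    """Hypothetical utility surface peaking at layer 14, coefficient 4.0."""
    return math.exp(-((layer - 14) / 3) ** 2) * math.exp(-((coeff - 4.0) / 2) ** 2)


# Naive layer-major grid over a hypothetical 32-layer model.
grid = [(layer, c) for layer in range(32) for c in (1.0, 2.0, 4.0, 8.0)]
print(steer([0.0, 0.0], [3.0, 4.0], 5.0))                # [3.0, 4.0]
print(budgeted_search(grid, toy_utility, 40)[0])         # (9, 4.0): budget runs out before layer 14
print(budgeted_search(grid, toy_utility, len(grid))[0])  # (14, 4.0)
```

The toy run shows the failure mode the authors describe: the intervention itself is simple, but a naive search order can exhaust its budget before reaching the good region of the (layer, coefficient) space.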
Key facts
- Activation steering offers a lightweight way to control LLMs without retraining.
- Steering effectiveness varies sharply across concepts.
- Prior work often attributes this variability to concepts not being captured by a single direction.
- The authors argue the variability instead reflects search difficulty.
- Rank-1 steering is formalized as a budget-constrained optimization over intervention layer and coefficient.
- Prompt-boundary directional alignment predicts effective interventions.
- Geometry-guided search reduces the evaluations needed to reach 95% of optimal utility by 39.8% on average across three model families.
- Concept granularity is introduced to explain why some concepts remain expensive to steer.
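The headline metric (evaluations until 95% of the optimal utility is reached) can be illustrated with a small sketch. It assumes utilities are non-negative, treats the best value on the full grid as "optimal", and stands in for the paper's prompt-boundary alignment with hypothetical per-layer scores that simply reorder which layers are tried first. `evals_to_target`, `geometry_guided_order`, the alignment values, and the utility surface are all hypothetical; the toy numbers printed are illustrative only and are not the paper's results.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Candidate = Tuple[int, float]  # (intervention layer, steering coefficient)


def evals_to_target(
    order: Sequence[Candidate],
    utility: Callable[[int, float], float],
    frac: float = 0.95,
) -> int:
    """Evaluations needed, in the given order, to first reach `frac` of the grid optimum.

    Assumes non-negative utilities and treats the best value on the full grid as
    the optimum (an oracle quantity used only to score the search order).
    """
    target = frac * max(utility(layer, coeff) for layer, coeff in order)
    for i, (layer, coeff) in enumerate(order, start=1):
        if utility(layer, coeff) >= target:
            return i
    return len(order)


def geometry_guided_order(candidates: Sequence[Candidate], alignment: Dict[int, float]) -> List[Candidate]:
    """Visit layers with higher alignment scores first; the stable sort keeps coefficient order."""
    return sorted(candidates, key=lambda lc: -alignment[lc[0]])


def toy_utility(layer: int, coeff: float) -> float:
    """Hypothetical utility surface peaking at layer 14, coefficient 4.0."""
    return math.exp(-((layer - 14) / 3) ** 2) * math.exp(-((coeff - 4.0) / 2) ** 2)


grid = [(layer, c) for layer in range(32) for c in (1.0, 2.0, 4.0, 8.0)]
# Hypothetical per-layer alignment scores: an imperfect proxy peaking near layer 13.
alignment = {layer: -abs(layer - 13.4) for layer in range(32)}

naive = evals_to_target(grid, toy_utility)
guided = evals_to_target(geometry_guided_order(grid, alignment), toy_utility)
print(naive, guided)                # 59 7
print(f"{1 - guided / naive:.1%}")  # 88.1%
```

Even an imperfect proxy helps here because it only has to put the right layer near the front of the queue; the reported 39.8% is an average over three model families and real concepts, where the alignment signal is noisier than this toy.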
Entities
Institutions
- arXiv