Convergent-Divergent Routing: Steering LLM Moral Reasoning
Researchers propose Convergent-Divergent Routing (CDR) to control moral reasoning in large language models at inference time. The method traces and edits minimal branch points within transformer blocks, where pathways associated with different ethical frameworks converge and diverge, and gates off non-target branches so that reasoning follows the target framework. For finer-grained control, they adapt Common Spatial Patterns to the residual stream, extracting directions that discriminate between utilitarian and deontological frameworks. Dual Logit Calibration then applies a closed-form, minimum-ℓ2-norm update that moves residuals within this subspace. The approach aims to preserve general competence while steering toward the desired ethical framework.
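As a rough illustration of the Common Spatial Patterns step, the sketch below implements the classical two-class CSP procedure: whiten the composite covariance, then eigendecompose one class's whitened covariance. It is not taken from the paper. The function name `csp_directions`, the trace normalisation, and the idea of feeding it one matrix of activations per framework are assumptions made for illustration, and the paper's adaptation to the residual stream may differ.

```python
import numpy as np

def csp_directions(acts_a, acts_b, n_dirs=2, eps=1e-8):
    """Classical two-class CSP on activation vectors (illustrative sketch).

    acts_a, acts_b: arrays of shape (n_samples, d), e.g. hypothetical
    residual activations collected under two ethical framings.
    Returns (filters, shares): filters has 2 * n_dirs rows, the first
    n_dirs favouring class A and the last n_dirs favouring class B;
    shares holds the matching eigenvalues (class A's variance share).
    """
    def norm_cov(x):
        # Mean-centred covariance, normalised by its trace.
        x = x - x.mean(axis=0, keepdims=True)
        c = x.T @ x / max(len(x) - 1, 1)
        return c / np.trace(c)

    c_a, c_b = norm_cov(acts_a), norm_cov(acts_b)

    # Whitening transform for the composite covariance c_a + c_b.
    evals, evecs = np.linalg.eigh(c_a + c_b)
    keep = evals > eps
    whiten = (evecs[:, keep] / np.sqrt(evals[keep])).T  # shape (k, d)

    # In whitened space the two class covariances sum to the identity,
    # so eigenvalues of class A's covariance lie in [0, 1].
    s_a = whiten @ c_a @ whiten.T
    lam, vecs = np.linalg.eigh(s_a)
    order = np.argsort(lam)[::-1]          # sort descending
    lam, vecs = lam[order], vecs[:, order]

    filters = vecs.T @ whiten              # rows are CSP filters, (k, d)
    picked = np.vstack([filters[:n_dirs], filters[-n_dirs:]])
    shares = np.concatenate([lam[:n_dirs], lam[-n_dirs:]])
    return picked, shares
```

Projecting activations onto a returned filter gives a scalar whose variance is large for one class and small for the other, which is what makes the directions discriminative. An eigenvalue near 1 marks a direction dominated by class A; one near 0 marks a direction dominated by class B.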
Key facts
- Convergent-Divergent Routing (CDR) is introduced for inference-time steering of moral reasoning in LLMs.
- CDR traces and edits minimal branch points inside transformer blocks.
- Gating a non-target branch blocks its downstream propagation while leaving upstream computations intact.
- Common Spatial Patterns are adapted to the residual stream to extract discriminative directions.
- Dual Logit Calibration is a closed-form, minimum-ℓ2-norm update.
- The method targets utilitarian and deontological ethical frameworks.
- The paper is available on arXiv under ID 2605.03609.
- The approach aims to preserve general competence while steering ethical preferences.
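To make the idea of a closed-form, minimum-ℓ2-norm update concrete, here is a minimal sketch of one generic way such an update can be computed. It finds the smallest shift, restricted to a given subspace, that moves a set of linear readouts of the residual to chosen target values. This is an assumed formulation, not the paper's Dual Logit Calibration: the function name `min_norm_update`, the linear `readouts` (standing in for two framework logits), and the `targets` are all hypothetical.

```python
import numpy as np

def min_norm_update(h, basis, readouts, targets):
    """Smallest-L2 shift within span(basis) that hits target readouts.

    h:        (d,)   residual vector to be moved
    basis:    (d, k) columns spanning the steering subspace
    readouts: (m, d) rows are linear readout directions (hypothetical
              stand-ins for framework logits)
    targets:  (m,)   desired readout values after the update
    Returns delta such that readouts @ (h + delta) == targets when the
    constraints are feasible (least-squares fit otherwise).
    """
    # Orthonormalise the basis so that ||q @ c|| equals ||c||.
    q, _ = np.linalg.qr(basis)
    a_mat = readouts @ q                     # (m, k) constraint matrix
    gap = targets - readouts @ h             # how far each readout must move
    # The pseudo-inverse gives the minimum-norm solution of a_mat @ c = gap.
    coeffs = np.linalg.pinv(a_mat) @ gap
    return q @ coeffs

# Usage with random stand-in data (no real model involved).
rng = np.random.default_rng(1)
h = rng.normal(size=8)
basis = rng.normal(size=(8, 3))
readouts = rng.normal(size=(2, 8))
targets = np.array([1.0, -1.0])
delta = min_norm_update(h, basis, readouts, targets)
```

Because the basis is orthonormalised first, minimising the norm of the coefficients is the same as minimising the norm of the shift itself, so the pseudo-inverse solution is the smallest admissible move inside the subspace. Components of the residual outside the subspace are left untouched.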
Entities
Institutions
- arXiv