Convergent-Divergent Routing: Steering LLM Moral Reasoning

ai-technology · 2026-05-07

Researchers propose Convergent-Divergent Routing (CDR), a method for steering moral reasoning in large language models at inference time. CDR traces the minimal branch points inside transformer blocks where pathways associated with different ethical frameworks converge and diverge, then gates the non-target branches, which blocks their downstream propagation while leaving upstream computation intact. For finer-grained control, the authors adapt Common Spatial Patterns to the residual stream to extract directions that discriminate between utilitarian and deontological frameworks. Dual Logit Calibration then applies a closed-form, minimum-ℓ2-norm update that moves residuals within this subspace. The approach aims to steer the model toward the desired ethical framework while preserving general competence.
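To make the Common Spatial Patterns step more concrete, here is a minimal sketch of classical two-class CSP applied to activation vectors. It is not the paper's adaptation, whose details are not given here. The function name csp_directions, the input arrays, the trace normalization, and the eps regularizer are all assumptions for illustration.

```python
import numpy as np


def csp_directions(acts_a, acts_b, n_pairs=1, eps=1e-6):
    """Classical two-class CSP on activation vectors (illustrative sketch).

    acts_a, acts_b: arrays of shape (n_samples, d_model). These stand in for
    residual-stream activations gathered under each framework (hypothetical).
    Returns (filters, ratios): filters has shape (d_model, 2 * n_pairs).
    """

    def normalized_cov(x):
        # Centered covariance, trace-normalized as in classical CSP.
        x = x - x.mean(axis=0, keepdims=True)
        c = x.T @ x / max(len(x) - 1, 1)
        return c / np.trace(c)

    c_a, c_b = normalized_cov(acts_a), normalized_cov(acts_b)
    d = c_a.shape[0]

    # Whiten the (regularized) composite covariance c_a + c_b.
    evals, evecs = np.linalg.eigh(c_a + c_b + eps * np.eye(d))
    whiten = evecs / np.sqrt(evals)  # U @ diag(evals ** -0.5)

    # Diagonalize class A's covariance in the whitened space.
    lam, v = np.linalg.eigh(whiten.T @ c_a @ whiten)  # ascending order

    # Columns w solve c_a w = lam * (c_a + c_b + eps * I) w.
    filters = whiten @ v

    # Keep the extremes: large lam favors class A, small lam favors class B.
    idx = np.concatenate([np.arange(d - n_pairs, d)[::-1], np.arange(n_pairs)])
    return filters[:, idx], lam[idx]
```

Projecting an activation onto the first returned column gives a coordinate whose variance is high for class A relative to class B; the last column does the reverse. How CDR selects and uses such directions inside the residual stream is left to the paper.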

Key facts

Convergent-Divergent Routing (CDR) is introduced for inference-time steering of moral reasoning in LLMs.
CDR traces and edits minimal branch points inside transformer blocks.
Gating non-target branches blocks downstream propagation while leaving upstream computations intact.
Common Spatial Patterns are adapted to the residual stream to extract discriminative directions.
Dual Logit Calibration is a closed-form, minimum-ℓ2-norm update.
The method targets utilitarian and deontological ethical frameworks.
The paper is available on arXiv under ID 2605.03609.
The approach aims to preserve general competence while steering ethical preferences.
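
The "closed-form, minimum-ℓ2-norm update" in the list above can be illustrated with a generic construction. The sketch assumes, without confirmation from the source, that the two logits are linear readouts of the residual and that the update must hit two target logit values while staying in an orthonormal subspace. The function name, readout vectors, and targets are hypothetical, and this is not necessarily the paper's formula.

```python
import numpy as np


def min_norm_dual_logit_update(h, basis, w_target, w_other, t_target, t_other):
    """Smallest-l2 update in span(basis) that hits two logit targets.

    h: residual vector, shape (d,)
    basis: orthonormal columns spanning the steering subspace, shape (d, k)
    w_target, w_other: assumed linear readout vectors, shape (d,)
    t_target, t_other: desired logit values after the update
    """
    readout = np.stack([w_target, w_other])              # (2, d)
    gap = np.array([t_target, t_other]) - readout @ h    # required logit change
    m = readout @ basis                                  # readouts seen from the subspace

    # Because basis is orthonormal, ||basis @ c|| equals ||c||, so the
    # minimum-norm coefficients come from the pseudoinverse of m. When m has
    # full row rank this is m.T @ inv(m @ m.T) @ gap.
    coeffs = np.linalg.pinv(m) @ gap
    return basis @ coeffs
```

Adding the returned vector to h moves both assumed logits to their targets whenever the subspace can reach them, and any other in-subspace update meeting the same two constraints has a norm at least as large.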
