ARTFEED — Contemporary Art Intelligence

Convergent-Divergent Routing: Steering LLM Moral Reasoning

ai-technology · 2026-05-07

Researchers propose Convergent-Divergent Routing (CDR), a method for controlling moral reasoning in large language models at inference time. CDR identifies and edits the branch points within transformer blocks where pathways associated with different ethical frameworks converge and diverge, gating non-target branches so that only the targeted line of reasoning propagates downstream. For finer-grained control, the authors adapt Common Spatial Patterns to the residual stream, extracting directions that discriminate between utilitarian and deontological frameworks. A closed-form Dual Logit Calibration step then applies a minimum-ℓ2-norm update that moves residual activations within this subspace. The approach steers models toward a desired ethical framework while preserving general competence.
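The CSP adaptation can be illustrated with a minimal sketch. Classical CSP finds directions that maximize activation variance under one condition while minimizing it under the other, via a generalized eigenvalue problem on the two class covariance matrices; applied to the residual stream, the two classes would be activations collected under utilitarian versus deontological prompts. The function name, the toy data, and the exact covariance formulation below are illustrative assumptions, not the paper's code.

```python
import numpy as np
from scipy.linalg import eigh

def extract_csp_directions(X_util, X_deon, k=2):
    """X_util, X_deon: (n_samples, d_model) residual-stream activations
    collected under utilitarian vs. deontological prompts (assumed setup).
    Returns 2k discriminative directions, k favoring each class."""
    # Class covariance matrices of the centered activations.
    C1 = np.cov(X_util, rowvar=False)
    C2 = np.cov(X_deon, rowvar=False)
    # Generalized eigenproblem: C1 w = lambda (C1 + C2) w.
    # Eigenvalues near 1 favor class 1; eigenvalues near 0 favor class 2.
    vals, vecs = eigh(C1, C1 + C2)
    order = np.argsort(vals)
    # The k extreme eigenvectors on each end span the discriminative subspace.
    W = np.hstack([vecs[:, order[:k]], vecs[:, order[-k:]]])
    return W  # (d_model, 2k)

rng = np.random.default_rng(0)
d = 16
# Toy activations: class 1 has extra variance along axis 0, class 2 along axis 1.
X1 = rng.normal(size=(500, d)); X1[:, 0] *= 4.0
X2 = rng.normal(size=(500, d)); X2[:, 1] *= 4.0
W = extract_csp_directions(X1, X2, k=2)
print(W.shape)  # (16, 4)
```

On the toy data, the extreme eigenvectors recover the axes along which the two classes differ in variance, which is the property CDR would exploit to separate framework-specific directions in the residual stream.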

Key facts

  • Convergent-Divergent Routing (CDR) is introduced for inference-time steering of moral reasoning in LLMs.
  • CDR traces and edits minimal branch points inside transformer blocks.
  • Gating non-target branches blocks downstream propagation while leaving upstream computations intact.
  • Common Spatial Patterns are adapted to the residual stream to extract discriminative directions.
  • Dual Logit Calibration is a closed-form, minimum-ℓ2-norm update.
  • The method targets utilitarian and deontological ethical frameworks.
  • The research is published on arXiv with ID 2605.03609.
  • The approach aims to preserve general competence while steering ethical preferences.
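The closed-form, minimum-ℓ2-norm character of Dual Logit Calibration can be sketched as a constrained least-norm problem: find the smallest residual edit δ that moves the projections onto the steering directions to target values. The constraint form below (two directions stacked in W, target projections t) is an assumption for illustration; the paper's exact objective may differ.

```python
import numpy as np

def min_norm_update(h, W, t):
    """Smallest-l2 delta such that W @ (h + delta) == t.
    h: (d,) residual vector; W: (2, d) steering directions; t: (2,) targets.
    Least-norm solution of W delta = t - W h:
        delta = W^T (W W^T)^{-1} (t - W h)."""
    r = t - W @ h
    delta = W.T @ np.linalg.solve(W @ W.T, r)
    return h + delta

d = 8
h = np.zeros(d)
W = np.eye(2, d)            # two orthonormal steering directions (toy)
t = np.array([1.5, -0.5])   # desired projections onto the two directions
h_new = min_norm_update(h, W, t)
print(W @ h_new)  # projections now equal t
```

Because the update is the least-norm solution, δ lies entirely in the span of the steering directions, which is consistent with the summary's claim that residuals are moved only within the discriminative subspace, leaving the rest of the representation untouched.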

Entities

Institutions

  • arXiv

Sources