ARTFEED — Contemporary Art Intelligence

Residual Paving: Diagnosing Routing Bottleneck in Selective Refusal Editing

other · 2026-05-22

Residual Paving addresses selective refusal editing in instruction-tuned transformers by framing it as a three-way control problem. Its routed residual editing framework separates route selectivity (whether to intervene) from residual-edit capacity (which edit to apply). An early-layer router predicts a scalar gate and an expert mixture; when the gate is open, prompt-conditioned bottleneck residual experts apply residual updates at later layers, leaving the frozen backbone unmodified. This separation enables oracle-routing diagnostics, in which the learned scalar gate is replaced with a held-out edit/keep label. On Gemma-3-4B-IT, Residual Paving reduces edit refusal from 88.6% to 4.0%, with benign distribution preservation at 95.5%. The paper is available on arXiv under ID 2605.20262.
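The routing described above can be sketched in a few lines. This is a minimal pure-Python illustration, not the paper's implementation: the names Router, BottleneckExpert and paved_residual, the sigmoid gate, the softmax mixture, the tanh nonlinearity, the choice to feed the later-layer state into each expert, and all dimensions are assumptions made for the example.

```python
import math
import random


def matvec(W, x):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class BottleneckExpert:
    """Hypothetical expert: project down, apply tanh, project back up."""

    def __init__(self, d_model, d_bottleneck, rng):
        self.down = rand_matrix(d_bottleneck, d_model, rng)
        self.up = rand_matrix(d_model, d_bottleneck, rng)

    def __call__(self, h):
        z = [math.tanh(v) for v in matvec(self.down, h)]
        return matvec(self.up, z)


class Router:
    """Hypothetical early-layer router: scalar gate plus expert mixture."""

    def __init__(self, d_model, n_experts, rng):
        self.gate_w = [rng.gauss(0.0, 0.1) for _ in range(d_model)]
        self.mix_w = rand_matrix(n_experts, d_model, rng)

    def __call__(self, h_early):
        gate = sigmoid(sum(w * x for w, x in zip(self.gate_w, h_early)))
        mixture = softmax(matvec(self.mix_w, h_early))
        return gate, mixture


def paved_residual(h_late, h_early, router, experts, oracle_gate=None):
    """Add a gated mixture of expert updates to a later-layer residual state.

    The backbone state h_late is only read, never rewritten in place.
    Passing oracle_gate (0 or 1) overrides the learned gate.
    """
    gate, mixture = router(h_early)
    if oracle_gate is not None:
        gate = float(oracle_gate)
    delta = [0.0] * len(h_late)
    for weight, expert in zip(mixture, experts):
        update = expert(h_late)
        delta = [d + weight * u for d, u in zip(delta, update)]
    return [h + gate * d for h, d in zip(h_late, delta)]


rng = random.Random(0)
d_model, d_bottleneck, n_experts = 8, 2, 3
router = Router(d_model, n_experts, rng)
experts = [BottleneckExpert(d_model, d_bottleneck, rng) for _ in range(n_experts)]
h_early = [rng.gauss(0.0, 1.0) for _ in range(d_model)]
h_late = [rng.gauss(0.0, 1.0) for _ in range(d_model)]

gate, mixture = router(h_early)
kept = paved_residual(h_late, h_early, router, experts, oracle_gate=0)
edited = paved_residual(h_late, h_early, router, experts, oracle_gate=1)
learned = paved_residual(h_late, h_early, router, experts)
```

With the gate forced to 0 the later-layer state passes through untouched, and with the gate forced to 1 the full expert update is applied; the learned gate interpolates between the two. Keeping the gate as a separate scalar is what makes this kind of substitution possible in the sketch.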

Key facts

  • Residual Paving is a routed residual editing method for frozen instruction-tuned transformers.
  • It separates route selectivity from residual-edit capacity.
  • An early-layer router predicts a scalar gate and expert mixture.
  • Prompt-conditioned bottleneck residual experts apply later-layer residual updates.
  • The backbone remains unchanged during edits.
  • Oracle-routing diagnostics replace the learned scalar gate with the held-out edit/keep label.
  • On Gemma-3-4B-IT, edit refusal dropped from 88.6% to 4.0%.
  • Benign distribution preservation was 95.5%.
  • The paper is published on arXiv with ID 2605.20262.
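
The oracle-routing diagnostic in the list above can be illustrated with a toy accounting sketch. Everything in it is hypothetical: the Prompt record, the 0.5 gate threshold, the capacity_ok flag and the six made-up prompts exist only for illustration and are not data from the paper. The idea shown is that refusals remaining under an oracle gate point to limited edit capacity, while the gap between learned and oracle routing points to the router.

```python
from dataclasses import dataclass


@dataclass
class Prompt:
    label: str           # held-out label: "edit" or "keep"
    learned_gate: float  # gate value a learned router might emit (made up)
    capacity_ok: bool    # toy flag: does the edit work once the gate is open?


def still_refused(prompt, gate, threshold=0.5):
    """Toy rule: an edit prompt stops being refused only if the gate is
    open and the experts have enough capacity for that prompt."""
    return not (gate >= threshold and prompt.capacity_ok)


def edit_refusal(prompts, use_oracle):
    """Fraction of edit-labelled prompts that are still refused."""
    edits = [p for p in prompts if p.label == "edit"]
    refused = 0
    for p in edits:
        # Oracle routing swaps the learned gate for the held-out label.
        gate = 1.0 if use_oracle else p.learned_gate
        if still_refused(p, gate):
            refused += 1
    return refused / len(edits)


# Illustrative prompts only; not measurements.
prompts = [
    Prompt("edit", 0.9, True),
    Prompt("edit", 0.2, True),   # router fails to open the gate
    Prompt("edit", 0.8, False),  # gate opens but the edit lacks capacity
    Prompt("edit", 0.7, True),
    Prompt("keep", 0.1, True),
    Prompt("keep", 0.3, True),
]

learned_rate = edit_refusal(prompts, use_oracle=False)
oracle_rate = edit_refusal(prompts, use_oracle=True)
routing_gap = learned_rate - oracle_rate

print(f"learned routing: {learned_rate:.2f}")
print(f"oracle routing:  {oracle_rate:.2f}")
print(f"routing gap:     {routing_gap:.2f}")
```

In this toy setup the oracle run isolates the capacity-limited prompt, and the difference between the two rates is the share of refusals attributable to routing alone.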

Entities

Institutions

  • arXiv
