Norm-Anchor Scaling Prevents Model Edit Collapse
Sequential locate-and-edit (L&E) model editing has a newly identified failure mode: a positive norm-feedback loop in which solved value vectors and edited MLP weights amplify each other, eventually degrading edit quality and eroding the model's capabilities. To address this, researchers introduced Norm-Anchor Scaling (NAS), a stabilizing plug-in that rescales each solved value vector to match the original model's reference norm. Across multiple LLM backbones, datasets, and L&E editors, NAS extends the effective editing horizon by more than 4x and improves long-run editing performance by 72.2% on average, with a one-line code change and minimal computational overhead.
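The rescaling step can be illustrated with a minimal sketch. It assumes the solved value vector is a plain list of floats and that the reference norm is a single scalar measured on the original, unedited model; how that reference is actually computed is not specified here. The names `norm_anchor_scale`, `l2_norm`, and `reference_norm` are hypothetical and do not come from any particular editor's codebase.

```python
import math
from typing import List, Sequence


def l2_norm(vec: Sequence[float]) -> float:
    """Euclidean (L2) norm of a vector."""
    return math.sqrt(sum(x * x for x in vec))


def norm_anchor_scale(
    value: Sequence[float], reference_norm: float, eps: float = 1e-12
) -> List[float]:
    """Rescale `value` so its L2 norm equals `reference_norm`.

    The direction of the vector is preserved; only its length changes.
    `reference_norm` is assumed to come from the original model
    (hypothetical: the source does not say how it is measured).
    """
    norm = l2_norm(value)
    if norm < eps:
        # A near-zero vector has no usable direction; leave it unchanged.
        return list(value)
    scale = reference_norm / norm
    return [x * scale for x in value]


# Illustrative numbers only: a solved value vector of norm 5,
# anchored to a reference norm of 1.
solved_value = [3.0, 4.0]
anchored = norm_anchor_scale(solved_value, reference_norm=1.0)
```

In a real editor the same operation would presumably be applied to a tensor right after the value vector is solved and before the weight update is computed, which is consistent with the description of NAS as a one-line change.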
Key facts
- Sequential L&E model editing can fail abruptly after many edits.
- Failure is caused by a positive norm-feedback loop.
- The loop involves solved value vectors and edited MLP weights.
- Norm growth under standard L&E dynamics is approximately exponential.
- Existing regularizers or update clamps do not resolve the issue.
- NAS breaks the loop by rescaling value vectors to the original model's reference norm.
- NAS extends the editing horizon by more than 4x.
- NAS improves long-run editing performance by 72.2% on average.
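A toy simulation can make the feedback loop in these facts concrete. It is not the analysis from the work summarized here: it assumes, purely for illustration, that the norm of each solved value vector is proportional to the current weight norm and that each edit adds a fixed fraction of the value norm to the weight norm. The function `simulate_weight_norm` and the parameters `gain` and `coupling` are hypothetical.

```python
from typing import List, Optional


def simulate_weight_norm(
    num_edits: int,
    gain: float = 0.1,
    coupling: float = 1.0,
    reference_norm: Optional[float] = None,
) -> List[float]:
    """Track a scalar stand-in for the edited weight norm over sequential edits.

    Toy assumptions (not taken from the source):
      * value norm = coupling * weight norm
      * each edit increases the weight norm by gain * value norm
    If `reference_norm` is given, the value norm is anchored to it before
    the update, mimicking the rescaling that NAS performs.
    """
    weight_norm = 1.0
    history = []
    for _ in range(num_edits):
        value_norm = coupling * weight_norm
        if reference_norm is not None:
            value_norm = reference_norm  # anchor: cut the feedback path
        weight_norm += gain * value_norm
        history.append(weight_norm)
    return history


unanchored = simulate_weight_norm(50)
anchored = simulate_weight_norm(50, reference_norm=1.0)
```

Under these assumptions the unanchored weight norm is multiplied by `1 + gain * coupling` at every edit, which is exponential growth, while the anchored run increases by a constant `gain * reference_norm` per edit, which is linear growth. Real editors update full weight matrices, so this scalar model only conveys the shape of the dynamics.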