ARTFEED — Contemporary Art Intelligence

DualOpt: Decoupled Optimization for Scratch and Fine-Tuning

ai-technology · 2026-04-29

Researchers propose DualOpt, an optimization approach that decouples the optimization of networks trained from scratch from that of fine-tuned pre-trained models. For scratch training, real-time layer-wise weight decay is introduced to improve convergence and generalization. For fine-tuning, weight rollback is integrated into the optimizer to prevent catastrophic forgetting. The method addresses the distinct demands of these two paradigms, which existing optimizers fail to fully accommodate. The paper is available on arXiv as 2604.22838.
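The summary above does not specify the exact decay rule, so the following is a minimal sketch of what "real-time layer-wise weight decay" could look like: each layer's decay coefficient is recomputed at every step from that layer's current gradient-to-weight norm ratio (a hypothetical scaling, named `layerwise_decay_step` here for illustration), rather than a single global constant.

```python
import numpy as np

def layerwise_decay_step(params, grads, lr=0.1, base_wd=1e-2):
    """One SGD step with a per-layer, per-step weight-decay coefficient.

    Hypothetical scheme (the paper's exact rule is not given in the
    summary): each layer's decay is rescaled in real time by the ratio
    of its gradient norm to its weight norm, so layers whose gradients
    are small relative to their weights are regularized more gently.
    """
    new_params = []
    for w, g in zip(params, grads):
        w_norm = np.linalg.norm(w) + 1e-12
        g_norm = np.linalg.norm(g) + 1e-12
        wd = base_wd * g_norm / w_norm        # layer-wise, recomputed every step
        new_params.append(w - lr * (g + wd * w))
    return new_params
```

The point of the sketch is the decoupling: the decay strength is a live function of each layer's state rather than a fixed hyperparameter shared across the network.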

Key facts

  • DualOpt decouples optimization for scratch and fine-tuning.
  • Real-time layer-wise weight decay enhances scratch training.
  • Weight rollback prevents catastrophic forgetting in fine-tuning.
  • Existing optimizers do not fully address distinct training paradigms.
  • Paper available on arXiv: 2604.22838.
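For the fine-tuning side, the facts above say only that weight rollback lives inside the optimizer. A minimal sketch, assuming a simple drift-clipping rule (the class name `RollbackSGD` and the `max_drift` budget are illustrative assumptions, not the paper's mechanism): after each SGD update, any weight that has drifted too far from its pre-trained value is rolled back to the edge of a fixed budget around that value.

```python
import numpy as np

class RollbackSGD:
    """SGD that rolls updated weights back toward the pre-trained
    reference to limit catastrophic forgetting.

    The rollback rule here (clip each weight's drift from its
    pre-trained value to a fixed budget) is an illustrative
    assumption standing in for the paper's actual mechanism.
    """

    def __init__(self, pretrained, lr=0.1, max_drift=0.05):
        self.ref = [w.copy() for w in pretrained]  # frozen pre-trained weights
        self.lr = lr
        self.max_drift = max_drift

    def step(self, params, grads):
        out = []
        for w, g, r in zip(params, grads, self.ref):
            w = w - self.lr * g                       # plain SGD update
            drift = np.clip(w - r, -self.max_drift,   # roll back any weight
                            self.max_drift)           # that drifted too far
            out.append(r + drift)
        return out
```

Because the rollback is part of the optimizer's step, it applies continuously during fine-tuning rather than as a one-off reset, which is what distinguishes this from periodically reloading the pre-trained checkpoint.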

Entities

Institutions

  • arXiv
