Gradient-Based Method Optimizes Pretraining Loss Weights Online
A new gradient-based bilevel method learns pretraining loss weights online by aligning the composite pretraining gradient with a downstream objective, exploiting the structure of the losses to avoid multiple backward passes. The approach reduces hyperparameter tuning overhead to roughly 30% above the cost of a single training run. Evaluated on event-sequence modeling and self-supervised computer vision, it matches or improves upon carefully tuned baselines.
Key facts
- Proposes gradient-based bilevel method for online loss weight learning
- Aligns composite pretraining gradient with downstream objective
- Avoids multiple backward passes via loss structure exploitation
- Reduces hyperparameter tuning overhead to ~30% above single run
- Evaluated on event-sequence modeling and self-supervised computer vision
- Matches or improves upon carefully tuned baselines
Entities
—
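A minimal sketch of the gradient-alignment idea, not the paper's actual algorithm: loss weights are nudged up when a pretraining task's gradient points in the same direction as the downstream gradient, requiring only one extra gradient per step rather than a second-order unroll. The toy linear model, the two synthetic tasks, and all update rules below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

true_theta = np.array([1.0, -2.0, 0.5])  # illustrative downstream-relevant parameters

def grad_mse(theta, X, y):
    """Gradient of the mean squared error 0.5 * ||X @ theta - y||^2 / n."""
    return X.T @ (X @ theta - y) / len(y)

# Two hypothetical pretraining tasks: task 1 is aligned with the
# downstream objective, task 2 is unrelated noise.
X1 = rng.normal(size=(32, 3)); y1 = X1 @ true_theta
X2 = rng.normal(size=(32, 3)); y2 = rng.normal(size=32)
Xd = rng.normal(size=(16, 3)); yd = Xd @ true_theta  # small downstream set

theta = rng.normal(size=3)  # model parameters
w = np.zeros(2)             # logits of the two pretraining loss weights
lr_theta, lr_w = 0.1, 0.5

for _ in range(200):
    g1 = grad_mse(theta, X1, y1)
    g2 = grad_mse(theta, X2, y2)
    gd = grad_mse(theta, Xd, yd)
    # Online weight update: increase the weight of a pretraining loss
    # whose gradient has positive inner product with the downstream
    # gradient; only one extra gradient (gd) is computed per step.
    w += lr_w * np.array([g1 @ gd, g2 @ gd])
    a = np.exp(w - w.max()); a /= a.sum()  # softmax keeps weights positive
    theta -= lr_theta * (a[0] * g1 + a[1] * g2)

print(np.round(a, 3))  # learned loss weights
```

The softmax over logits keeps the weights positive and normalized, so the parameter update always follows a convex combination of the pretraining gradients.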