Gradient-Based Method Optimizes Pretraining Loss Weights Online
A new gradient-based bilevel method learns pretraining loss weights online by aligning the composite pretraining gradient with a downstream objective, exploiting the structure of the losses to avoid multiple backward passes. The approach reduces hyperparameter tuning overhead to roughly 30% above the cost of a single training run. Evaluated on event-sequence modeling and self-supervised computer vision, it matches or improves upon carefully tuned baselines.
Key facts
- Proposes gradient-based bilevel method for online loss weight learning
- Aligns composite pretraining gradient with downstream objective
- Avoids multiple backward passes via loss structure exploitation
- Reduces hyperparameter tuning overhead to ~30% above single run
- Evaluated on event-sequence modeling and self-supervised computer vision
- Matches or improves upon carefully tuned baselines
Entities
—
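A minimal sketch of the gradient-alignment idea, not the paper's actual algorithm: loss weights are nudged up when a pretraining task's gradient points in the same direction as the downstream gradient, requiring only one extra gradient per step rather than a second-order unroll. The toy linear model, the two synthetic tasks, and all update rules below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

true_theta = np.array([1.0, -2.0, 0.5])  # illustrative downstream-relevant parameters

def grad_mse(theta, X, y):
    """Gradient of the mean squared error 0.5 * ||X @ theta - y||^2 / n."""
    return X.T @ (X @ theta - y) / len(y)

# Two hypothetical pretraining tasks: task 1 is aligned with the
# downstream objective, task 2 is unrelated noise.
X1 = rng.normal(size=(32, 3)); y1 = X1 @ true_theta
X2 = rng.normal(size=(32, 3)); y2 = rng.normal(size=32)
Xd = rng.normal(size=(16, 3)); yd = Xd @ true_theta  # small downstream set

theta = rng.normal(size=3)  # model parameters
w = np.zeros(2)             # logits of the two pretraining loss weights
lr_theta, lr_w = 0.1, 0.5

for _ in range(200):
    g1 = grad_mse(theta, X1, y1)
    g2 = grad_mse(theta, X2, y2)
    gd = grad_mse(theta, Xd, yd)
    # Online weight update: increase the weight of a pretraining loss
    # whose gradient has positive inner product with the downstream
    # gradient; only one extra gradient (gd) is computed per step.
    w += lr_w * np.array([g1 @ gd, g2 @ gd])
    a = np.exp(w - w.max()); a /= a.sum()  # softmax keeps weights positive
    theta -= lr_theta * (a[0] * g1 + a[1] * g2)

print(np.round(a, 3))  # learned loss weights
```

The softmax over logits keeps the weights positive and normalized, so the parameter update always follows a convex combination of the pretraining gradients.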