New Framework Corrects Biases in Preconditioned Language Model Optimizers

other · 2026-05-22

A recent arXiv paper (2605.20756) presents a framework for correcting two finite-sample biases in the preconditioned optimizers used to train language models. The first is gradient–preconditioner coupling bias, which arises when the gradient and the preconditioner are estimated from the same minibatch. The second is inversion bias, which arises because inverting a preconditioner estimate is a nonlinear operation, so the inverse is biased even when the estimate itself is unbiased. The proposed correction works within a single batch and has two parts: cross-fitted preconditioning, in which the numerator and the preconditioner are computed from separate groups of microbatches, and variance-corrected inversion, which uses the variability across microbatches to cancel the leading delta-method bias term. The framework applies to a range of preconditioning methods, including AdamW and Sophia, and addresses a gap in stochastic optimization theory for large-scale language models.
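To make the two corrections concrete, here is a minimal scalar sketch in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the function names, the half-and-half microbatch split, and the Uniform(1, 3) toy curvature distribution are all hypothetical choices made for this example. The inversion correction uses the standard second-order delta-method result that, for an unbiased estimate ĥ of h, E[1/ĥ] is approximately 1/h + Var(ĥ)/h³, so the leading bias can be estimated from microbatch variability and subtracted.

```python
import random
import statistics


def corrected_inverse(estimates):
    """Invert the mean of per-microbatch curvature estimates, then subtract
    the leading delta-method bias term Var(mean) / mean**3.
    Needs at least two estimates to compute a sample variance."""
    h_hat = statistics.fmean(estimates)
    # Unbiased sample variance across microbatches, scaled to the variance of the mean.
    var_of_mean = statistics.variance(estimates) / len(estimates)
    return 1.0 / h_hat - var_of_mean / h_hat ** 3


def cross_fitted_update(micro_grads, micro_curvs):
    """Hypothetical cross-fitted step for one scalar parameter: the numerator
    comes from the first half of the microbatches and the preconditioner from
    the second half, so the two estimates share no samples."""
    half = len(micro_grads) // 2
    g_hat = statistics.fmean(micro_grads[:half])
    return g_hat * corrected_inverse(micro_curvs[half:])


def simulate_inversion_bias(trials=20000, k=4, seed=0):
    """Toy check (assumed setup): curvature samples are Uniform(1, 3), so the
    true mean is 2 and the true inverse is 0.5. Returns the average error of
    the naive inverse and of the variance-corrected inverse."""
    rng = random.Random(seed)
    naive_sum = 0.0
    corrected_sum = 0.0
    for _ in range(trials):
        samples = [rng.uniform(1.0, 3.0) for _ in range(k)]
        naive_sum += 1.0 / statistics.fmean(samples)
        corrected_sum += corrected_inverse(samples)
    return naive_sum / trials - 0.5, corrected_sum / trials - 0.5


if __name__ == "__main__":
    naive_bias, corrected_bias = simulate_inversion_bias()
    print(f"naive bias: {naive_bias:+.4f}, corrected bias: {corrected_bias:+.4f}")
```

In this toy setting the naive inverse overshoots 0.5 on average, while the corrected inverse lands much closer. A real optimizer would apply the same idea per coordinate (or per block) to whatever preconditioner it maintains, and how the paper splits microbatches or handles moving averages is not specified in this summary.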

Key facts

Paper is on arXiv with ID 2605.20756
Identifies gradient–preconditioner coupling bias
Identifies inversion bias from nonlinear inversion
Proposes cross-fitted preconditioning
Proposes variance-corrected inversion
Applies to AdamW, Sophia, and other optimizers
Addresses finite-sample biases in stochastic optimization
Focuses on language model training
