ARTFEED — Contemporary Art Intelligence

New Framework Corrects Biases in Preconditioned Language Model Optimizers

other · 2026-05-22

A paper posted to arXiv (2605.20756) proposes a framework for correcting two finite-sample biases in the preconditioned optimizers used to train language models. The first is gradient–preconditioner coupling bias. It arises when the gradient and the preconditioner are estimated from the same minibatch, so their estimation errors are correlated. The second is inversion bias. It arises because the preconditioner is applied through a nonlinear inversion, such as the inverse square root in AdamW, so even an unbiased preconditioner estimate yields a biased inverse.

The proposed correction works within a single batch and has two parts. Cross-fitted preconditioning computes the update numerator and the preconditioner from separate groups of microbatches, which removes their correlation. Variance-corrected inversion uses the variability across microbatches to cancel the leading bias term predicted by the delta method. The authors present the framework as applicable to a range of preconditioned optimizers, including AdamW and Sophia, and as addressing a gap in the stochastic optimization theory of large-scale language model training.
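To make the two biases concrete, here is a minimal Python sketch built on simplifying assumptions that do not come from the paper. It uses a single scalar parameter and an AdamW-style inverse-square-root preconditioner with no moving averages. Each of eight i.i.d. microbatch gradients is drawn uniformly from {-1, 3}. The function names (`naive_direction`, `cross_fitted_direction`, `corrected_inv_sqrt`, `exact_expectation`) are hypothetical. The correction shown is a textbook second-order (delta-method) adjustment and is not necessarily the exact estimator in arXiv 2605.20756.

```python
import itertools
import math
import statistics

# Illustrative sketch only: a scalar, AdamW-style preconditioner h(v) = (v + eps)^(-1/2)
# applied to a mean gradient. Function names and the toy setup are hypothetical,
# not taken from the paper.


def inv_sqrt(v, eps=1e-8):
    """Nonlinear preconditioner map h(v) = (v + eps)^(-1/2)."""
    return (v + eps) ** -0.5


def corrected_inv_sqrt(sq_grads, eps=1e-8):
    """h(v_hat) minus a plug-in estimate of its leading delta-method bias.

    v_hat is the mean of the squared microbatch gradients. A second-order
    expansion gives E[h(v_hat)] ~= h(v) + 0.5 * h''(v) * Var(v_hat), with
    h''(v) = 0.75 * (v + eps)^(-5/2). Var(v_hat) is estimated from the spread
    of the microbatch values (sample variance divided by the group size).
    """
    v_hat = statistics.fmean(sq_grads)
    var_v_hat = statistics.variance(sq_grads) / len(sq_grads)
    return inv_sqrt(v_hat, eps) - 0.375 * (v_hat + eps) ** -2.5 * var_v_hat


def naive_direction(grads, eps=1e-8):
    """Numerator and preconditioner estimated from the SAME microbatches (coupled)."""
    g_bar = statistics.fmean(grads)
    v_bar = statistics.fmean([g * g for g in grads])
    return g_bar * inv_sqrt(v_bar, eps)


def cross_fitted_direction(grads, correct_inversion=False, eps=1e-8):
    """Numerator from one microbatch group, preconditioner from the other.

    The two swapped estimates are averaged so every microbatch is used once
    in each role.
    """
    half = len(grads) // 2
    group_a, group_b = grads[:half], grads[half:]

    def precond(group):
        sq = [g * g for g in group]
        if correct_inversion:
            return corrected_inv_sqrt(sq, eps)
        return inv_sqrt(statistics.fmean(sq), eps)

    return 0.5 * (statistics.fmean(group_a) * precond(group_b)
                  + statistics.fmean(group_b) * precond(group_a))


def exact_expectation(estimator, values=(-1.0, 3.0), n_micro=8):
    """Exact mean of an estimator when each microbatch gradient is drawn
    independently and uniformly from `values` (enumerates every outcome)."""
    outcomes = list(itertools.product(values, repeat=n_micro))
    return sum(estimator(list(o)) for o in outcomes) / len(outcomes)


if __name__ == "__main__":
    # Toy target: E[g] / sqrt(E[g^2]) = 1 / sqrt(5) for g uniform on {-1, 3}.
    target = 1.0 / math.sqrt(5.0)
    estimators = {
        "naive (coupled)": naive_direction,
        "cross-fitted": cross_fitted_direction,
        "cross-fitted + variance-corrected":
            lambda g: cross_fitted_direction(g, correct_inversion=True),
    }
    for name, est in estimators.items():
        print(f"{name:>36}: bias = {exact_expectation(est) - target:+.4f}")
```

On this toy distribution, enumerating all 256 outcomes gives the following biases. The coupled estimate comes out low, at about -0.037. The cross-fitted estimate without correction comes out high, at about +0.043. It is high because each half-batch preconditioner is noisier and the inverse square root is convex. The corrected estimate lands within about 0.002 of the target. Exact enumeration is used here so the bias is visible without Monte Carlo noise. It only scales to a handful of microbatches.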

Key facts

  • Paper is on arXiv with ID 2605.20756
  • Identifies gradient–preconditioner coupling bias
  • Identifies inversion bias from nonlinear inversion
  • Proposes cross-fitted preconditioning
  • Proposes variance-corrected inversion
  • Applies to AdamW, Sophia, and other optimizers
  • Addresses finite-sample biases in stochastic optimization
  • Focuses on language model training
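The two biases listed above can be written down with a standard covariance decomposition and a second-order Taylor (delta-method) expansion. The notation in this sketch is not taken from the paper. \hat g is the numerator estimate, \hat v is the preconditioner estimate built from K microbatches, s^2 is the sample variance of the microbatch preconditioner values, and f is the nonlinear map applied to \hat v. The paper's exact formulation may differ.

```latex
% Coupling bias: the covariance term is nonzero when \hat g and \hat v share a minibatch
\mathbb{E}\big[\hat g\, f(\hat v)\big]
  = \mathbb{E}[\hat g]\,\mathbb{E}\big[f(\hat v)\big]
  + \operatorname{Cov}\big(\hat g, f(\hat v)\big)

% Inversion bias: second-order (delta-method) expansion around v = \mathbb{E}[\hat v]
\mathbb{E}\big[f(\hat v)\big] \approx f(v) + \tfrac{1}{2} f''(v)\,\operatorname{Var}(\hat v)

% Plug-in correction, with \operatorname{Var}(\hat v) estimated from K microbatches
f_{\mathrm{corr}}(\hat v) = f(\hat v) - \tfrac{1}{2} f''(\hat v)\,\frac{s^2}{K}

% AdamW-style map f(v) = (v+\epsilon)^{-1/2}, so f''(v) = \tfrac{3}{4}(v+\epsilon)^{-5/2}
f_{\mathrm{corr}}(\hat v) = (\hat v+\epsilon)^{-1/2} - \tfrac{3}{8}(\hat v+\epsilon)^{-5/2}\,\frac{s^2}{K}
```

If \hat g and \hat v are computed from disjoint, independent microbatch groups, the covariance term is zero. Removing that term is the role of cross-fitting. The inverse square root is convex, so by Jensen's inequality f(\hat v) overestimates f(v) on average. For that reason the correction term is subtracted.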

Entities

Institutions

  • arXiv

Sources