StableGrad: Optimizer-Level Scale Control for Deep Neural Networks

other · 2026-05-20

A new method called StableGrad addresses the challenge of controlling activation and gradient magnitudes in very deep neural networks without relying on Batch Normalization or other normalization layers. Common remedies such as Batch Normalization and residual connections work by modifying the forward model, and Batch Normalization additionally makes each sample's output depend on the other samples in the batch. These non-local dependencies are problematic for Physics-Informed Neural Networks (PINNs), where the network represents a continuous physical field and derivatives with respect to the inputs define the training objective. StableGrad instead operates at the optimizer level, correcting layer-wise imbalances in the weight gradients without modifying the forward model. This allows deep networks to be trained stably in settings where batch-dependent normalization is inappropriate. The method is described in a paper on arXiv (2605.19856).
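The batch dependence that makes Batch Normalization awkward for PINNs can be seen in a toy example. The sketch below is illustrative only and is not taken from the paper: it uses a hypothetical one-layer `model` (a linear map followed by batch normalization without learned scale or shift) and a hypothetical `pointwise_model` for contrast, in plain Python. With batch normalization, the output at a collocation point changes when the other points in the batch change, and a finite-difference estimate of du(x_0)/dx_2 is nonzero; the pointwise model has no such cross-term.

```python
import math

def batch_norm(ys, eps=1e-5):
    """Normalize a batch of scalars to zero mean and unit variance (no learned scale/shift)."""
    n = len(ys)
    mean = sum(ys) / n
    var = sum((y - mean) ** 2 for y in ys) / n
    return [(y - mean) / math.sqrt(var + eps) for y in ys]

def model(xs):
    # Hypothetical toy network: a linear map followed by batch normalization.
    return batch_norm([2.0 * x + 1.0 for x in xs])

def pointwise_model(xs):
    # Same linear map with a pointwise nonlinearity instead of batch normalization.
    return [math.tanh(2.0 * x + 1.0) for x in xs]

# Collocation points at which a PINN would evaluate u(x).
points = [0.0, 0.5, 1.0]

# u(0.0) evaluated alongside two different sets of companion points.
u_a = model(points)[0]
u_b = model([0.0, 0.5, 3.0])[0]
print(u_a, u_b)  # the two values differ: u(0.0) depends on the rest of the batch

def cross_derivative(f, xs, i, j, h=1e-6):
    """Forward-difference estimate of d f(xs)[i] / d xs[j]."""
    shifted = list(xs)
    shifted[j] += h
    return (f(shifted)[i] - f(xs)[i]) / h

cross_bn = cross_derivative(model, points, i=0, j=2)
cross_pointwise = cross_derivative(pointwise_model, points, i=0, j=2)
print(cross_bn, cross_pointwise)  # nonzero for batch norm, exactly 0.0 pointwise
```

In a PINN, the residual at a point is assembled from derivatives of u at that point, so a nonzero cross-term means those derivatives also pick up contributions from unrelated collocation points in the same batch.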

Key facts

StableGrad controls how activation and gradient magnitudes propagate through very deep neural networks.
It does not use Batch Normalization or other normalization layers.
Batch Normalization introduces non-local (batch-dependent) dependencies, which are problematic for PINNs.
PINNs represent continuous physical fields, and derivatives with respect to the inputs define their training objective.
StableGrad corrects layer-wise weight-gradient imbalances.
It operates at the optimizer level without modifying the forward model.
The method is described in arXiv paper 2605.19856.
StableGrad enables stable training where batch-dependent normalization is inappropriate.
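The summary does not give StableGrad's update rule. To illustrate only the general idea of correcting layer-wise gradient imbalance inside the optimizer while leaving the forward model alone, the sketch below swaps in a simplified LARS-style trust ratio, a different and previously published technique, not StableGrad: each layer's gradient is rescaled so that the step is roughly a fixed fraction `lr` of that layer's weight norm. The function `layerwise_rescaled_step`, its parameters, and the list-of-lists weight layout are hypothetical choices made for this example.

```python
import math

def l2_norm(values):
    return math.sqrt(sum(v * v for v in values))

def layerwise_rescaled_step(weights, grads, lr=0.01, eps=1e-12):
    """One gradient step with a per-layer trust ratio (simplified LARS-style stand-in).

    weights, grads: lists of layers, each layer a flat list of floats.
    Only the update applied to each layer changes; the forward model is untouched.
    """
    new_weights = []
    for w, g in zip(weights, grads):
        w_norm = l2_norm(w)
        g_norm = l2_norm(g)
        # Rescale so that ||update|| is about lr * ||w|| for every layer,
        # however large or small that layer's raw gradient is.
        # Fall back to plain SGD for an all-zero layer.
        ratio = w_norm / (g_norm + eps) if w_norm > 0 else 1.0
        new_weights.append([wi - lr * ratio * gi for wi, gi in zip(w, g)])
    return new_weights

def relative_change(old, new):
    return l2_norm([n - o for o, n in zip(old, new)]) / l2_norm(old)

# Two layers whose raw gradients differ by six orders of magnitude.
weights = [[1.0, 1.0], [1.0, 1.0]]
grads = [[1e3, 1e3], [1e-3, 1e-3]]

new_weights = layerwise_rescaled_step(weights, grads, lr=0.01)
rel = [relative_change(o, n) for o, n in zip(weights, new_weights)]
print(rel)  # both entries are approximately 0.01
```

With plain SGD at the same `lr`, the first layer would move by 10 per component and the second by 1e-5, so one layer would diverge while the other barely trains; the per-layer ratio equalizes the relative step without touching the network itself.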
