Polar Express: A New GPU-Friendly Algorithm for Polar Decomposition in Deep Learning
Polar Express is a new algorithm for computing the polar decomposition and the matrix sign function, tailored to GPU-accelerated deep learning training. Unlike classical numerical methods, which emphasize high precision, Polar Express prioritizes throughput: it uses only matrix-matrix multiplications, the operation GPUs execute most efficiently. At each iteration the algorithm adapts its polynomial update rule by solving a minimax optimization problem, building on earlier work by Chen & Chow and by Nakatsukasa & Freund. The resulting updates provably minimize worst-case error and converge rapidly. Polar Express serves as a core subroutine within the Muon optimizer for training deep neural networks, where the requirements differ markedly from classical numerical settings.
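To make the matmul-only approach concrete, here is a minimal sketch of the classical cubic Newton-Schulz iteration for the polar factor, the family of polynomial methods that Polar Express generalizes. The specific coefficients and step count below are illustrative defaults, not the adaptively optimized per-iteration coefficients that Polar Express itself derives.

```python
import numpy as np

def newton_schulz_polar(G, steps=10):
    """Approximate the polar factor of G with the cubic Newton-Schulz
    iteration X <- 1.5*X - 0.5*X @ X.T @ X. Each step uses only
    matrix-matrix products, which is what makes this style of method
    GPU-friendly. Polar Express replaces the fixed cubic polynomial
    with adaptively chosen coefficients for faster convergence."""
    # Scale so all singular values lie in (0, 1], the iteration's
    # region of convergence.
    X = G / np.linalg.norm(G, ord=2)
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X
```

Each iteration maps every singular value sigma to 1.5*sigma - 0.5*sigma**3, which pushes all singular values toward 1 while leaving the singular vectors unchanged, so the iterates converge to the orthogonal polar factor.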
Key facts
- Polar Express is a new method for computing the polar decomposition and matrix sign function.
- It is designed for GPU-friendly deep learning, prioritizing high throughput over high precision.
- The algorithm uses only matrix-matrix multiplications, similar to Newton-Schulz and other polynomial methods.
- It adapts the update rule at each iteration by solving a minimax optimization problem.
- The method is inspired by earlier work of Chen & Chow and Nakatsukasa & Freund.
- Polar Express is proven to minimize error in a worst-case sense.
- It converges rapidly and is used within the Muon optimizer for training deep neural networks.
- The approach addresses the distinct requirements of deep learning compared to classical settings.
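To illustrate the role the polar factor plays inside Muon, the sketch below computes it exactly via SVD and applies it as the update direction. This is a hypothetical, simplified step for clarity only: Muon's actual update includes momentum and other details not shown, and Polar Express exists precisely to replace the SVD with a sequence of GPU-friendly matrix-matrix multiplications.

```python
import numpy as np

def muon_style_step(W, G, lr=0.02):
    """Illustrative optimizer step: replace the raw gradient G by its
    polar factor U @ Vt, the nearest (semi-)orthogonal matrix to G.
    The SVD here is an exact reference; in practice Polar Express
    approximates U @ Vt using only matrix multiplications."""
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return W - lr * (U @ Vt)
```

The point of the orthogonalized update is that all singular values of the applied direction equal 1, so no single direction in the gradient dominates the step.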