Polar Express: A New GPU-Friendly Algorithm for Polar Decomposition in Deep Learning
Polar Express is a new algorithm for computing the polar decomposition and the matrix sign function, tailored to GPU-accelerated deep learning training. Unlike classical numerical methods, which emphasize high precision, Polar Express prioritizes throughput: it uses only matrix-matrix multiplications, the operation GPUs execute most efficiently. At each iteration the algorithm adapts its polynomial update rule by solving a minimax optimization problem, building on earlier work by Chen & Chow and by Nakatsukasa & Freund. The resulting updates provably minimize worst-case error and converge rapidly. Polar Express serves as a core subroutine within the Muon optimizer for training deep neural networks, where the requirements differ markedly from classical numerical settings.
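To make the matmul-only approach concrete, here is a minimal sketch of the classical cubic Newton-Schulz iteration for the polar factor, the family of polynomial methods that Polar Express generalizes. The specific coefficients and step count below are illustrative defaults, not the adaptively optimized per-iteration coefficients that Polar Express itself derives.

```python
import numpy as np

def newton_schulz_polar(G, steps=10):
    """Approximate the polar factor of G with the cubic Newton-Schulz
    iteration X <- 1.5*X - 0.5*X @ X.T @ X. Each step uses only
    matrix-matrix products, which is what makes this style of method
    GPU-friendly. Polar Express replaces the fixed cubic polynomial
    with adaptively chosen coefficients for faster convergence."""
    # Scale so all singular values lie in (0, 1], the iteration's
    # region of convergence.
    X = G / np.linalg.norm(G, ord=2)
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X
```

Each iteration maps every singular value sigma to 1.5*sigma - 0.5*sigma**3, which pushes all singular values toward 1 while leaving the singular vectors unchanged, so the iterates converge to the orthogonal polar factor.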
Key facts
- Polar Express is a new method for computing the polar decomposition and matrix sign function.
- It is designed for GPU-friendly deep learning, prioritizing high throughput over high precision.
- The algorithm uses only matrix-matrix multiplications, similar to Newton-Schulz and other polynomial methods.
- It adapts the update rule at each iteration by solving a minimax optimization problem.
- The method is inspired by earlier work of Chen & Chow and Nakatsukasa & Freund.
- Polar Express is proven to minimize error in a worst-case sense.
- It converges rapidly and is used within the Muon optimizer for training deep neural networks.
- The approach addresses the distinct requirements of deep learning compared to classical settings.
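To illustrate the role the polar factor plays inside Muon, the sketch below computes it exactly via SVD and applies it as the update direction. This is a hypothetical, simplified step for clarity only: Muon's actual update includes momentum and other details not shown, and Polar Express exists precisely to replace the SVD with a sequence of GPU-friendly matrix-matrix multiplications.

```python
import numpy as np

def muon_style_step(W, G, lr=0.02):
    """Illustrative optimizer step: replace the raw gradient G by its
    polar factor U @ Vt, the nearest (semi-)orthogonal matrix to G.
    The SVD here is an exact reference; in practice Polar Express
    approximates U @ Vt using only matrix multiplications."""
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return W - lr * (U @ Vt)
```

The point of the orthogonalized update is that all singular values of the applied direction equal 1, so no single direction in the gradient dominates the step.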