BAOC Reduces GPU Memory by Assigning Per-Block Optimizers
The Budget-Aware Optimizer Configurator (BAOC) reduces GPU memory usage in large-scale model training by assigning each network block its own optimizer configuration within a fixed budget. Because different blocks exhibit distinct gradient behaviors, such as differing stability and scale anisotropy, a single one-size-fits-all optimizer is memory-inefficient. BAOC samples gradient streams to derive statistical metrics that estimate the performance risk of cheaper configurations, such as low-precision optimizer states or dropped momentum. It then solves a constrained allocation problem that minimizes total risk subject to memory and time budgets, selecting a budget-compliant configuration for each block. Experiments on vision, language, and diffusion workloads validate BAOC's effectiveness.
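The summary does not specify which statistics BAOC computes, but the idea of turning sampled gradient streams into a per-block risk score can be sketched with a simple, hypothetical heuristic: the coefficient of variation of a block's gradient norms over recent steps. A block whose gradient scale fluctuates heavily is assumed riskier to train with a cheap optimizer configuration. The function name and heuristic below are illustrative, not BAOC's actual metric.

```python
import math

def block_risk(grad_norms):
    """Hypothetical risk heuristic: coefficient of variation of sampled
    per-step gradient norms for one block. High variation suggests an
    unstable block, assumed riskier under a cheap config (e.g. low
    precision or no momentum)."""
    n = len(grad_norms)
    mean = sum(grad_norms) / n
    var = sum((g - mean) ** 2 for g in grad_norms) / n
    # Small epsilon guards against division by zero for all-zero streams.
    return math.sqrt(var) / (mean + 1e-12)
```

A perfectly steady gradient stream scores near zero, while an oscillating one scores high, so ranking blocks by this score gives a crude ordering of which blocks can safely receive cheaper optimizer settings.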
Key facts
- BAOC reduces GPU memory by assigning per-block optimizer configurations
- Gradients in different network blocks exhibit distinct behaviors
- BAOC samples gradient streams to derive risk metrics
- It solves a constrained allocation problem under memory and time budgets
- Experiments cover vision, language, and diffusion workloads
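The constrained allocation step can be illustrated as a small combinatorial optimization: pick one configuration per block so that total risk is minimized while total optimizer memory stays within budget. The block names, configurations, and costs below are invented for illustration, and the exhaustive search stands in for whatever solver BAOC actually uses; it is only practical for a handful of blocks.

```python
from itertools import product

# Hypothetical per-block candidates: (config name, optimizer memory in MB,
# estimated risk). "adamw32" is the safe full-precision baseline (risk 0);
# cheaper configs carry block-specific risk scores.
blocks = {
    "embed": [("adamw32", 400, 0.0), ("adamw8", 100, 0.05)],
    "attn":  [("adamw32", 600, 0.0), ("adamw8", 150, 0.30)],
    "mlp":   [("adamw32", 800, 0.0), ("adamw8", 200, 0.10), ("sgd", 0, 0.40)],
}

def allocate(blocks, mem_budget):
    """Exhaustively choose one config per block, minimizing total risk
    subject to the memory budget. Returns (total_risk, {block: config})
    or None if no assignment fits the budget."""
    names = list(blocks)
    best = None
    for choice in product(*(blocks[n] for n in names)):
        mem = sum(cfg[1] for cfg in choice)
        risk = sum(cfg[2] for cfg in choice)
        if mem <= mem_budget and (best is None or risk < best[0]):
            best = (risk, {n: cfg[0] for n, cfg in zip(names, choice)})
    return best
```

With a generous budget the solver keeps the full-precision baseline everywhere (zero risk); as the budget tightens, it trades the least risky blocks down to cheaper configurations first.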