AdaFRUGAL: Dynamic Memory Optimization for LLM Training
AdaFRUGAL introduces dynamic control of memory-efficient training for large language models, automating hyperparameter tuning that previously required manual intervention. The method extends the FRUGAL framework with two schedules: a linear decay for the subspace ratio (ρ) and a loss-aware schedule for the update frequency (T). Experiments on English C4 and Vietnamese VietVault pre-training, as well as GLUE fine-tuning, show that AdaFRUGAL stays competitive with AdamW and static FRUGAL while reducing GPU memory usage and training time, making it a practical option for resource-constrained environments.
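The summary does not spell out either schedule, so the following is a minimal Python sketch of the linear ρ decay only; the endpoints (rho_init, rho_final) and the horizon (total_steps) are assumptions for illustration, not values from the paper.

```python
def linear_rho(step: int, total_steps: int,
               rho_init: float = 0.25, rho_final: float = 0.05) -> float:
    """Linearly decay the subspace ratio rho from rho_init to rho_final.
    All three constants are illustrative assumptions, not paper values."""
    frac = min(step / max(total_steps, 1), 1.0)
    return rho_init + (rho_final - rho_init) * frac
```

If ρ controls the share of parameters updated with a full-state optimizer, as in FRUGAL-style methods, shrinking it over training would be the source of the memory savings.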
Key facts
- AdaFRUGAL automates hyperparameter tuning for FRUGAL's subspace ratio (ρ) and update frequency (T).
- Uses a linear decay for ρ (sketched above) and a loss-aware schedule for T (see the sketch after this list).
- Tested on English C4 and Vietnamese VietVault pre-training, and GLUE fine-tuning.
- Maintains competitive performance against AdamW and static FRUGAL.
- Reduces GPU memory usage and training time.
- Targets resource-constrained LLM training.
- Published on arXiv (2601.11568).
- Authors not specified in source.
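The loss-aware rule for T is likewise not specified in the source. A minimal sketch, assuming T is stretched when the loss plateaus so the subspace is refreshed less often once progress slows, could look like this; the class name, window size, tolerance, and T bounds are all hypothetical.

```python
from collections import deque

class LossAwareT:
    """Hypothetical loss-aware schedule for the subspace update interval T.
    The plateau heuristic and every constant here are illustrative assumptions."""

    def __init__(self, T_min: int = 50, T_max: int = 500,
                 window: int = 100, plateau_tol: float = 1e-3):
        self.T_min, self.T_max = T_min, T_max
        self.losses = deque(maxlen=window)
        self.plateau_tol = plateau_tol

    def update(self, loss: float) -> int:
        """Record the latest training loss and return the interval T to use."""
        self.losses.append(loss)
        if len(self.losses) < self.losses.maxlen:
            return self.T_min  # early training: refresh the subspace often
        improvement = self.losses[0] - self.losses[-1]
        # Loss still falling fast -> keep refreshes frequent (small T);
        # loss flattening out -> stretch T to cut refresh overhead.
        return self.T_min if improvement > self.plateau_tol else self.T_max
```

Whether the paper stretches or shrinks T at a plateau is not stated in this summary; the direction above is one plausible reading.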
Entities
Institutions
- arXiv