LARS: New Fine-Tuning Method Cuts On-Device LLM Training Memory by 33%
A new arXiv study challenges the assumption that parameter-efficient fine-tuning (PEFT) methods such as LoRA and IA3 are memory-efficient for on-device LLM adaptation: they reduce trainable parameters, but their intermediate activations still grow with sequence length. The authors introduce LARS (Low-memory Activation-Rank Subspace), which applies the low-rank constraint to the activation subspace during training rather than to the model parameters, decoupling memory consumption from sequence length. LARS reduces the memory footprint by an average of 33.54% compared with prior PEFT methods, mitigating the out-of-memory errors those methods hit on device.
Key facts
- Parameter-efficient fine-tuning (PEFT) methods like LoRA and IA3 reduce trainable parameters but are not memory-efficient.
- Intermediate activation tensors in PEFT scale linearly with sequence length, causing out-of-memory errors on devices (see the sketch after this list).
- LARS (Low-memory Activation-Rank Subspace) constrains the activation subspace during training.
- LARS decouples memory consumption from sequence length.
- LARS reduces memory footprint by an average of 33.54%.
- The study is published on arXiv with ID 2604.22783.
- The work targets on-device LLM adaptation.
- Prior PEFT methods apply low-rank constraints to model parameters.
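To make the scaling concrete, here is a minimal back-of-the-envelope sketch (not code from the paper): it contrasts the memory held by a LoRA adapter's trainable parameters with the input activations a framework typically caches for the adapter's backward pass. The hidden size, rank, batch size, and fp16 dtype are illustrative assumptions, not values reported in the study, and the sketch shows the bottleneck LARS targets rather than the LARS method itself.

```python
# Illustrative memory estimate: LoRA-style adapter parameters vs. cached activations.
# All sizes below are assumed for illustration, not taken from the paper.

BYTES_PER_ELEM = 2   # fp16/bf16
HIDDEN = 4096        # assumed model hidden size
RANK = 16            # assumed adapter rank
BATCH = 1            # assumed batch size


def lora_param_bytes(hidden: int = HIDDEN, rank: int = RANK) -> int:
    """Trainable parameters of one adapter pair (A: hidden x r, B: r x hidden)."""
    return 2 * hidden * rank * BYTES_PER_ELEM


def cached_activation_bytes(seq_len: int, hidden: int = HIDDEN) -> int:
    """Input activations saved for the adapter's backward pass: [batch, seq, hidden].

    This is the term that grows linearly with sequence length and, per the paper,
    dominates on-device fine-tuning memory even when trainable parameters are tiny.
    """
    return BATCH * seq_len * hidden * BYTES_PER_ELEM


if __name__ == "__main__":
    print(f"adapter parameters: {lora_param_bytes() / 2**20:.2f} MiB "
          "(independent of sequence length)")
    for seq_len in (512, 2048, 8192):
        act = cached_activation_bytes(seq_len)
        print(f"seq_len={seq_len:5d}: cached activations ~ {act / 2**20:.2f} MiB "
              "per adapted layer")
```

Under these assumptions the parameter term stays fixed at a fraction of a mebibyte, while the cached-activation term grows 16x as the sequence goes from 512 to 8,192 tokens; that linear dependence on sequence length is what LARS is designed to remove.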