LARS: New Fine-Tuning Method Cuts On-Device LLM Training Memory by 33%
A new arXiv study challenges the assumption that parameter-efficient fine-tuning (PEFT) methods such as LoRA and IA3 are memory-efficient for on-device LLM adaptation: they reduce trainable parameters, but their intermediate activations still grow with sequence length. The authors introduce LARS (Low-memory Activation-Rank Subspace), which applies the low-rank constraint to the activation subspace during training rather than to the model parameters, decoupling memory consumption from sequence length. LARS reduces the memory footprint by an average of 33.54% compared with prior PEFT methods, mitigating the out-of-memory errors those methods hit on device.
Key facts
- Parameter-efficient fine-tuning (PEFT) methods like LoRA and IA3 reduce trainable parameters but are not memory-efficient.
- Intermediate activation tensors in PEFT scale linearly with sequence length, causing out-of-memory errors on devices (see the sketch after this list).
- LARS (Low-memory Activation-Rank Subspace) constrains the activation subspace during training.
- LARS decouples memory consumption from sequence length.
- LARS reduces memory footprint by an average of 33.54%.
- The study is published on arXiv with ID 2604.22783.
- The work targets on-device LLM adaptation.
- Prior PEFT methods apply low-rank constraints to model parameters.
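To make the scaling concrete, here is a minimal back-of-the-envelope sketch (not code from the paper): it contrasts the memory held by a LoRA adapter's trainable parameters with the input activations a framework typically caches for the adapter's backward pass. The hidden size, rank, batch size, and fp16 dtype are illustrative assumptions, not values reported in the study, and the sketch shows the bottleneck LARS targets rather than the LARS method itself.

```python
# Illustrative memory estimate: LoRA-style adapter parameters vs. cached activations.
# All sizes below are assumed for illustration, not taken from the paper.

BYTES_PER_ELEM = 2   # fp16/bf16
HIDDEN = 4096        # assumed model hidden size
RANK = 16            # assumed adapter rank
BATCH = 1            # assumed batch size


def lora_param_bytes(hidden: int = HIDDEN, rank: int = RANK) -> int:
    """Trainable parameters of one adapter pair (A: hidden x r, B: r x hidden)."""
    return 2 * hidden * rank * BYTES_PER_ELEM


def cached_activation_bytes(seq_len: int, hidden: int = HIDDEN) -> int:
    """Input activations saved for the adapter's backward pass: [batch, seq, hidden].

    This is the term that grows linearly with sequence length and, per the paper,
    dominates on-device fine-tuning memory even when trainable parameters are tiny.
    """
    return BATCH * seq_len * hidden * BYTES_PER_ELEM


if __name__ == "__main__":
    print(f"adapter parameters: {lora_param_bytes() / 2**20:.2f} MiB "
          "(independent of sequence length)")
    for seq_len in (512, 2048, 8192):
        act = cached_activation_bytes(seq_len)
        print(f"seq_len={seq_len:5d}: cached activations ~ {act / 2**20:.2f} MiB "
              "per adapted layer")
```

Under these assumptions the parameter term stays fixed at a fraction of a mebibyte, while the cached-activation term grows 16x as the sequence goes from 512 to 8,192 tokens; that linear dependence on sequence length is what LARS is designed to remove.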