ARTFEED — Contemporary Art Intelligence

LARS: New Fine-Tuning Method Reduces LLM Memory by 33%

ai-technology · 2026-04-29

A new arXiv preprint challenges the assumption that parameter-efficient fine-tuning (PEFT) methods such as LoRA and IA3 are memory-efficient enough for on-device LLM adaptation: they shrink the number of trainable parameters, but their intermediate activations still scale linearly with sequence length. The authors introduce LARS (Low-memory Activation-Rank Subspace), which applies the low-rank constraint to the activation subspace during training rather than to the model parameters, decoupling memory consumption from sequence length. LARS reduces memory footprint by an average of 33.54% compared with prior PEFT methods and avoids the out-of-memory errors that block adaptation on resource-constrained devices.
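
The summary above does not spell out the mechanism, so the sketch below is only a minimal, hypothetical PyTorch illustration of the general idea of constraining activations rather than parameters; it is not the published LARS algorithm. The class names, the fixed projection basis, and the dimensions (rank_p, rank_s, d_model) are all assumptions made for the example.

```python
# Illustrative sketch only: contrasts a LoRA-style adapter, whose saved
# activations grow with sequence length, with a hypothetical adapter that
# first compresses the sequence axis onto a fixed low-rank basis.
# This is NOT the published LARS method; all names/shapes are assumptions.
import torch
import torch.nn as nn


class LoRAAdapter(nn.Module):
    """Standard LoRA update y = x @ A @ B. The input x (batch, seq, d) is
    kept for the backward pass, so activation memory scales with seq."""

    def __init__(self, d_model: int, rank_p: int = 8):
        super().__init__()
        self.A = nn.Parameter(torch.randn(d_model, rank_p) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank_p, d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ self.A @ self.B


class ActivationSubspaceAdapter(nn.Module):
    """Hypothetical variant: compress the sequence axis onto a fixed
    low-rank basis (seq -> rank_s) before the adapter matmuls, so the
    tensors those matmuls save no longer grow with sequence length."""

    def __init__(self, d_model: int, seq_len: int,
                 rank_p: int = 8, rank_s: int = 32):
        super().__init__()
        self.A = nn.Parameter(torch.randn(d_model, rank_p) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank_p, d_model))
        # Fixed (non-trainable) orthonormal basis over the sequence axis.
        basis = torch.linalg.qr(torch.randn(seq_len, rank_s)).Q
        self.register_buffer("basis", basis)  # (seq, rank_s)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Compress the sequence axis: (batch, seq, d) -> (batch, rank_s, d).
        z = torch.einsum("bsd,sr->brd", x, self.basis)
        # The adapter matmuls now operate on (and save) the compressed tensor.
        out = z @ self.A @ self.B
        # Expand back to the full sequence length for the residual path.
        return torch.einsum("brd,sr->bsd", out, self.basis)


if __name__ == "__main__":
    x = torch.randn(2, 1024, 512)
    print(LoRAAdapter(512)(x).shape)                        # (2, 1024, 512)
    print(ActivationSubspaceAdapter(512, seq_len=1024)(x).shape)  # same shape
```

The point of the contrast is that in the second adapter the tensors held for backpropagation through the low-rank matmuls have a fixed rank_s extent instead of the full sequence length, which is the kind of decoupling the paper describes.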

Key facts

  • Parameter-efficient fine-tuning (PEFT) methods like LoRA and IA3 reduce trainable parameters but are not memory-efficient.
  • Intermediate tensors in PEFT scale linearly with sequence length, causing out-of-memory errors on devices (see the back-of-envelope sketch after this list).
  • LARS (Low-memory Activation-Rank Subspace) constrains the activation subspace during training.
  • LARS decouples memory consumption from sequence length.
  • LARS reduces memory footprint by an average of 33.54%.
  • The study is published on arXiv with ID 2604.22783.
  • The work targets on-device LLM adaptation.
  • Prior PEFT methods apply low-rank constraints to model parameters.
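
To make the sequence-length scaling concrete, here is a back-of-envelope calculation of per-layer adapter activation memory. All dimensions are hypothetical, chosen only to show the linear growth; they are not the configurations measured in the paper, and the 33.54% figure above is the paper's reported average, not something this arithmetic reproduces.

```python
# Hypothetical dimensions; illustrates linear scaling with sequence length.
def adapter_activation_bytes(batch: int, seq_len: int, d_model: int,
                             rank: int, bytes_per_elem: int = 2) -> int:
    """Tensors a LoRA-style adapter keeps for the backward pass: the layer
    input (batch, seq, d_model) plus the low-rank intermediate
    (batch, seq, rank). Both grow linearly with seq_len."""
    saved_input = batch * seq_len * d_model
    saved_low_rank = batch * seq_len * rank
    return (saved_input + saved_low_rank) * bytes_per_elem


for seq_len in (512, 1024, 2048, 4096):
    mib = adapter_activation_bytes(batch=1, seq_len=seq_len,
                                   d_model=4096, rank=16) / 2**20
    print(f"seq_len={seq_len:5d}  per-layer adapter activations ~ {mib:6.1f} MiB")
```

Doubling the sequence length doubles the saved activations per adapter layer; summed over dozens of layers, that is what pushes on-device training out of memory, and it is the scaling LARS targets by capping the rank of the activations instead of the parameters.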

Entities

Institutions

  • arXiv

Sources