HELLoRA: Efficient Fine-Tuning for Mixture-of-Experts Models
HELLoRA (Hot-Experts Layer-level Low-Rank Adaptation) is a parameter-efficient fine-tuning technique for Mixture-of-Experts (MoE) models. Unlike conventional LoRA, which is typically applied to dense architectures, HELLoRA attaches LoRA modules only to the most frequently activated ("hot") experts in each layer. This reduces both trainable parameters and adapter-induced FLOPs while improving downstream performance, an effect attributed to structured regularization that preserves the specialization of the pretrained experts. HELLoRA can also be combined with LoRI, which freezes the up-projection and sparsifies the down-projection, to form HELLoRI. The method was evaluated on three MoE backbones: OLMoE-1B-7B, Mixtral-8x7B, and Deep. The paper is available on arXiv under ID 2605.18795.
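The selection step described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the function name `select_hot_experts`, the `routing_log` format, and the `num_hot` parameter are all hypothetical, and ranking experts by raw routing counts is an assumption about how "most activated" is measured.

```python
from collections import Counter


def select_hot_experts(routing_log, num_hot):
    """Pick the most frequently routed experts in each layer.

    routing_log: dict mapping layer index -> list of expert ids, one entry
                 per token-to-expert routing decision (hypothetical format).
    num_hot:     number of experts per layer that would receive an adapter.
    """
    hot = {}
    for layer, expert_ids in routing_log.items():
        counts = Counter(expert_ids)
        # Rank by descending activation count; break ties by expert id
        # so the result is deterministic.
        ranked = sorted(counts, key=lambda e: (-counts[e], e))
        hot[layer] = sorted(ranked[:num_hot])
    return hot


# Toy routing statistics for a two-layer model (made-up numbers).
routing_log = {
    0: [3, 3, 1, 3, 0, 1, 2, 3],
    1: [5, 4, 5, 5, 4, 7, 4, 4],
}
hot_experts = select_hot_experts(routing_log, num_hot=2)
print(hot_experts)
```

In a real setup the routing statistics would come from running the frozen model over a calibration set, and LoRA modules would then be attached only to the experts listed for each layer; all other experts stay untouched.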
Key facts
- HELLoRA attaches LoRA modules only to the most frequently activated experts at each layer.
- It reduces trainable parameters and adapter-induced FLOPs while improving downstream performance.
- The effect is attributed to structured regularization that preserves pretrained expert specialization.
- HELLoRI combines HELLoRA with LoRI, freezing up-projection and sparsifying down-projection.
- Tested on three MoE backbones: OLMoE-1B-7B, Mixtral-8x7B, and Deep.
- The paper is on arXiv with ID 2605.18795.
- LoRA dominates parameter-efficient fine-tuning of large language models.
- MoE models scale parameters at near-constant per-token compute.
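The parameter saving from adapting only a subset of experts follows from simple arithmetic: a rank-r LoRA adapter on a weight of shape d_in x d_out adds r * (d_in + d_out) trainable parameters. The sketch below compares adapting every expert with adapting only the hot ones. The configuration values (layer count, expert count, hidden sizes, rank) are hypothetical and are not taken from the paper or from any of the evaluated backbones.

```python
def lora_params(d_in, d_out, rank):
    # A LoRA adapter adds two low-rank factors: (d_in x rank) and (rank x d_out).
    return rank * (d_in + d_out)


def expert_adapter_params(num_layers, adapted_experts, matrices_per_expert,
                          d_model, d_ff, rank):
    # Every adapted expert matrix maps between d_model and d_ff, so each one
    # costs the same number of adapter parameters regardless of direction.
    per_matrix = lora_params(d_model, d_ff, rank)
    return num_layers * adapted_experts * matrices_per_expert * per_matrix


# Hypothetical MoE configuration, chosen only for illustration.
cfg = dict(num_layers=16, matrices_per_expert=3, d_model=2048, d_ff=1024, rank=8)

all_experts = expert_adapter_params(adapted_experts=64, **cfg)  # adapters on every expert
hot_only = expert_adapter_params(adapted_experts=8, **cfg)      # adapters on hot experts only

print(all_experts, hot_only, hot_only / all_experts)
```

Under these assumptions the trainable-parameter count scales linearly with the number of adapted experts per layer, so the ratio equals the fraction of experts that receive an adapter.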