CP-MoE Framework Addresses Catastrophic Forgetting in LLMs and VLMs
A team of researchers has introduced CP-MoE, a continual learning framework for large language models (LLMs) and vision-language models (VLMs) that targets catastrophic forgetting. Existing LoRA-based Mixture-of-Experts (MoE) methods either isolate experts too strictly, which hinders knowledge transfer, or allow task-specific updates to overwrite important parameters. CP-MoE adds a transient expert that captures early task-specific updates and helps integrate them into the stable experts. It also uses a consistency-preserving routing bias, which estimates representation similarity with the stable experts, together with a transient expert-guided regularization mechanism. The goal is to balance knowledge transfer against forgetting.
Key facts
- CP-MoE is a continual learning framework for LLMs and VLMs.
- It targets catastrophic forgetting in both model families.
- Existing LoRA-based MoE methods face a trade-off between knowledge transfer and forgetting.
- CP-MoE uses a transient expert to capture early task-specific updates.
- It introduces a consistency-preserving routing bias.
- The routing bias estimates representation similarity with stable experts.
- A transient expert-guided regularization mechanism is included.
- The framework aims to improve expert selection and reduce forgetting.
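The idea of a routing bias driven by representation similarity can be illustrated with a small sketch. This is not the CP-MoE formulation, which the summary does not spell out. It assumes, purely for illustration, that each stable expert keeps a prototype vector, that similarity is measured by cosine similarity, and that the bias is added to the router logits before a softmax. The names `route`, `prototypes`, and `bias_scale` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(hidden, router_logits, prototypes, bias_scale=1.0):
    """Return routing weights over stable experts.

    hidden        -- input representation (list of floats)
    router_logits -- raw router score per expert
    prototypes    -- one prototype vector per expert (assumption, not from the source)
    bias_scale    -- strength of the similarity bias (assumption, not from the source)
    """
    biased = [
        logit + bias_scale * cosine(hidden, proto)
        for logit, proto in zip(router_logits, prototypes)
    ]
    return softmax(biased)


# Two experts with identical router scores: the bias decides the split.
weights = route([1.0, 0.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

With equal router logits, the expert whose prototype is closer to the input receives the larger weight, and setting `bias_scale=0.0` recovers the unbiased router. Whether CP-MoE computes its bias this way is not stated in the summary.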