PivotMerge: New Method for Multimodal AI Pre-training
arXiv paper 2604.22823v1 introduces PivotMerge, a novel approach for merging multimodal large language models (MLLMs) during pre-training. The method addresses the challenge of integrating cross-modal alignment capabilities learned from heterogeneous datasets, which tend to endow models with complementary strengths. Existing model-merging research focuses on merging after fine-tuning, leaving the pre-training stage largely unexplored. PivotMerge instead targets post-alignment merging, aiming to combine visual and textual representations into a unified semantic space. Key challenges include cross-domain parameter interference and the integration of diverse alignment knowledge. The paper proposes a solution that bridges heterogeneous multimodal pre-training via model merging.
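The summary above does not specify PivotMerge's algorithm. As background only, the simplest model-merging baseline that such methods improve upon is elementwise parameter averaging across checkpoints with identical architectures; the function name, the use of plain Python lists in place of tensors, and the interpolation weight are all illustrative assumptions, not details from the paper.

```python
def average_merge(state_a, state_b, alpha=0.5):
    """Baseline merge: interpolate two models' parameters key by key.

    state_a, state_b: dicts mapping parameter names to flat value lists
    (standing in for real tensor state dicts). alpha weights model A.
    This is a generic baseline sketch, NOT PivotMerge's method.
    """
    assert state_a.keys() == state_b.keys(), "models must share an architecture"
    return {
        name: [alpha * a + (1.0 - alpha) * b
               for a, b in zip(state_a[name], state_b[name])]
        for name in state_a
    }

# Two toy "checkpoints" that disagree on a shared parameter vector:
merged = average_merge({"w": [0.0, 2.0]}, {"w": [2.0, 0.0]}, alpha=0.5)
print(merged["w"])  # [1.0, 1.0]
```

The assertion reflects a real constraint on merging: parameters are combined by name, so the checkpoints must come from the same architecture.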
Key facts
- arXiv paper 2604.22823v1
- Title: PivotMerge: Bridging Heterogeneous Multimodal Pre-training via Post-Alignment Model Merging
- Focuses on multimodal large language models (MLLMs)
- Addresses post-alignment merging task
- Integrates cross-modal alignment from heterogeneous pre-training
- Challenges: cross-domain parameter interference
- Contrasts with existing work on post-finetuning merging
- Aims to unify visual and textual representations
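The cross-domain parameter interference named above can be made concrete with a toy check: when two pre-training runs push a shared parameter in opposite directions, naive averaging cancels both learned updates. The helper below, including its name and the list-based representation of per-parameter update deltas, is an illustrative assumption, not part of the paper.

```python
def sign_conflicts(delta_a, delta_b):
    """Count coordinates where two models' update deltas point in
    opposite directions. At such coordinates, averaging the merged
    parameters cancels both updates -- one source of interference
    that merging methods must mitigate."""
    return sum(1 for a, b in zip(delta_a, delta_b) if a * b < 0)

# Deltas from two hypothetical pre-training runs on different domains;
# only the first coordinate is in direct conflict (opposite signs):
print(sign_conflicts([0.5, -0.3, 0.2], [-0.4, -0.1, 0.3]))  # 1
```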