PivotMerge: New Method for Multimodal AI Pre-training
arXiv paper 2604.22823v1 introduces PivotMerge, a novel approach for merging multimodal large language models (MLLMs) during pre-training. The method addresses the challenge of integrating cross-modal alignment capabilities learned from heterogeneous datasets, which tend to endow models with complementary strengths. Existing model-merging research focuses on merging after fine-tuning, leaving the pre-training stage largely unexplored. PivotMerge instead targets post-alignment merging, aiming to combine visual and textual representations into a unified semantic space. Key challenges include cross-domain parameter interference and the integration of diverse alignment knowledge. The paper proposes a solution that bridges heterogeneous multimodal pre-training via model merging.
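The summary above does not specify PivotMerge's algorithm. As background only, the simplest model-merging baseline that such methods improve upon is elementwise parameter averaging across checkpoints with identical architectures; the function name, the use of plain Python lists in place of tensors, and the interpolation weight are all illustrative assumptions, not details from the paper.

```python
def average_merge(state_a, state_b, alpha=0.5):
    """Baseline merge: interpolate two models' parameters key by key.

    state_a, state_b: dicts mapping parameter names to flat value lists
    (standing in for real tensor state dicts). alpha weights model A.
    This is a generic baseline sketch, NOT PivotMerge's method.
    """
    assert state_a.keys() == state_b.keys(), "models must share an architecture"
    return {
        name: [alpha * a + (1.0 - alpha) * b
               for a, b in zip(state_a[name], state_b[name])]
        for name in state_a
    }

# Two toy "checkpoints" that disagree on a shared parameter vector:
merged = average_merge({"w": [0.0, 2.0]}, {"w": [2.0, 0.0]}, alpha=0.5)
print(merged["w"])  # [1.0, 1.0]
```

The assertion reflects a real constraint on merging: parameters are combined by name, so the checkpoints must come from the same architecture.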
Key facts
- arXiv paper 2604.22823v1
- Title: PivotMerge: Bridging Heterogeneous Multimodal Pre-training via Post-Alignment Model Merging
- Focuses on multimodal large language models (MLLMs)
- Addresses post-alignment merging task
- Integrates cross-modal alignment from heterogeneous pre-training
- Challenges: cross-domain parameter interference
- Contrasts with existing work on post-finetuning merging
- Aims to unify visual and textual representations
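The cross-domain parameter interference named above can be made concrete with a toy check: when two pre-training runs push a shared parameter in opposite directions, naive averaging cancels both learned updates. The helper below, including its name and the list-based representation of per-parameter update deltas, is an illustrative assumption, not part of the paper.

```python
def sign_conflicts(delta_a, delta_b):
    """Count coordinates where two models' update deltas point in
    opposite directions. At such coordinates, averaging the merged
    parameters cancels both updates -- one source of interference
    that merging methods must mitigate."""
    return sum(1 for a, b in zip(delta_a, delta_b) if a * b < 0)

# Deltas from two hypothetical pre-training runs on different domains;
# only the first coordinate is in direct conflict (opposite signs):
print(sign_conflicts([0.5, -0.3, 0.2], [-0.4, -0.1, 0.3]))  # 1
```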