ARTFEED — Contemporary Art Intelligence

PivotMerge: New Method for Multimodal AI Pre-training

other · 2026-04-29

arXiv paper 2604.22823v1 introduces PivotMerge, a novel approach for merging multimodal large language models (MLLMs) during pre-training. The method addresses the challenge of integrating cross-modal alignment capabilities learned from heterogeneous datasets, which often yield complementary strengths. Existing model-merging research focuses on merging after fine-tuning, leaving the pre-training stage unexplored. PivotMerge instead targets post-alignment merging, aiming to combine visual and textual representations into a unified semantic space. Key challenges include cross-domain parameter interference and the integration of diverse alignment knowledge. The paper proposes a way to bridge heterogeneous multimodal pre-training via model merging.
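For readers unfamiliar with weight-space model merging, the baseline idea the paper builds on can be sketched as linear interpolation of parameters from two checkpoints sharing an architecture. The sketch below is a generic illustration, not the PivotMerge algorithm (whose internals the summary does not describe); the function and parameter names are hypothetical.

```python
# Generic weight-space merging sketch (NOT PivotMerge itself).
# Illustrates the baseline that merging methods refine: interpolating
# parameters of two checkpoints with the same architecture.

def merge_checkpoints(params_a, params_b, alpha=0.5):
    """Return alpha * params_a + (1 - alpha) * params_b, key by key.

    A key present in only one checkpoint signals an architecture
    mismatch -- a simple form of the cross-domain interference that
    merging methods must handle more carefully in practice.
    """
    if params_a.keys() != params_b.keys():
        raise ValueError("checkpoints must share the same parameter names")
    return {k: alpha * params_a[k] + (1 - alpha) * params_b[k]
            for k in params_a}

# Toy example: two "models" with one scalar weight per layer,
# standing in for a vision-aligned and a text-aligned checkpoint.
model_vision = {"proj.weight": 0.8, "lm.weight": 0.2}
model_text   = {"proj.weight": 0.4, "lm.weight": 0.6}
merged = merge_checkpoints(model_vision, model_text, alpha=0.5)
```

Naive averaging like this tends to blur modality-specific knowledge when the source checkpoints were pre-trained on heterogeneous data, which is precisely the interference problem the paper's post-alignment approach is framed against.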

Key facts

  • arXiv paper 2604.22823v1
  • Title: PivotMerge: Bridging Heterogeneous Multimodal Pre-training via Post-Alignment Model Merging
  • Focuses on multimodal large language models (MLLMs)
  • Addresses post-alignment merging task
  • Integrates cross-modal alignment from heterogeneous pre-training
  • Challenges: cross-domain parameter interference and integrating diverse alignment knowledge
  • Contrasts with existing work on post-finetuning merging
  • Aims to unify visual and textual representations

Entities

Institutions

  • arXiv

Sources