ARTFEED — Contemporary Art Intelligence

Model Merging: Combining Neural Networks in Weight Space

publication · 2026-05-06

A new thesis on arXiv (2605.01580) proposes model merging as an alternative to maintaining separate neural networks for related tasks. The approach combines independently trained networks directly in weight space, without access to the original training data and without extensive additional optimization. In the single-task setting, the thesis introduces C$^2$M$^3$, a cycle-consistent merging algorithm based on Frank-Wolfe optimization that aligns multiple networks into a shared parameter space before combining them. For the multi-task setting, where models are fine-tuned from a common pretrained initialization, the thesis develops a theoretical framework for merging. The work challenges the conventional paradigm of treating trained models as isolated artifacts.
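The core idea of weight-space merging can be sketched in a few lines. The snippet below averages the parameters of several networks elementwise; the function name and dict-of-arrays representation are illustrative assumptions, and it presumes the models already sit in a shared parameter space (the kind of alignment C$^2$M$^3$ is designed to produce), since naively averaging unaligned, independently trained networks generally degrades accuracy.

```python
import numpy as np

def merge_weights(models, coeffs=None):
    """Merge networks by (weighted) averaging of parameters in weight space.

    `models` is a list of dicts mapping parameter names to arrays.
    Assumes the models have already been aligned into a shared
    parameter space; plain averaging of unaligned networks is
    shown here only to illustrate the weight-space operation.
    """
    if coeffs is None:
        # Uniform averaging by default.
        coeffs = [1.0 / len(models)] * len(models)
    merged = {}
    for name in models[0]:
        merged[name] = sum(c * m[name] for c, m in zip(coeffs, models))
    return merged

# Toy example: two "networks" with a single weight matrix each.
a = {"w": np.array([[1.0, 2.0], [3.0, 4.0]])}
b = {"w": np.array([[3.0, 2.0], [1.0, 0.0]])}
merged = merge_weights([a, b])
print(merged["w"])  # elementwise mean of the two matrices
```

Note that the merge itself requires no gradients, no training data, and no optimization loop; all of the difficulty lives in the alignment step that precedes it.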

Key facts

  • Thesis on arXiv with ID 2605.01580
  • Proposes model merging as an alternative paradigm
  • Combines neural networks directly in weight space
  • No access to original training data required
  • Introduces C$^2$M$^3$ algorithm for single-task merging
  • C$^2$M$^3$ uses Frank-Wolfe optimization
  • Covers both single-task and multi-task regimes
  • Multi-task setting assumes common pretrained initialization
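The multi-task regime listed above is often formalized in the literature via task vectors: the difference between a fine-tuned model and its shared pretrained initialization. The sketch below, with hypothetical names, shows this common construction; it illustrates the setting the thesis's theoretical framework addresses, not the thesis's exact method.

```python
import numpy as np

def merge_finetuned(pretrained, finetuned_models, alpha=1.0):
    """Merge fine-tuned models that share a pretrained initialization.

    Each task vector is (finetuned - pretrained); the merged model adds
    the mean task vector, scaled by `alpha`, back onto the shared
    initialization. Illustrative sketch only.
    """
    merged = {}
    for name, base in pretrained.items():
        mean_task = sum(m[name] - base for m in finetuned_models) / len(finetuned_models)
        merged[name] = base + alpha * mean_task
    return merged

# Toy example: one shared initialization, two fine-tuned "models".
base = {"w": np.array([0.0, 0.0])}
ft1 = {"w": np.array([2.0, 0.0])}   # fine-tuned on task 1
ft2 = {"w": np.array([0.0, 4.0])}   # fine-tuned on task 2
merged = merge_finetuned(base, [ft1, ft2])
print(merged["w"])  # initialization plus the mean task vector
```

The common initialization is what makes this well-posed: because all fine-tuned models start from the same point in weight space, their differences are directly comparable without the alignment step needed in the single-task case.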

Entities

Institutions

  • arXiv
