Directional Convergence: Language as the Attractor in Multimodal Neural Networks
A recent study posted to arXiv (2605.09352) introduces directional convergence analysis via cycle-kNN, an asymmetric alignment measure, to probe why independently trained neural networks from different modalities converge toward shared representations. Applying the measure across dozens of unimodal models spanning point clouds, vision, and language, the authors find a marked directional asymmetry: non-language modalities adopt the neighborhood structure of language far more than the reverse. The pattern holds across all model families and scales, yet is invisible to symmetric similarity measures. A mechanistic analysis traces the directionality to an asymmetry in feature density, with language representations occupying the most compact regions. The results call symmetric alignment methodology into question and suggest that language acts as an attractor in multimodal convergence.
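The paper defines cycle-kNN precisely; the sketch below is only one plausible reading of an asymmetric, cycle-style kNN measure, assuming paired samples embedded by two models: a "cycle" from sample i through its nearest neighbor in space A closes if space B routes that neighbor back into i's k-neighborhood. The function names (knn_indices, cycle_knn_alignment), the cosine-similarity choice, and the toy data are all illustrative assumptions, not the authors' code.

```python
import numpy as np

def knn_indices(X, k):
    """Indices of each row's k nearest neighbors under cosine similarity,
    excluding the point itself."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    sims = Xn @ Xn.T
    np.fill_diagonal(sims, -np.inf)           # never match a point to itself
    return np.argsort(-sims, axis=1)[:, :k]   # top-k most similar rows

def cycle_knn_alignment(A, B, k=10):
    """Directional alignment A -> B (hypothetical variant): hop from sample i
    to its nearest neighbor j in space A, then check whether space B's
    k-neighborhood of j contains i. Returns the fraction of closed cycles;
    swapping A and B generally gives a different value."""
    nn_A = knn_indices(A, k)
    nn_B = knn_indices(B, k)
    closed = sum(1 for i in range(A.shape[0]) if i in nn_B[nn_A[i, 0]])
    return closed / A.shape[0]

# Toy usage: embeddings of the same 500 samples from two hypothetical models.
rng = np.random.default_rng(0)
Z = rng.normal(size=(500, 64))                  # shared latent structure
A = Z @ rng.normal(size=(64, 128))              # "vision-like" embeddings
B = 0.2 * Z @ rng.normal(size=(64, 96)) + rng.normal(size=(500, 96))  # noisier space
print(cycle_knn_alignment(A, B), cycle_knn_alignment(B, A))  # typically unequal
```

Because the measure follows neighbor hops in a fixed order rather than comparing neighbor sets symmetrically, the two directions can disagree, which is exactly the property needed to detect which modality is drifting toward the other.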
Key facts
- Study introduces directional convergence analysis using cycle-kNN.
- Cycle-kNN is an asymmetric alignment measure.
- Applied across dozens of independently trained unimodal models.
- Modalities include point clouds, vision, and language.
- Non-language modalities converge toward language neighborhood structure.
- Pattern holds across all model families and scales.
- Directionality is invisible to symmetric similarity measures.
- Mechanistic analysis traces directionality to feature density asymmetry (see the density sketch after this list).
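The density claim suggests a simple probe. As a hedged illustration (not the paper's analysis), local density can be proxied by the mean distance to the k-th nearest neighbor on the unit sphere: the modality with the smallest kNN radius occupies the most compact region. knn_radius and the toy "language-like" and "vision-like" clouds below are assumptions for demonstration only.

```python
import numpy as np

def knn_radius(X, k=10):
    """Mean distance from each sample to its k-th nearest neighbor after
    projecting onto the unit sphere. Smaller radius = denser, more compact
    feature region."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    d = np.sqrt(np.maximum(2.0 - 2.0 * (Xn @ Xn.T), 0.0))  # chord distance from cosine sim
    np.fill_diagonal(d, np.inf)                  # ignore self-distances
    return np.sort(d, axis=1)[:, k - 1].mean()

# Hypothetical comparison over 500 samples: a tight "language-like" cluster
# versus a broad, isotropic "vision-like" cloud.
rng = np.random.default_rng(1)
lang = rng.normal(size=(500, 64)) * 0.3 + 1.0    # compact cluster off the origin
vis  = rng.normal(size=(500, 64))                # spread over the whole sphere
print(knn_radius(lang), knn_radius(vis))         # lang radius comes out smaller
```

Under this proxy, an attractor modality would consistently show the smaller radius: points from other modalities have more of their neighborhood structure to "inherit" when mapped toward the denser space than vice versa.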