CAMPA: Efficient Decoupled Multimodal Graph Learning Framework
Researchers propose CAMPA, a decoupled multimodal graph learning framework that addresses modal conflict in the propagation and aggregation stages. By aligning cross-modal semantic information, the framework improves efficiency and scalability on large-scale multimodal attributed graphs. The paper's systematic empirical analysis shows that decoupled multimodal graph neural networks (MGNNs) are more computationally efficient than tightly coupled architectures, and it identifies modal conflict as a key bottleneck. CAMPA introduces cross-modal aligned propagation and aggregation to mitigate semantic divergence and misaligned multi-hop feature trajectories. The paper is available on arXiv under identifier 2605.11468.
Key facts
- CAMPA stands for Cross-modal Aligned Multimodal Propagation & Aggregation.
- The paper is published on arXiv with identifier 2605.11468.
- The research focuses on multimodal graph neural networks (MGNNs).
- The paper's empirical analysis finds decoupled MGNNs more efficient and scalable than tightly coupled architectures.
- Modal conflict arises in both propagation and aggregation stages.
- Independent multi-hop diffusion causes cross-modal semantic divergence during propagation.
- Naive fusion fails to align multi-hop feature trajectories during aggregation.
- CAMPA addresses modal conflict through cross-modal alignment.
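The decoupled setting that the facts above describe can be illustrated with a minimal sketch. This is not CAMPA's implementation: it shows only the generic baseline the summary criticizes, where each modality is diffused independently over the graph for several hops and the results are fused by plain concatenation with no cross-modal alignment. The function names, the toy 4-node path graph, and the feature values are all hypothetical, and the symmetric normalization with self-loops is an assumption borrowed from common decoupled GNN practice.

```python
import math

def normalize_adjacency(edges, n):
    """Build D^{-1/2} (A + I) D^{-1/2} for an undirected graph (assumed normalization)."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 1.0  # self-loop
    for u, v in edges:
        a[u][v] = 1.0
        a[v][u] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def matmul(a, x):
    """Dense matrix product of an (n x n) matrix and an (n x d) feature matrix."""
    return [[sum(a[i][k] * x[k][j] for k in range(len(x)))
             for j in range(len(x[0]))]
            for i in range(len(a))]

def propagate(a_hat, x, hops):
    """Parameter-free multi-hop diffusion: returns [X, A_hat X, A_hat^2 X, ...]."""
    out = [x]
    for _ in range(hops):
        out.append(matmul(a_hat, out[-1]))
    return out

def naive_fuse(text_hops, image_hops):
    """Concatenate every hop of both modalities per node, with no alignment step."""
    fused = []
    for i in range(len(text_hops[0])):
        row = []
        for t, im in zip(text_hops, image_hops):
            row.extend(t[i])
            row.extend(im[i])
        fused.append(row)
    return fused

# Hypothetical toy data: a 4-node path graph with 2-d text and 2-d image features.
edges = [(0, 1), (1, 2), (2, 3)]
x_text = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
x_image = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

a_hat = normalize_adjacency(edges, 4)
text_hops = propagate(a_hat, x_text, hops=2)    # diffused without seeing image features
image_hops = propagate(a_hat, x_image, hops=2)  # diffused without seeing text features
fused = naive_fuse(text_hops, image_hops)
print(len(fused), len(fused[0]))  # 4 nodes, 3 hop levels * (2 + 2) dims = 12
```

Because propagation has no trainable parameters, the per-hop features can be computed once before training, which is where the efficiency of decoupled designs comes from. The sketch also makes the reported weakness visible: the two modalities never interact during diffusion, and the fusion step simply stacks their hop-wise features side by side.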
Entities
Institutions
- arXiv