PRISM: Training-Free Data Selection for Multimodal LLMs
PRISM (Self-Pruning Intrinsic Selection Method) is a training-free data selection method that targets redundancy in visual instruction tuning datasets for Multimodal Large Language Models (MLLMs). Its authors identify anisotropy in visual feature distributions as a factor that existing selection methods overlook, and report that this anisotropy induces what they call a Global Semantic Drift. Because PRISM requires neither training nor proxy models, it avoids the computational cost of the proxy-based inference and training-based metrics that existing methods depend on. The method is described in arXiv:2502.12119v4.
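To make the notion of anisotropy concrete, the following is a minimal sketch on synthetic data, not the procedure from the paper. It assumes that anisotropy can be gauged by the average pairwise cosine similarity of feature vectors, which is a common generic diagnostic. The features are randomly generated stand-ins for visual embeddings, and the helper names `mean_pairwise_cosine` and `center` are hypothetical. Mean-centering is shown only to expose how much a shared global component inflates similarity; the summary above does not say how PRISM itself handles it.

```python
import math
import random


def mean_pairwise_cosine(vectors):
    """Average cosine similarity over all distinct pairs of vectors."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

    n = len(vectors)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += cosine(vectors[i], vectors[j])
    return total / (n * (n - 1) / 2)


def center(vectors):
    """Subtract the dataset-wide mean vector from every feature vector."""
    dim = len(vectors[0])
    mean = [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]
    return [[v[k] - mean[k] for k in range(dim)] for v in vectors]


# Synthetic stand-in for visual features (an assumption for illustration):
# a large offset shared by every sample plus small per-sample variation.
rng = random.Random(0)
dim, n = 16, 50
shared_offset = 5.0
features = [
    [shared_offset + rng.gauss(0.0, 1.0) for _ in range(dim)]
    for _ in range(n)
]

raw_score = mean_pairwise_cosine(features)
centered_score = mean_pairwise_cosine(center(features))
print(f"raw: {raw_score:.3f}  centered: {centered_score:.3f}")
```

In this toy setup the raw score sits close to 1 because every vector points in nearly the same direction, while the centered score falls close to 0. A similarity-based selection criterion applied to the raw vectors would therefore mostly measure the shared component rather than the per-sample differences.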
Key facts
- PRISM is a training-free method for selecting instruction data for MLLMs.
- It targets redundancy in visual instruction tuning datasets.
- The paper identifies anisotropy in visual feature distributions as a factor that existing selection methods overlook.
- According to the paper, this anisotropy induces a Global Semantic Drift.
- Existing methods rely on computationally demanding proxy-based inference or training-based metrics.
- PRISM aims to reduce the computational cost of data selection by requiring neither training nor proxy models.
- The paper is available on arXiv with ID 2502.12119v4.
- The approach is designed for scalable and effective tuning of MLLMs.
Entities
Institutions
- arXiv