MuCRASP: Structured Pruning for VLM Chain-of-Thought Reasoning

ai-technology · 2026-05-26

Researchers report that existing structured pruning techniques fail to preserve chain-of-thought (CoT) reasoning accuracy in vision-language models (VLMs). They identify two causes. First, CoT consistency depends on sparse pivot tokens along the generation path, which CoT-agnostic pruning methods overlook. Second, pruning methods designed for unimodal LLMs do not account for the differing activation distributions of the visual and textual modalities. To address both, they introduce MuCRASP, a structured pruning framework that prioritizes reasoning-critical components, preserves cross-modal alignment, and accounts for layer-wise sensitivity under a global parameter budget. In experiments on four VLMs across three reasoning benchmarks, the authors report consistent improvements.
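To make the idea of layer-wise sensitivity under a global parameter budget concrete, here is a minimal sketch in Python. It is not MuCRASP's allocation rule, which this summary does not describe. The function name `allocate_layer_sparsity`, the inverse-sensitivity weighting, and the `max_sparsity` cap are all assumptions chosen for illustration. The sketch gives each layer a pruning ratio so that less sensitive layers are pruned more, while the total number of removed parameters matches the global budget.

```python
def allocate_layer_sparsity(param_counts, sensitivities, global_sparsity,
                            max_sparsity=0.9, eps=1e-8):
    """Hypothetical helper: split a global pruning budget across layers.

    param_counts    -- number of prunable parameters in each layer
    sensitivities   -- non-negative score per layer (higher = prune less)
    global_sparsity -- fraction of all parameters to remove overall
    max_sparsity    -- assumed cap on any single layer's pruning ratio
    """
    if len(param_counts) != len(sensitivities):
        raise ValueError("param_counts and sensitivities must align")
    if not 0.0 <= global_sparsity <= max_sparsity:
        raise ValueError("global_sparsity must lie in [0, max_sparsity]")

    # Assumption: a layer's pruning ratio is inversely proportional
    # to its sensitivity score.
    weights = [1.0 / (s + eps) for s in sensitivities]
    target = global_sparsity * sum(param_counts)

    def removed(scale):
        # Parameters removed if every layer uses min(cap, scale * weight).
        return sum(n * min(max_sparsity, scale * w)
                   for n, w in zip(param_counts, weights))

    # removed() is non-decreasing in scale, so bisection finds the scale
    # at which the global budget is met. At the upper bound every layer
    # sits at the cap, which removes at least the target amount.
    lo, hi = 0.0, max_sparsity / min(weights)
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if removed(mid) < target:
            lo = mid
        else:
            hi = mid
    return [min(max_sparsity, hi * w) for w in weights]


# Three equally sized layers, the last one four times as sensitive as the first.
ratios = allocate_layer_sparsity([100, 100, 100], [1.0, 2.0, 4.0], 0.35)
```

In this toy setup the least sensitive layer receives the largest pruning ratio, and the parameter-weighted average of the ratios equals the requested 35% budget. A real system would estimate the sensitivity scores from calibration data rather than supplying them by hand.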

Key facts

MuCRASP is a structured pruning framework for VLMs.
It targets reasoning-critical components in chain-of-thought generation.
Existing pruning methods are CoT-agnostic and ignore pivot tokens.
Unimodal pruning fails due to cross-modal activation differences.
MuCRASP preserves cross-modal alignment under a global parameter budget.
Tested on four VLMs across three reasoning benchmarks.
The work is published on arXiv under ID 2605.25842.
The method addresses the deployment cost of large VLMs.
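The point about cross-modal activation differences can be illustrated with a toy example. The sketch below is not the scoring used by MuCRASP; the function names, the mean-absolute-activation importance score, and the max-combination rule are assumptions made for illustration only. It shows why pooling raw activations from both modalities can let the larger-magnitude modality dominate channel selection, and how normalizing each modality's scores separately avoids that.

```python
def channel_importance(acts):
    """Mean absolute activation per channel; acts is a list of token rows."""
    n_channels = len(acts[0])
    return [sum(abs(row[c]) for row in acts) / len(acts)
            for c in range(n_channels)]


def normalize(scores):
    """Scale scores to sum to 1 so modalities become comparable."""
    total = sum(scores)
    return [s / total for s in scores] if total > 0 else [0.0] * len(scores)


def top_channels(scores, keep):
    """Indices of the `keep` highest-scoring channels, in ascending order."""
    ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
    return sorted(ranked[:keep])


def keep_pooled(visual_acts, text_acts, keep):
    """Modality-agnostic baseline: score channels on raw pooled activations."""
    return top_channels(channel_importance(visual_acts + text_acts), keep)


def keep_modality_aware(visual_acts, text_acts, keep):
    """Assumed rule: normalize per modality, keep a channel if either needs it."""
    vis = normalize(channel_importance(visual_acts))
    txt = normalize(channel_importance(text_acts))
    return top_channels([max(v, t) for v, t in zip(vis, txt)], keep)


# Toy data: visual activations are roughly 50x larger than text activations.
# Channel 2 matters most for text but is tiny in absolute magnitude.
visual = [[100.0, 40.0, 1.0], [90.0, 50.0, 2.0]]
text = [[0.1, 0.1, 2.0], [0.2, 0.1, 1.8]]

print(keep_pooled(visual, text, keep=2))          # prints [0, 1]
print(keep_modality_aware(visual, text, keep=2))  # prints [0, 2]
```

The pooled baseline discards channel 2, the one the text modality relies on most, because its raw magnitude is small next to the visual activations. The per-modality normalization keeps it.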
