SplitQ: Channel Splitting for Low-Bit VLM Quantization
Researchers propose SplitQ, a post-training quantization framework for large vision-language models (VLMs) that addresses the heterogeneous activation distributions of the text and vision modalities. Its Modality-specific Outlier Channel Decoupling (MOCD) module isolates salient outlier channels, which are unevenly distributed across modalities. Its Adaptive Cross-Modal Calibration (ACC) module then reduces the distribution discrepancies that remain. The work targets efficient deployment of VLMs on resource-constrained devices.
Key facts
- arXiv paper 2605.19929 proposes SplitQ for low-bit post-training quantization (PTQ) of VLMs
- Heterogeneous activation distributions between text and vision modalities cause accuracy degradation
- Outlier channels are modality-specific and unevenly distributed
- MOCD module isolates salient modality-specific outlier channels
- ACC module addresses cross-modal distribution discrepancies
- Goal is efficient VLM deployment on resource-constrained devices
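To make the outlier-channel idea above concrete, here is a minimal sketch of generic outlier-channel splitting on synthetic data. It is not the paper's MOCD or ACC implementation, whose details are not given in this summary. Everything in it is an assumption for illustration: the helper names (`make_acts`, `outlier_channels`, `fake_quant`), the "4x the median channel magnitude" detection rule, the 4-bit symmetric per-tensor quantizer, and the choice of channels 3 and 11 as the text and vision outliers.

```python
import math

def make_acts(outlier_ch, tokens=32, channels=16, gain=20.0):
    """Synthetic activations (tokens x channels) with one amplified channel."""
    return [[math.sin(0.7 * t + 1.3 * c) * (gain if c == outlier_ch else 1.0)
             for c in range(channels)] for t in range(tokens)]

def outlier_channels(acts, ratio=4.0):
    """Hypothetical rule: flag channels whose max |x| exceeds ratio x the median."""
    mags = [max(abs(row[c]) for row in acts) for c in range(len(acts[0]))]
    median = sorted(mags)[len(mags) // 2]
    return {c for c, m in enumerate(mags) if m > ratio * median}

def fake_quant(acts, skip, bits=4):
    """Symmetric per-tensor fake quantization; channels in `skip` stay unquantized."""
    qmax = 2 ** (bits - 1) - 1
    kept = [abs(x) for row in acts for c, x in enumerate(row) if c not in skip]
    top = max(kept) if kept else 0.0
    scale = top / qmax if top > 0 else 1.0
    def q(x):
        level = max(-qmax, min(qmax, round(x / scale)))
        return level * scale
    return [[x if c in skip else q(x) for c, x in enumerate(row)] for row in acts]

def mse(a, b):
    """Mean squared error between two equally shaped activation matrices."""
    n = len(a) * len(a[0])
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n

# Assumed setup: the text and vision tokens have outliers in different channels.
text, vision = make_acts(outlier_ch=3), make_acts(outlier_ch=11)
text_out, vision_out = outlier_channels(text), outlier_channels(vision)

# Quantize vision activations three ways and compare reconstruction error.
err_naive = mse(vision, fake_quant(vision, skip=set()))       # no splitting
err_wrong = mse(vision, fake_quant(vision, skip=text_out))    # text's outlier set
err_split = mse(vision, fake_quant(vision, skip=vision_out))  # vision's own set

print("text outliers:", text_out, "vision outliers:", vision_out)
print("naive:", err_naive, "wrong-modality:", err_wrong, "split:", err_split)
```

In this toy setting, a single outlier channel stretches the shared quantization scale so far that the ordinary channels collapse toward zero at 4 bits. Removing the text modality's outlier channel does not help the vision tokens, which is one way to see why a per-modality treatment could matter.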
Entities
Institutions
- arXiv