LightKV: Reducing Vision Token KV Cache in LVLMs
Researchers propose LightKV, a method that reduces the Key-Value (KV) cache size in Large Vision-Language Models (LVLMs) by exploiting redundancy among vision-token embeddings. Guided by the text prompt, LightKV applies cross-modality message passing to aggregate and compress vision tokens during the prefill stage, distinguishing it from prior compression strategies that consider vision tokens in isolation. Evaluated on eight open-source LVLMs across eight benchmarks, including MME and SeedBench, LightKV maintains performance while retaining only 55% of the original vision tokens, substantially reducing GPU memory overhead.
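The summary does not give LightKV's exact algorithm, but the idea of text-guided aggregation can be illustrated with a minimal sketch: score each vision token by text-to-vision cross-attention, keep the top fraction, and merge the pruned tokens into their most similar kept tokens via similarity-weighted message passing. All function names and the scoring/merging rules below are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def compress_vision_tokens(vision_tokens, text_tokens, keep_ratio=0.55):
    """Hypothetical sketch: prompt-guided vision-token compression.

    vision_tokens: (Nv, d) vision-token embeddings
    text_tokens:   (Nt, d) text-prompt embeddings
    Returns roughly keep_ratio * Nv compressed vision tokens.
    """
    nv, d = vision_tokens.shape
    # Cross-modality scoring: how strongly the text prompt attends to each vision token.
    attn = softmax(text_tokens @ vision_tokens.T / np.sqrt(d), axis=-1)  # (Nt, Nv)
    scores = attn.mean(axis=0)                                           # (Nv,)
    k = max(1, int(round(keep_ratio * nv)))
    keep = np.zeros(nv, dtype=bool)
    keep[np.argsort(scores)[-k:]] = True
    kept, pruned = vision_tokens[keep], vision_tokens[~keep]
    if len(pruned):
        # Message passing: fold each pruned token into kept tokens,
        # weighted by embedding similarity, as a weighted average.
        w = softmax(pruned @ kept.T / np.sqrt(d), axis=-1)               # (Np, k)
        kept = (kept + w.T @ pruned) / (1.0 + w.sum(axis=0, keepdims=True).T)
    return kept
```

Because the compressed tokens replace the full vision sequence before the prefill stage, every subsequent layer's KV cache stores only ~55% as many vision entries, which is where the memory saving comes from.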
Key facts
- LightKV reduces KV cache size in LVLMs.
- It uses cross-modality message passing guided by text prompts.
- Evaluated on eight open-source LVLMs and eight benchmarks.
- Maintains performance while retaining only 55% of the original vision tokens.
- Addresses GPU memory overhead from vision tokens.
- Distinguished from prior compression methods that operate on vision tokens alone.
- Tested on MME and SeedBench datasets.
- Published on arXiv with ID 2605.00789.
Entities
Institutions
- arXiv