ARTFEED — Contemporary Art Intelligence

LightKV: Reducing Vision Token KV Cache in LVLMs

ai-technology · 2026-05-04

Researchers propose LightKV, a method that shrinks the Key-Value (KV) cache of Large Vision-Language Models (LVLMs) by exploiting redundancy among vision-token embeddings. Guided by the text prompt, LightKV applies cross-modality message passing to aggregate and compress vision tokens during the prefill stage, distinguishing it from prior vision-only compression strategies. Evaluated on eight open-source LVLMs across eight benchmarks, including MME and SeedBench, LightKV maintains comparable performance while retaining only 55% of the original vision tokens, substantially reducing GPU memory overhead.
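The core idea, keeping only the vision tokens most relevant to the text prompt before the KV cache is built, can be sketched as follows. This is an illustrative simplification, not the paper's algorithm: LightKV aggregates tokens via cross-modality message passing, whereas the sketch below simply scores vision tokens by text-to-vision similarity and drops the least relevant ones. All function and variable names are hypothetical.

```python
import numpy as np

def compress_vision_tokens(vision_tokens, text_tokens, keep_ratio=0.55):
    """Score each vision token by its average similarity to the text
    prompt, then keep the top keep_ratio fraction (a hypothetical
    stand-in for LightKV's cross-modality message passing)."""
    # Cross-modal relevance: text-to-vision dot-product scores.
    scores = text_tokens @ vision_tokens.T          # (n_text, n_vision)
    relevance = scores.mean(axis=0)                 # (n_vision,)

    # Keep the most text-relevant 55% of vision tokens, preserving order.
    n_keep = max(1, int(round(keep_ratio * len(vision_tokens))))
    keep_idx = np.sort(np.argsort(relevance)[-n_keep:])
    return vision_tokens[keep_idx], keep_idx

# Toy example: 576 vision tokens, 12 text tokens, hidden size 64.
rng = np.random.default_rng(0)
vision = rng.standard_normal((576, 64))
text = rng.standard_normal((12, 64))
compressed, kept = compress_vision_tokens(vision, text)
print(compressed.shape)  # (317, 64) — 55% of 576 tokens
```

Because the compression runs once at prefill, every subsequent decoding step attends over a 45%-smaller vision KV cache, which is where the GPU-memory saving comes from.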

Key facts

  • LightKV reduces KV cache size in LVLMs.
  • It uses cross-modality message passing guided by text prompts.
  • Evaluated on eight open-source LVLMs and eight benchmarks.
  • Maintains comparable performance while retaining only 55% of the original vision tokens.
  • Addresses GPU memory overhead from vision tokens.
  • Distinguished from prior vision-only compression methods.
  • Tested on benchmarks including MME and SeedBench.
  • Published on arXiv with ID 2605.00789.

Entities

Institutions

  • arXiv

Sources