ARTFEED — Contemporary Art Intelligence

Training-Free Anti-Recomputation Boosts Video VLM Efficiency

ai-technology · 2026-05-07

A new method called training-free anti-recomputation reduces computational waste in video vision-language models (VLMs) by reusing previously computed visual state whenever a validation step confirms that state is still stable. The approach, described in an arXiv paper (2605.03351), targets a common inefficiency: VLM pipelines reprocess unchanged visual content for every new query on the same video. On a frozen Qwen2.5-VL-7B-Instruct-4bit model, the technique cut follow-up latency by 14.90–35.92x in a 93-query VideoMME breadth setting while matching the answer choices and correctness of paired baselines. The first query on a video stays cold; the gains come from state reuse on subsequent questions. Stress tests show repeated-question schedules holding stable through 50 turns, and dense-answer-anchored prompt variation exposes a trade-off between a conservative fixed K=1 repair policy and faster, more aggressive reuse policies.
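The mechanism the article describes (encode once, validate, reuse) maps onto a simple caching pattern. Below is a minimal Python sketch, assuming frames arrive as raw bytes and that a hash comparison stands in for the paper's stability validation; all names (VisualStateCache, encode_fn) are illustrative, not the paper's API.

    import hashlib

    class VisualStateCache:
        """Minimal sketch of training-free anti-recomputation: cache the
        vision encoder's output for a video and reuse it on follow-up
        queries when a cheap check confirms the frames are unchanged.
        Names are illustrative, not the paper's API."""

        def __init__(self, encode_fn):
            self.encode_fn = encode_fn  # expensive vision-encoder forward pass
            self._fingerprint = None    # identity of the frames last encoded
            self._state = None          # cached visual state

        @staticmethod
        def _fingerprint_frames(frames):
            # Stand-in stability validation: hash the raw frame bytes.
            # The paper's actual validation mechanism is not detailed here.
            h = hashlib.sha256()
            for frame in frames:
                h.update(frame)
            return h.hexdigest()

        def get_state(self, frames):
            fp = self._fingerprint_frames(frames)
            if fp == self._fingerprint and self._state is not None:
                return self._state                 # warm path: reuse cached state
            self._state = self.encode_fn(frames)   # cold path: full recompute
            self._fingerprint = fp
            return self._state

Under this pattern only the first question pays the full encoding cost; every follow-up skips straight to language-side decoding, which is the warm-path saving behind the reported follow-up speedups.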

Key facts

  • Training-free anti-recomputation reuses visual state when validation confirms stability.
  • Method targets redundancy in VLM pipelines that reprocess unchanged frames.
  • Tested on frozen Qwen2.5-VL-7B-Instruct-4bit model.
  • Achieved 14.90–35.92x follow-up latency reduction on a 93-query VideoMME breadth setting.
  • First query is cold; gains start with follow-up reuse.
  • Repeated-question schedules hold through 50 turns.
  • Dense-answer-anchored prompt variation enables trade-offs between conservative fixed K=1 repair and faster aggressive policies (see the sketch after this list).
  • Paper published on arXiv with ID 2605.03351.
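The policy trade-off can be pictured as a per-turn repair budget. The sketch below assumes visual state is cached per video segment and that K caps how many stale segments are re-encoded each turn; this is one plausible reading of "fixed K=1 repair" versus aggressive policies, not the paper's confirmed algorithm, and every name in it is hypothetical.

    from typing import Callable, List, Tuple

    def repair_segments(
        segments: List[bytes],
        cached: List[Tuple[str, object]],   # (fingerprint, state) per segment
        encode_fn: Callable[[bytes], object],
        fingerprint_fn: Callable[[bytes], str],
        k: int = 1,                          # repair budget; K=1 is the conservative setting
    ) -> List[Tuple[str, object]]:
        # Re-encode at most k stale segments this turn; reuse the rest as-is.
        repaired = 0
        out = []
        for seg, (fp, state) in zip(segments, cached):
            new_fp = fingerprint_fn(seg)
            if new_fp != fp and repaired < k:
                state, fp = encode_fn(seg), new_fp  # repair this stale segment
                repaired += 1
            out.append((fp, state))
        return out

A conservative K=1 repairs at most one segment per turn; a more aggressive policy would tolerate more reuse in exchange for speed.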

Entities

Institutions

  • arXiv
