ARTFEED — Contemporary Art Intelligence

CoVSpec: Efficient Device-Edge Co-Inference for Vision-Language Models

ai-technology · 2026-05-06

Researchers have introduced CoVSpec, a framework that makes device-edge co-inference of vision-language models (VLMs) more efficient through speculative decoding. Applying speculative decoding to VLMs is hard for two reasons: the heavy computation over visual tokens and the communication overhead between device and edge. CoVSpec addresses both with a training-free visual-token reduction technique that scores tokens on the mobile device by query relevance, token activity, and low-rank dependency, and prunes the redundant ones. A lightweight draft VLM on the device then works in tandem with a more powerful target VLM on an edge server, lowering both compute and memory requirements. Details of this research can be found in arXiv paper 2605.02218.
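The paper's exact scoring formulas are not given in this summary, so the following is only a minimal sketch of what a training-free, three-signal pruning step could look like in PyTorch. The function name `prune_visual_tokens`, the choice of cosine similarity for query relevance, embedding norm for token activity, and an SVD reconstruction residual for low-rank dependency, along with the equal weighting, are all illustrative assumptions rather than CoVSpec's actual method.

```python
import torch
import torch.nn.functional as F

def prune_visual_tokens(vis_tokens, query_emb, keep_ratio=0.5, rank=16):
    """Score visual tokens by three signals and keep the top fraction.

    vis_tokens: (N, d) visual token embeddings from the vision encoder
    query_emb:  (d,)   pooled embedding of the user's text query
    NOTE: every scoring term below is an illustrative stand-in,
    not the formula from the CoVSpec paper.
    """
    # 1) Query relevance: cosine similarity between each visual token
    #    and the text query embedding.
    relevance = F.cosine_similarity(vis_tokens, query_emb.unsqueeze(0), dim=-1)

    # 2) Token activity: embedding norm as a cheap proxy for how
    #    strongly a token activates.
    activity = vis_tokens.norm(dim=-1)

    # 3) Low-rank dependency: tokens that are well reconstructed from a
    #    rank-r approximation of the token matrix carry redundant
    #    information; a large residual marks an independent token.
    U, S, Vh = torch.linalg.svd(vis_tokens, full_matrices=False)
    low_rank = (U[:, :rank] * S[:rank]) @ Vh[:rank]
    residual = (vis_tokens - low_rank).norm(dim=-1)

    # Normalize each term to [0, 1] and combine; equal weights here.
    def norm01(x):
        return (x - x.min()) / (x.max() - x.min() + 1e-8)
    score = norm01(relevance) + norm01(activity) + norm01(residual)

    k = max(1, int(keep_ratio * vis_tokens.shape[0]))
    keep = score.topk(k).indices.sort().values  # preserve spatial order
    return vis_tokens[keep], keep
```

Because every signal is computed from tensors the device already holds, a step like this needs no extra training or fine-tuning, which is the property the paper emphasizes.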

Key facts

  • CoVSpec is a framework for device-edge co-inference of VLMs.
  • It uses speculative decoding: a lightweight draft VLM on the mobile device proposes tokens that a larger target VLM on the edge server verifies (sketched after this list).
  • A training-free visual token reduction method prunes redundant tokens.
  • Token reduction considers query relevance, token activity, and low-rank dependency.
  • The approach addresses excessive visual-token computation and communication overhead.
  • The paper is available on arXiv with ID 2605.02218.
  • The work aims to make large-VLM inference practical on mobile devices via device-edge collaboration.
  • The method does not require additional training.
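For context on the draft-and-verify split the key facts describe, here is a minimal sketch of one speculative-decoding round, assuming Hugging Face-style causal-LM interfaces that return `.logits`. The function `co_inference_step`, the draft length `gamma`, and the greedy acceptance rule are illustrative assumptions; CoVSpec's actual protocol (including how drafted tokens are transmitted between device and edge) is not specified in this summary.

```python
import torch

@torch.no_grad()
def co_inference_step(draft_model, target_model, ids, gamma=4):
    """One speculative-decoding round: the on-device draft model
    proposes gamma tokens; the edge-side target model verifies them
    in a single forward pass. Greedy acceptance is used for brevity;
    a probabilistic acceptance rule is common in practice.
    """
    # Device side: autoregressively draft gamma candidate tokens
    # (no KV cache here, for clarity).
    draft_ids = ids
    for _ in range(gamma):
        logits = draft_model(draft_ids).logits[:, -1]
        draft_ids = torch.cat([draft_ids, logits.argmax(-1, keepdim=True)], dim=-1)
    proposed = draft_ids[:, ids.shape[1]:]  # the gamma drafted tokens

    # Edge side: one target-model pass scores all drafted positions at
    # once; this is what amortizes the cost of the large model.
    tgt_logits = target_model(draft_ids).logits
    tgt_pred = tgt_logits[:, ids.shape[1] - 1:-1].argmax(-1)

    # Accept the longest prefix where draft and target agree, then take
    # one corrected token from the target so every round makes progress.
    match = (tgt_pred == proposed)[0]
    n_ok = int(match.long().cumprod(0).sum())
    accepted = proposed[:, :n_ok]
    if n_ok < proposed.shape[1]:
        correction = tgt_pred[:, n_ok:n_ok + 1]
    else:
        correction = tgt_logits[:, -1].argmax(-1, keepdim=True)
    return torch.cat([ids, accepted, correction], dim=-1)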

Entities

Institutions

  • arXiv

Sources