AQuaUI: Training-Free Token Reduction for GUI Agents via Adaptive Quadtrees

ai-technology · 2026-05-20

AQuaUI is a training-free method for reducing visual tokens at inference time in GUI agent models. It exploits the uneven information density of screenshots: for each input it builds an adaptive quadtree and keeps one representative merged token per leaf. This preserves spatial positions while cutting the number of visual tokens, with no additional training and no attention-based compression. The method targets high-resolution GUI screenshots, where large regions carry little information while small text and icons still need to be represented at fine detail. AQuaUI is proposed for LMM-based GUI agents, which ingest a screenshot at every step.
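The quadtree idea described above can be illustrated with a minimal sketch. This is not AQuaUI's actual implementation: the split criterion (average per-dimension variance), the `threshold` value, the mean-based merge, and the function name `quadtree_reduce` are all assumptions chosen for illustration, since the summary does not specify them.

```python
from statistics import pvariance


def quadtree_reduce(grid, threshold=0.01):
    """Illustrative sketch only (not the AQuaUI implementation).

    grid: S x S nested list of patch feature vectors, S a power of two.
    Returns one entry per quadtree leaf: (row, col, size, merged_vector).
    """
    dim = len(grid[0][0])
    leaves = []

    def visit(r, c, size):
        block = [grid[i][j]
                 for i in range(r, r + size)
                 for j in range(c, c + size)]
        # Assumed split criterion: average per-dimension variance in the block.
        spread = sum(pvariance([v[d] for v in block]) for d in range(dim)) / dim
        if size > 1 and spread > threshold:
            half = size // 2
            for dr in (0, half):
                for dc in (0, half):
                    visit(r + dr, c + dc, half)
        else:
            # Assumed merge rule: average the tokens covered by the leaf.
            merged = [sum(v[d] for v in block) / len(block) for d in range(dim)]
            # Keeping (row, col, size) retains the leaf's spatial position.
            leaves.append((r, c, size, merged))

    visit(0, 0, len(grid))
    return leaves


# Toy 8x8 "screenshot": flat background with one detailed 2x2 corner.
grid = [[[0.0] * 4 for _ in range(8)] for _ in range(8)]
for i in range(2):
    for j in range(2):
        grid[i][j] = [float(4 * (2 * i + j) + k + 1) for k in range(4)]

leaves = quadtree_reduce(grid)
print(len(leaves), "tokens kept out of", 8 * 8)
```

In this toy input the flat background collapses into a few large leaves, while the detailed corner is kept at full patch resolution, so 10 tokens remain out of 64. Each leaf keeps its row, column, and size, which is one simple way to retain spatial position.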

Key facts

AQuaUI is a training-free inference-time token reduction method
It uses adaptive quadtrees on screenshot inputs
One representative merged token is kept per leaf of the quadtree
It preserves spatial positions
It addresses non-uniform information density in GUI screenshots
No additional training or attention-based compression is required
Targets LMM-based GUI agent models
