AgentLens: Adaptive Visual Modalities for Mobile GUI Agents

ai-technology · 2026-04-24

AgentLens is a mobile GUI agent that adaptively switches among three visual modalities (Full UI, Partial UI, and GenUI) during human-agent interaction. It extends a standard mobile agent with adaptive communication actions and uses Virtual Display to execute tasks in the background with selective visual overlays. The system was designed after iterative formative studies, which found that users prefer a hybrid model with just-in-time visual interaction and that the most effective visualization modality depends on the task. A controlled study with 21 participants evaluated the system.
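To make the idea of task-dependent modality selection concrete, here is a minimal sketch in Python. It is not the paper's implementation: the `AgentStep` fields, the `choose_modality` function, and the decision rules are all hypothetical, and the reading of the three modalities (Full UI as the whole app screen, Partial UI as a single relevant element, GenUI as a generated interface) is an assumption, since the summary does not define them.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Modality(Enum):
    """The three visual modalities named in the summary."""
    FULL_UI = "Full UI"
    PARTIAL_UI = "Partial UI"
    GEN_UI = "GenUI"


@dataclass
class AgentStep:
    """One step of a background task. All fields are illustrative assumptions."""
    description: str
    needs_user_input: bool = False   # the agent must ask the user something
    needs_app_context: bool = False  # the user should see the real app screen
    single_widget: bool = False      # only one on-screen element is relevant


def choose_modality(step: AgentStep) -> Optional[Modality]:
    """Return the overlay to show, or None to keep running in the background.

    Hypothetical policy: stay in the background unless the user is needed,
    show real app UI when app context matters, otherwise fall back to GenUI.
    """
    if not step.needs_user_input:
        return None
    if step.needs_app_context:
        return Modality.PARTIAL_UI if step.single_widget else Modality.FULL_UI
    return Modality.GEN_UI


if __name__ == "__main__":
    steps = [
        AgentStep("open the shopping app"),
        AgentStep("pick a delivery slot", needs_user_input=True,
                  needs_app_context=True, single_widget=True),
        AgentStep("review the whole cart", needs_user_input=True,
                  needs_app_context=True),
        AgentStep("choose between two options", needs_user_input=True),
    ]
    for step in steps:
        choice = choose_modality(step)
        label = choice.value if choice else "background (no overlay)"
        print(f"{step.description}: {label}")
```

The point of the sketch is only the shape of the decision: most steps run without any overlay, and a visual modality is surfaced just in time, chosen per step rather than fixed for the whole task.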

Key facts

AgentLens uses three visual modalities: Full UI, Partial UI, and GenUI.
It extends standard mobile agents with adaptive communication actions.
Virtual Display enables background execution with selective visual overlays.
Formative studies showed users prefer a hybrid model with just-in-time visual interaction.
The most effective visualization modality depends on the task.
A controlled study with 21 participants was conducted.
The paper is on arXiv with ID 2604.20279.
The arXiv announcement type is cross, meaning the paper is a cross-listing.
