MobileExplorer: On-Device GUI Agent Acceleration via Online Exploration
MobileExplorer is a framework that accelerates on-device inference for vision-based mobile GUI agents through online exploration. It exploits the long reasoning time of vision-language models (VLMs) by running lightweight, parallel exploration of UI elements. While the model is reasoning, the agent proactively probes semantically relevant UI elements and records the resulting exploration traces as structured memory. A two-level rollback mechanism keeps execution reliable in real-time mobile environments. The approach addresses the privacy concerns and latency of cloud-based models and enables fully on-device deployment of mobile GUI agents.
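The overlap between model reasoning and UI exploration can be illustrated with a minimal Python sketch. It is not MobileExplorer's implementation: the trace schema, the keyword-based relevance filter, the `slow_vlm_inference` stand-in, and all element names are hypothetical, chosen only to show exploration running in a background thread while a slow model call is in flight.

```python
import threading
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ExplorationTrace:
    """One explored UI element and the screen it led to (hypothetical schema)."""
    element_id: str
    label: str
    resulting_screen: str


@dataclass
class ExplorationMemory:
    """Structured memory of exploration traces, keyed by element id."""
    traces: dict = field(default_factory=dict)

    def add(self, trace: ExplorationTrace) -> None:
        self.traces[trace.element_id] = trace

    def lookup(self, element_id: str) -> Optional[ExplorationTrace]:
        return self.traces.get(element_id)


def slow_vlm_inference(task: str) -> str:
    """Stand-in for on-device VLM reasoning; sleeps to mimic its latency."""
    time.sleep(0.2)
    return "btn_settings"  # hypothetical: the element the model decides to tap


def explore(elements, task, memory, stop):
    """Probe task-relevant elements until told to stop."""
    keywords = set(task.lower().split())
    for element_id, label, target in elements:
        if stop.is_set():
            break
        # Crude relevance filter (an assumption): any shared word with the task.
        if keywords & set(label.lower().split()):
            memory.add(ExplorationTrace(element_id, label, target))


def run_step(task, elements):
    """Run one agent step: explore in the background while the model reasons."""
    memory = ExplorationMemory()
    stop = threading.Event()
    worker = threading.Thread(target=explore, args=(elements, task, memory, stop))
    worker.start()
    action = slow_vlm_inference(task)  # the worker explores during this call
    stop.set()
    worker.join()
    # If the chosen element was already explored, its outcome is known in advance.
    return action, memory.lookup(action)
```

Calling `run_step("open wifi settings", [("btn_settings", "Open settings", "settings_screen"), ("btn_share", "Share photo", "share_sheet")])` returns the chosen action together with the trace recorded for it during the model's reasoning time, so the agent already knows which screen the tap leads to.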
Key facts
- MobileExplorer is a framework for on-device inference acceleration.
- It targets vision-based mobile GUI agents.
- The key idea is online exploration during VLM reasoning.
- Lightweight, parallel UI element exploration is performed.
- Exploration traces are stored as structured memory.
- A two-level rollback mechanism ensures reliability.
- It addresses privacy and latency issues of cloud models.
- Fully on-device deployment of mobile GUI agents remains underexplored.
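The summary does not say what the two rollback levels are. The sketch below assumes one plausible reading purely for illustration: a step-level undo (pressing back) is tried first, and a checkpoint restore is the fallback when the undo does not return to the starting screen. `FakeDevice`, `explore_with_rollback`, and the `blocks_back` flag are hypothetical names, not part of MobileExplorer.

```python
class FakeDevice:
    """Toy stand-in for a phone UI, modeled as a stack of screen names."""

    def __init__(self, start):
        self.stack = [start]
        self.back_blocked = False

    @property
    def screen(self):
        return self.stack[-1]

    def tap(self, target, blocks_back=False):
        # blocks_back=True models a screen that "back" cannot leave cleanly.
        self.stack.append(target)
        self.back_blocked = blocks_back

    def back(self):
        if not self.back_blocked and len(self.stack) > 1:
            self.stack.pop()

    def restore(self, screen):
        # Heavier recovery: reset the UI directly to a known screen.
        self.stack = [screen]
        self.back_blocked = False


def explore_with_rollback(device, target, blocks_back=False):
    """Probe one element, then return the device to where it started."""
    checkpoint = device.screen
    device.tap(target, blocks_back)
    observed = device.screen

    device.back()  # level 1 (assumed): undo the single exploratory step
    if device.screen == checkpoint:
        return observed, "step"

    device.restore(checkpoint)  # level 2 (assumed): restore the checkpoint
    return observed, "checkpoint"


device = FakeDevice("home")
first = explore_with_rollback(device, "settings")
second = explore_with_rollback(device, "dialog", blocks_back=True)
```

In both calls the device ends on `"home"`. The point of the design is that exploration must never leave the UI in a state the agent's chosen action does not expect, so the cheap undo is tried first and the more expensive restore is reserved for when it fails.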