SOLAR-RL: Semi-Online Reinforcement Learning for GUI Agents
Researchers propose SOLAR-RL (Semi-Online Long-horizon Assignment Reinforcement Learning) to address the trade-off between offline and online RL when training Multimodal Large Language Model (MLLM) agents on dynamic GUI tasks. Standard offline RL relies on static step-level data and so ignores global trajectory semantics such as task completion and execution quality. Online RL captures long-term dynamics but incurs high interaction costs and suffers from environmental instability. SOLAR-RL integrates global trajectory insights into offline learning by reconstructing diverse rollout candidates from static data and detecting the first failure point in each with per-step validity checks. The approach aims to bridge the gap between offline and online RL, enabling more effective training of GUI agents without expensive online interaction. The paper is published on arXiv under identifier 2604.22558.
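The first-failure-point idea can be illustrated with a minimal sketch. It is not the paper's implementation: the `Step` type, the exact-match validity check `actions_match`, the helper `first_failure_index`, and the action strings are all hypothetical, chosen only to show how per-step checks over reconstructed rollout candidates might locate the earliest invalid step.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Step:
    """One step of a reconstructed rollout (hypothetical structure)."""
    reference_action: str  # action logged in the static dataset
    candidate_action: str  # action proposed in the rollout candidate


def actions_match(step: Step) -> bool:
    """Assumed validity check: the candidate must equal the logged action."""
    return step.candidate_action == step.reference_action


def first_failure_index(
    rollout: Sequence[Step],
    is_valid: Callable[[Step], bool] = actions_match,
) -> Optional[int]:
    """Return the index of the first invalid step, or None if all steps pass."""
    for index, step in enumerate(rollout):
        if not is_valid(step):
            return index
    return None


# Toy static trajectory and two rollout candidates (illustrative data only).
reference = ["open_settings", "tap_wifi", "toggle_on"]
candidates = {
    "a": ["open_settings", "tap_wifi", "toggle_on"],
    "b": ["open_settings", "tap_bluetooth", "toggle_on"],
}

failure_points = {
    name: first_failure_index([Step(r, c) for r, c in zip(reference, actions)])
    for name, actions in candidates.items()
}
print(failure_points)  # {'a': None, 'b': 1}
```

In this sketch, candidate `b` matches the logged action again at its third step, but the scan stops at index 1, because everything after the first failure is executed from a state the static data no longer describes. A real validity check would likely be looser than exact string equality, for example tolerating small differences in click coordinates.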
Key facts
- SOLAR-RL stands for Semi-Online Long-horizon Assignment Reinforcement Learning.
- It targets training MLLM agents on dynamic GUI tasks.
- Standard offline RL neglects global trajectory semantics.
- Online RL incurs high interaction costs and suffers from environmental instability.
- SOLAR-RL reconstructs rollout candidates from static data.
- It detects the first failure point using per-step validity checks.
- The approach integrates global trajectory insights into offline learning.
- The paper is available on arXiv under identifier 2604.22558.
Entities
Institutions
- arXiv