Online Reinforcement Learning Agent for Web Navigation

other · 2026-05-01

A new research paper introduces OpAgent, an autonomous web agent that uses online reinforcement learning to navigate real-world websites. Unlike conventional methods that rely on static datasets and supervised fine-tuning, OpAgent learns through direct, iterative interaction with unconstrained web environments. The approach includes hierarchical multi-task fine-tuning on datasets categorized into Planning, Acting, and Grounding primitives, which yields a Vision-Language Model with strong instruction-following capabilities. Learning online is meant to address the distributional shift that arises because offline trajectories fail to capture stochastic state transitions and real-time feedback.

Key facts

Paper title: OpAgent: Operator Agent for Web Navigation
arXiv ID: 2602.13559v2
Proposes online reinforcement learning for web agents
Uses hierarchical multi-task fine-tuning
Datasets categorized by functional primitives: Planning, Acting, Grounding
Based on a Vision-Language Model
Addresses distributional shift from offline methods
Learns through direct iterative interactions with websites
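
To make the contrast between learning from a static dataset and learning from live interaction concrete, here is a minimal sketch of an online learning loop. It is not OpAgent's method: the paper trains a Vision-Language Model on real websites, whereas this example swaps in tabular Q-learning on a tiny simulated site. `ToyWebEnv`, its action names, the `flake_rate` parameter, and all hyperparameters are hypothetical, chosen only to show an agent updating from stochastic transitions and immediate feedback.

```python
import random


class ToyWebEnv:
    """Hypothetical stand-in for a live website (not from the paper).

    Pages 0 and 1 each need a different action to advance; page 2 is the goal.
    A correct action still fails with probability `flake_rate`, mimicking
    stochastic state transitions that a fixed offline trajectory cannot show.
    """

    ACTIONS = ("click_link", "submit_form")

    def __init__(self, rng, flake_rate=0.2):
        self.rng = rng
        self.flake_rate = flake_rate
        self.page = 0

    def reset(self):
        self.page = 0
        return self.page

    def step(self, action):
        correct = self.ACTIONS[self.page % 2]
        if action == correct and self.rng.random() > self.flake_rate:
            self.page += 1
        done = self.page == 2
        reward = 1.0 if done else 0.0
        return self.page, reward, done


def train_online(episodes=500, max_steps=10, alpha=0.1, gamma=0.9,
                 epsilon=0.1, seed=0):
    """Tabular Q-learning: every update uses a transition the agent just observed."""
    rng = random.Random(seed)
    env = ToyWebEnv(rng)
    actions = ToyWebEnv.ACTIONS
    q = {(s, a): 0.0 for s in range(3) for a in actions}
    for _ in range(episodes):
        state = env.reset()
        for _ in range(max_steps):
            # Epsilon-greedy: mostly exploit, occasionally explore.
            if rng.random() < epsilon:
                action = rng.choice(actions)
            else:
                action = max(actions, key=lambda a: q[(state, a)])
            next_state, reward, done = env.step(action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in actions)
            # Move the estimate toward the freshly observed outcome.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Best-known action for each non-terminal page."""
    return {s: max(ToyWebEnv.ACTIONS, key=lambda a: q[(s, a)]) for s in range(2)}
```

Calling `greedy_policy(train_online())` returns the action the agent prefers on each page after training. The failed page loads show up only because the agent acts in the environment itself, which is the kind of real-time feedback the summary says offline trajectories miss.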
