Weblica: Scalable Web Environments for Visual Agent Training
Researchers have introduced Weblica, a framework for building reproducible, scalable web environments in which to train visual web agents. It combines two techniques: HTTP-level caching, which records and replays stable visual states while preserving interactivity, and LLM-based environment synthesis grounded in real-world websites. Together, these let reinforcement learning scale to thousands of diverse environments. The resulting model, Weblica-8B, outperforms open-weight baselines of similar size on several web navigation benchmarks while using fewer inference steps.
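To make the record-and-replay idea concrete, here is a minimal sketch of HTTP-level caching. It is not Weblica's implementation: the `ReplayCache` class, the `Response` type, the cache key (method, URL, and body hash), and the fake "changing site" are all hypothetical, chosen only to show how a recorded response can keep a page stable across visits.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass(frozen=True)
class Response:
    status: int
    body: bytes


# Hypothetical fetch signature: (method, url, body) -> Response
Fetch = Callable[[str, str, bytes], Response]


@dataclass
class ReplayCache:
    """Illustrative record/replay store keyed by HTTP request identity."""
    fetch: Fetch
    mode: str = "record"  # "record" or "replay"
    store: Dict[str, Response] = field(default_factory=dict)

    @staticmethod
    def key(method: str, url: str, body: bytes = b"") -> str:
        # Assumption: method + URL + body hash is enough to identify a request.
        digest = hashlib.sha256(body).hexdigest()
        return f"{method.upper()} {url} {digest}"

    def request(self, method: str, url: str, body: bytes = b"") -> Response:
        k = self.key(method, url, body)
        if k in self.store:
            return self.store[k]  # serve the recorded response
        if self.mode == "replay":
            raise KeyError(f"no recorded response for {k}")
        response = self.fetch(method, url, body)  # first sight: hit the origin
        self.store[k] = response
        return response


def make_changing_site() -> Fetch:
    """Stand-in for a live site whose content differs on every visit."""
    visits = {"n": 0}

    def fetch(method: str, url: str, body: bytes) -> Response:
        visits["n"] += 1
        return Response(200, f"visit {visits['n']}".encode())

    return fetch


cache = ReplayCache(fetch=make_changing_site())
first = cache.request("GET", "https://example.com/")   # recorded from the origin
cache.mode = "replay"
second = cache.request("GET", "https://example.com/")  # replayed from the store
print(first.body, second.body)  # both are b"visit 1"
```

In replay mode this sketch raises on an unrecorded request instead of falling back to the network, so an episode cannot silently pick up live, changing content. A real system would also have to deal with headers, cookies, and dynamic requests, which this sketch ignores.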
Key facts
- Weblica uses HTTP-level caching for stable visual state replay.
- LLM-based environment synthesis is grounded in real-world websites.
- RL training scales to thousands of diverse environments.
- Weblica-8B outperforms open-weight baselines of similar size.
- Weblica-8B uses fewer inference steps than baselines.
- The web is complex, open-ended, and constantly changing.
- Existing data collection is limited to offline trajectories or simulated environments.
- Weblica stands for Web Replica.
Entities
Institutions
- arXiv