SERL: Selective Environment-Reweighted Learning Boosts LLM Agent Performance
Researchers have introduced SERL (Selective Environment-Reweighted Learning), a reinforcement learning framework that improves credit assignment for multi-turn LLM agents by using the feedback the environment returns at each step. It targets a core difficulty of long tasks: a single sparse success-or-failure signal must be distributed across many actions. In SERL, the task reward determines the direction of each update, while environment feedback determines where the update is applied and how large it is, emphasizing the actions that matter most. On ALFWorld and WebShop, SERL reaches success rates of 90.0% and 80.1%, respectively, outperforming strong RL and distillation baselines. The study examines five feedback sources and two insertion granularities. The paper is available on arXiv under the identifier 2605.19447.
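The summary does not give SERL's actual update rule, so the following is only a hypothetical sketch of the general idea it describes: the sign of a trajectory's credit comes from the task reward, while per-step environment feedback decides which steps receive that credit and how much. The function name `reweighted_step_advantages`, the baseline subtraction, and the normalize-to-mean-1 weighting are illustrative assumptions, not the paper's method.

```python
from typing import List


def reweighted_step_advantages(
    task_reward: float,
    baseline: float,
    step_feedback: List[float],
    eps: float = 1e-8,
) -> List[float]:
    """Hypothetical sketch (not SERL's published formula).

    Spreads one trajectory-level advantage over the steps of a trajectory.
    The sign comes only from the task reward; environment feedback only
    changes where the credit lands and how large it is at each step.
    """
    # Direction (and overall scale) comes from the task outcome alone.
    advantage = task_reward - baseline
    n = len(step_feedback)
    if n == 0:
        return []
    # Assumption: clamp feedback to be non-negative so reweighting can never
    # flip the sign that the task reward set.
    scores = [max(f, 0.0) for f in step_feedback]
    total = sum(scores)
    if total < eps:
        # No informative feedback: fall back to uniform credit,
        # i.e. ordinary outcome-only credit assignment.
        weights = [1.0] * n
    else:
        # Normalize so the weights average to 1, which keeps the total
        # credit equal to n * advantage while concentrating it on
        # steps with higher feedback.
        weights = [n * s / total for s in scores]
    return [advantage * w for w in weights]


if __name__ == "__main__":
    # Successful episode: positive credit, concentrated on step 2.
    print(reweighted_step_advantages(1.0, 0.5, [0.0, 3.0, 1.0]))
    # Failed episode: same placement, but every step is pushed down.
    print(reweighted_step_advantages(0.0, 0.5, [0.0, 3.0, 1.0]))
```

In this sketch, a uniform-credit baseline would give every step the same advantage of 0.5; the reweighted version gives 0.0, 1.125 and 0.375 for the successful episode. The total credit is unchanged, but it moves toward the step the environment flagged.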
Key facts
- SERL stands for Selective Environment-Reweighted Learning
- Achieves 90.0% success on ALFWorld
- Achieves 80.1% success on WebShop
- Uses task reward for update direction and environment feedback for placement and magnitude
- Studies five feedback sources and two insertion granularities
- Outperforms strong RL and distillation baselines
- Addresses credit assignment in multi-turn LLM agents
- Published on arXiv with ID 2605.19447
Entities
Institutions
- arXiv