SERL: Selective Environment-Reweighted Learning Boosts LLM Agent Performance

ai-technology · 2026-05-20

Researchers have introduced SERL (Selective Environment-Reweighted Learning), a reinforcement learning framework that improves credit assignment for multi-turn LLM agents by using the feedback the environment returns at each step. In SERL, the task reward determines the direction of each update, while environment feedback determines where the update is applied and how large it is, so that the most important actions receive the most emphasis. On the ALFWorld and WebShop benchmarks, SERL reaches success rates of 90.0% and 80.1%, respectively, outperforming strong RL and distillation baselines. The work studies five feedback sources and two insertion granularities. It targets a core difficulty of long tasks: spreading a single sparse success-or-failure signal across many actions. The paper is available on arXiv under the identifier 2605.19447.

Key facts

SERL stands for Selective Environment-Reweighted Learning
Achieves 90.0% success on ALFWorld
Achieves 80.1% success on WebShop
Uses task reward for update direction and environment feedback for placement and magnitude
Studies five feedback sources and two insertion granularities
Outperforms strong RL and distillation baselines
Addresses credit assignment in multi-turn LLM agents
Published on arXiv with ID 2605.19447
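The division of labor described above (task reward sets the sign of the update; environment feedback decides where and how strongly it lands) can be illustrated with a small sketch. This is not SERL's published algorithm. It assumes that each step's environment feedback has already been reduced to a non-negative score, that the trajectory-level advantage is simply reward minus a baseline, and that the scores are normalized into per-step weights with mean 1. The names `step_weights` and `reweighted_advantages` are hypothetical, and the paper's five feedback sources and two insertion granularities are not modeled here.

```python
from typing import List


def step_weights(feedback: List[float]) -> List[float]:
    """Hypothetical: map non-negative per-step feedback scores to weights with mean 1."""
    if any(f < 0 for f in feedback):
        raise ValueError("feedback scores must be non-negative")
    n = len(feedback)
    total = sum(feedback)
    if total == 0:
        # No informative feedback anywhere: fall back to uniform credit.
        return [1.0] * n
    # Mean-1 normalization keeps the total update size equal to uniform credit,
    # so feedback only redistributes credit across steps.
    return [n * f / total for f in feedback]


def reweighted_advantages(reward: float, baseline: float,
                          feedback: List[float]) -> List[float]:
    """Hypothetical: per-step advantages whose sign comes only from the task reward."""
    advantage = reward - baseline  # direction of the update
    # Weights are non-negative, so they change placement and magnitude, never sign.
    return [advantage * w for w in step_weights(feedback)]


if __name__ == "__main__":
    # A successful 4-step episode where only steps 2 and 4 produced useful feedback.
    print(reweighted_advantages(1.0, 0.5, [0.0, 1.0, 0.0, 3.0]))
    # → [0.0, 0.5, 0.0, 1.5]
```

Because the weights are non-negative and average to 1, a failed episode (reward below the baseline) still yields only non-positive per-step advantages; the feedback merely concentrates the penalty on the steps it flags.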
