ActFocus: Token Reweighting Resolves Action Bottleneck in RL for LLMs
A new paper on arXiv (2605.14558) argues that in agentic reinforcement learning for large language models, uniform credit assignment across tokens misallocates the training signal. Working from an energy-based modeling perspective, the authors show that token-level training signals, measured by their correlation with reward variance across rollouts, concentrate on action tokens rather than reasoning tokens, even though actions make up only a small fraction of each trajectory. They call this the Action Bottleneck. To address it, they propose ActFocus, a simple token-reweighting approach that downweights the gradient contributions of non-action tokens, focusing learning on the tokens that matter most for reward. The method is designed to plug into policy-gradient algorithms such as PPO and GRPO. The paper was announced on arXiv as a cross submission.
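The reweighting idea is easy to illustrate. Below is a minimal sketch of a token-reweighted policy-gradient loss, assuming a PyTorch setting; the weight values, the `action_mask` construction, and the function name `reweighted_pg_loss` are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def reweighted_pg_loss(logprobs, advantages, action_mask,
                       action_weight=1.0, non_action_weight=0.1):
    """Policy-gradient surrogate loss with per-token reweighting.

    logprobs:    (batch, seq_len) log-probabilities of the sampled tokens
    advantages:  (batch, seq_len) per-token advantage estimates (e.g. from GRPO)
    action_mask: (batch, seq_len) 1.0 for action tokens, 0.0 for all other tokens
    """
    # Downweight gradient contributions from non-action (e.g. reasoning) tokens.
    weights = action_weight * action_mask + non_action_weight * (1.0 - action_mask)
    per_token_loss = -weights * advantages * logprobs
    # Average over tokens and batch.
    return per_token_loss.mean()
```

In a GRPO-style setup, `advantages` could be a group-normalized reward broadcast across the trajectory's tokens; the only essential ingredient of the sketch is that non-action tokens receive a smaller weight in the gradient.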
Key facts
- Paper ID: arXiv:2605.14558
- Announce type: cross
- Focuses on agentic reinforcement learning for LLMs
- Identifies Action Bottleneck: training signals concentrate on action tokens
- Proposes ActFocus: token reweighting method
- ActFocus downweights non-action token gradients
- Aims to improve PPO and GRPO
- Uses energy-based modeling perspective