ActFocus: Token Reweighting Resolves Action Bottleneck in RL for LLMs
A new paper on arXiv (2605.14558) argues that in agentic reinforcement learning for large language models, uniform credit assignment across tokens misallocates the training signal. Working from an energy-based modeling perspective, the authors show that token-level training signals, measured by their correlation with reward variance across rollouts, concentrate on action tokens rather than reasoning tokens, even though actions make up only a small fraction of each trajectory. They call this the Action Bottleneck. To address it, they propose ActFocus, a simple token-reweighting approach that downweights the gradient contributions of non-action tokens, focusing learning on the tokens that matter most for reward. The method is designed to plug into policy-gradient algorithms such as PPO and GRPO. The paper was announced on arXiv as a cross submission.
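The reweighting idea is easy to illustrate. Below is a minimal sketch of a token-reweighted policy-gradient loss, assuming a PyTorch setting; the weight values, the `action_mask` construction, and the function name `reweighted_pg_loss` are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def reweighted_pg_loss(logprobs, advantages, action_mask,
                       action_weight=1.0, non_action_weight=0.1):
    """Policy-gradient surrogate loss with per-token reweighting.

    logprobs:    (batch, seq_len) log-probabilities of the sampled tokens
    advantages:  (batch, seq_len) per-token advantage estimates (e.g. from GRPO)
    action_mask: (batch, seq_len) 1.0 for action tokens, 0.0 for all other tokens
    """
    # Downweight gradient contributions from non-action (e.g. reasoning) tokens.
    weights = action_weight * action_mask + non_action_weight * (1.0 - action_mask)
    per_token_loss = -weights * advantages * logprobs
    # Average over tokens and batch.
    return per_token_loss.mean()
```

In a GRPO-style setup, `advantages` could be a group-normalized reward broadcast across the trajectory's tokens; the only essential ingredient of the sketch is that non-action tokens receive a smaller weight in the gradient.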
Key facts
- Paper ID: arXiv:2605.14558
- Announce type: cross
- Focuses on agentic reinforcement learning for LLMs
- Identifies Action Bottleneck: training signals concentrate on action tokens
- Proposes ActFocus: token reweighting method
- ActFocus downweights non-action token gradients
- Aims to improve PPO and GRPO
- Uses energy-based modeling perspective