ARTFEED — Contemporary Art Intelligence

GEAR: Adaptive Credit Assignment for LLM Agents via Self-Distillation

ai-technology · 2026-05-13

Researchers propose GEAR (Granularity-adaptivE Advantage Reweighting), a credit assignment framework for reinforcement learning in LLM agents. The method addresses the limitation of coarse outcome-level rewards by deriving token- and segment-level signals from self-distillation. GEAR reshapes the trajectory-level GRPO advantage by comparing an on-policy student with a ground-truth-conditioned teacher: the divergence between the two spikes at the onset of semantic deviations, and this signal is used to identify adaptive segment boundaries and modulate local advantage weights, improving credit assignment in long-horizon trajectories. The paper is available on arXiv (2605.11853).
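The mechanism described above can be sketched as follows. This is a minimal illustrative reconstruction, not the paper's implementation: the spike heuristic (`spike_mult`), the exponential weighting (`beta`), and the function name are all assumptions made for the example.

```python
import numpy as np

def reshape_advantage(adv, student_logp, teacher_logp, spike_mult=2.0, beta=1.0):
    """Hypothetical sketch of GEAR-style advantage reweighting.

    adv          : scalar trajectory-level advantage (e.g. from GRPO).
    student_logp : per-token log-probs from the on-policy student.
    teacher_logp : per-token log-probs from a ground-truth-conditioned teacher.
    """
    # Per-token divergence signal; it spikes where the student
    # semantically deviates from the teacher.
    div = np.asarray(teacher_logp) - np.asarray(student_logp)

    # Adaptive segment boundaries: cut where the divergence jumps above
    # spike_mult times its running mean (illustrative heuristic, not
    # the paper's boundary rule).
    boundaries = [0]
    for t in range(1, len(div)):
        if div[t] > spike_mult * (np.mean(div[:t]) + 1e-8):
            boundaries.append(t)
    boundaries.append(len(div))

    # Modulate the shared trajectory advantage per segment: segments
    # with higher mean divergence receive more credit.
    weights = np.ones(len(div))
    for s, e in zip(boundaries[:-1], boundaries[1:]):
        weights[s:e] = np.exp(beta * div[s:e].mean())
    weights /= weights.mean()  # preserve the trajectory-level scale
    return adv * weights
```

For example, a trajectory whose teacher log-probs pull sharply away from the student's in its second half would get a boundary near the midpoint and larger per-token advantages on the deviating segment.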

Key facts

  • GEAR is a credit assignment framework for LLM agents.
  • It uses token- and segment-level signals from self-distillation.
  • It reshapes trajectory-level GRPO advantage.
  • It compares an on-policy student with a ground-truth-conditioned teacher.
  • Divergence signal identifies adaptive segment boundaries.
  • Divergence spikes at the onset of semantic deviations.
  • Paper available on arXiv: 2605.11853.

Entities

Institutions

  • arXiv

Sources