ARTFEED — Contemporary Art Intelligence

ReuseRL: Skill Compression Improves Agentic RL Generalization

ai-technology · 2026-06-01

Researchers have developed ReuseRL, a framework that grounds agentic reinforcement learning in the Minimum Description Length (MDL) principle. It extracts a shared skill dictionary from successful trajectories and adds a segmentation cost to the RL objective that penalizes idiosyncratic behaviors. The authors prove a PAC-Bayes generalization bound for the compression penalty. Experiments on ALFWorld, TextWorld-Cooking, and Countdown-Stepwise indicate that ReuseRL improves both in-distribution and out-of-distribution success over vanilla GRPO and round-length baselines.

Key facts

  • ReuseRL grounds agentic RL in the Minimum Description Length (MDL) principle.
  • It extracts a shared skill dictionary from successful trajectories.
  • The RL objective is augmented with a segmentation cost penalizing idiosyncratic behaviors.
  • A PAC-Bayes generalization bound is proven for the compression penalty.
  • Evaluated on ALFWorld, TextWorld-Cooking, and Countdown-Stepwise.
  • Improves in- and out-of-distribution success over vanilla GRPO and round-length baselines.
  • Large language model agents trained with RL often learn brittle, task-specific shortcuts.
  • The hypothesis is that agents generalize better when successful trajectories are structurally compressible.
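
The idea of a segmentation cost can be illustrated with a small sketch. It is not the method from the ReuseRL paper, whose exact formulation is not given here. It assumes a trajectory is a list of action strings, that the skill dictionary is a list of action tuples, and that a dictionary skill is cheaper to encode than a single literal action. The function names, the cost values, and the weight `lam` are all hypothetical.

```python
from typing import Sequence, Tuple


def segmentation_cost(
    trajectory: Sequence[str],
    skills: Sequence[Tuple[str, ...]],
    skill_cost: float = 1.0,      # assumed cost of citing one dictionary skill
    primitive_cost: float = 3.0,  # assumed cost of spelling out one raw action
) -> float:
    """Cheapest way to cover the trajectory with skills and raw actions."""
    n = len(trajectory)
    # best[i] holds the minimum cost of encoding the first i actions.
    best = [0.0] + [float("inf")] * n
    for end in range(1, n + 1):
        # Option 1: encode the action at position end - 1 as a raw primitive.
        best[end] = best[end - 1] + primitive_cost
        # Option 2: finish the prefix with any dictionary skill that matches.
        for skill in skills:
            k = len(skill)
            if 0 < k <= end and tuple(trajectory[end - k:end]) == tuple(skill):
                best[end] = min(best[end], best[end - k] + skill_cost)
    return best[n]


def penalized_reward(
    task_reward: float,
    trajectory: Sequence[str],
    skills: Sequence[Tuple[str, ...]],
    lam: float = 0.1,  # hypothetical weight of the compression penalty
) -> float:
    """Task reward minus a weighted segmentation cost (illustrative only)."""
    return task_reward - lam * segmentation_cost(trajectory, skills)


# Toy dictionary and two trajectories of equal length.
skills = [("go", "open"), ("take", "put")]
reusable = ["go", "open", "take", "put"]          # fully covered by two skills
idiosyncratic = ["look", "go", "examine", "put"]  # no skill matches

reusable_cost = segmentation_cost(reusable, skills)
idiosyncratic_cost = segmentation_cost(idiosyncratic, skills)
```

In this toy setup, the trajectory built from dictionary skills has a lower cost than the equally long trajectory of unmatched actions, so it keeps more of its task reward. That is the intuition behind penalizing idiosyncratic behavior.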

Entities

Institutions

  • arXiv
