ReuseRL: Skill Compression Improves Agentic RL Generalization

ai-technology · 2026-06-01

Researchers have introduced ReuseRL, a framework that grounds agentic reinforcement learning in the Minimum Description Length (MDL) principle. It extracts a shared skill dictionary from successful trajectories and adds a segmentation cost to the RL objective that penalizes idiosyncratic behaviors. The authors prove a PAC-Bayes generalization bound for this compression penalty. On ALFWorld, TextWorld-Cooking, and Countdown-Stepwise, ReuseRL improves both in-distribution and out-of-distribution success over vanilla GRPO and round-length baselines.

Key facts

ReuseRL grounds agentic RL in the Minimum Description Length (MDL) principle.
It extracts a shared skill dictionary from successful trajectories.
The RL objective is augmented with a segmentation cost penalizing idiosyncratic behaviors.
A PAC-Bayes generalization bound is proven for the compression penalty.
Evaluated on ALFWorld, TextWorld-Cooking, and Countdown-Stepwise.
Improves in- and out-of-distribution success over vanilla GRPO and round-length baselines.
Large language model agents trained with RL often learn brittle, task-specific shortcuts.
The hypothesis is that agents generalize better when successful trajectories are structurally compressible.
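
To make the idea behind these facts concrete, here is a minimal Python sketch of a skill dictionary and a segmentation cost. It is an illustration under stated assumptions, not the ReuseRL implementation: the source does not say how skills are extracted or how the cost is computed. The function names, the n-gram extraction, the bit costs (`primitive_bits`, `skill_bits`), and the weight `lam` are all hypothetical.

```python
from collections import Counter
from math import log2


def build_skill_dictionary(successful_trajectories, min_len=2, max_len=4, min_count=2):
    """Assumed extraction rule: keep action n-grams that recur across
    at least `min_count` successful trajectories."""
    counts = Counter()
    for traj in successful_trajectories:
        seen = set()
        for n in range(min_len, max_len + 1):
            for i in range(len(traj) - n + 1):
                seen.add(tuple(traj[i:i + n]))
        counts.update(seen)  # each n-gram counts once per trajectory
    return {gram for gram, c in counts.items() if c >= min_count}


def segmentation_cost(trajectory, skills, primitive_bits=4.0):
    """Cheapest encoding (in assumed bits) of a trajectory as dictionary
    skills plus leftover raw actions, found by dynamic programming."""
    skill_bits = log2(len(skills) + 1)  # assumed cost of naming one skill
    n = len(trajectory)
    best = [0.0] + [float("inf")] * n
    for end in range(1, n + 1):
        # Option 1: encode the last action as a raw primitive.
        best[end] = best[end - 1] + primitive_bits
        # Option 2: cover a suffix ending here with a dictionary skill.
        for skill in skills:
            start = end - len(skill)
            if start >= 0 and tuple(trajectory[start:end]) == skill:
                best[end] = min(best[end], best[start] + skill_bits)
    return best[n]


def shaped_reward(success, trajectory, skills, lam=0.05):
    """Task success minus a weighted compression penalty (hypothetical form)."""
    return float(success) - lam * segmentation_cost(trajectory, skills)
```

In this sketch, a trajectory built from reused skills costs fewer bits than one made of one-off action sequences, so it keeps more of its reward. That is the sense in which idiosyncratic behavior is penalized here.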
