Hierarchical Behaviour Spaces: A New RL Method Boosts Exploration in NetHack

ai-technology · 2026-04-29

Recent work in hierarchical reinforcement learning (RL) has scaled to billions of timesteps by training options against predefined reward functions. A new method, Hierarchical Behaviour Spaces (HBS), replaces the single reward function assigned to each option with a linear combination of reward functions, which yields a more expressive set of policies. HBS was evaluated on the NetHack Learning Environment, where it is reported to perform strongly. The experiments suggest that the benefits of hierarchy come from increased exploration rather than from long-term reasoning, which challenges conventional wisdom.

Key facts

HBS uses linear combinations of reward functions to induce a space of behaviours.
The method was evaluated on the NetHack Learning Environment.
Experiments suggest that the benefits of hierarchy in HBS stem from increased exploration rather than long-term reasoning.
The work builds on hierarchical reinforcement learning with predefined option reward functions.
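
To make the idea of a linear combination of reward functions concrete, here is a minimal Python sketch. It is an illustration, not the authors' implementation: the three base reward functions, the observation keys (gold_delta, depth_delta, new_tiles), and the weight values are all hypothetical and are not taken from the paper or from the NetHack Learning Environment API. The source also does not say how the weights are chosen, so the sketch simply passes them in.

```python
import numpy as np

# Hypothetical base reward functions (assumptions for illustration only).
# Each maps an observation dict to a scalar reward.
def reward_gold(obs):
    return float(obs["gold_delta"])

def reward_depth(obs):
    return float(obs["depth_delta"])

def reward_explore(obs):
    return float(obs["new_tiles"])

BASE_REWARDS = (reward_gold, reward_depth, reward_explore)

def combined_reward(obs, weights):
    """Return the weighted sum of the base rewards for one behaviour."""
    features = np.array([r(obs) for r in BASE_REWARDS])
    return float(np.dot(weights, features))

# A made-up transition summary, not real environment output.
obs = {"gold_delta": 2.0, "depth_delta": 1.0, "new_tiles": 5.0}

# A one-hot weight vector recovers a single predefined reward function.
single = combined_reward(obs, np.array([0.0, 1.0, 0.0]))

# Any other weight vector selects a blended behaviour from the same space.
mixed = combined_reward(obs, np.array([0.5, 0.0, 0.2]))
```

Under these assumptions, every weight vector defines a different reward and therefore a different behaviour to train, which is one way to read the "space of behaviours" described above. A one-hot vector reduces to the earlier setting of one predefined reward function per option.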

Sources

arXiv cs.AI — 2026-04-28