Hierarchical Behaviour Spaces: A New RL Method Boosts Exploration in NetHack
Recent work in hierarchical reinforcement learning (RL) has scaled to billions of timesteps using predefined option reward functions. A new method, Hierarchical Behaviour Spaces (HBS), replaces the single reward function per option with linear combinations of reward functions, which yields a more expressive set of policies. HBS was evaluated on the NetHack Learning Environment, where it showed strong performance. The experiments suggest that the benefits of hierarchy come from increased exploration rather than long-term reasoning, challenging the conventional view of why hierarchy helps.
Key facts
- HBS uses linear combinations of reward functions to induce a space of behaviours.
- The method was evaluated on the NetHack Learning Environment.
- Experiments suggest that the benefits of hierarchy in HBS stem from increased exploration, not long-term reasoning.
- The work builds on hierarchical reinforcement learning with predefined option reward functions.
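To make the idea of a reward-weighted behaviour space concrete, here is a minimal Python sketch. It is not the authors' implementation. The base reward functions (`depth_gain`, `gold_gain`), the dictionary state format, and the helper `combine` are hypothetical names chosen for illustration, and the summary does not say how HBS chooses the weights.

```python
from typing import Callable, Dict, Sequence

# Hypothetical state summary for illustration; NOT the NetHack observation format.
State = Dict[str, float]
RewardFn = Callable[[State, State], float]


# Hypothetical base reward functions (assumptions, not taken from the HBS paper).
def depth_gain(prev: State, curr: State) -> float:
    """Reward for descending: change in dungeon depth."""
    return curr["depth"] - prev["depth"]


def gold_gain(prev: State, curr: State) -> float:
    """Reward for collecting gold: change in gold held."""
    return curr["gold"] - prev["gold"]


def combine(reward_fns: Sequence[RewardFn], weights: Sequence[float]) -> RewardFn:
    """Return the reward r_w = sum_k w_k * r_k for a fixed weight vector w."""
    if len(reward_fns) != len(weights):
        raise ValueError("need exactly one weight per reward function")

    def combined(prev: State, curr: State) -> float:
        return sum(w * fn(prev, curr) for w, fn in zip(weights, reward_fns))

    return combined


base = [depth_gain, gold_gain]
prev = {"depth": 1.0, "gold": 10.0}
curr = {"depth": 2.0, "gold": 25.0}

# A one-hot weight vector recovers a single predefined option reward.
descend_only = combine(base, [1.0, 0.0])
# Any other weight vector defines a different behaviour in the same space.
mixed = combine(base, [0.5, 0.25])

print(descend_only(prev, curr))  # 1.0
print(mixed(prev, curr))         # 0.5 * 1.0 + 0.25 * 15.0 = 4.25
```

In this sketch each weight vector identifies one behaviour, so a single predefined option reward is the special case of a one-hot vector, and varying the weights continuously gives the larger policy set the summary describes.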