LEO: Efficient All-Goals Learning for Goal-Conditioned RL
Researchers propose Learning Everything all at Once (LEO), a method for goal-conditioned reinforcement learning in which a single network pass jointly outputs values and actions for every goal. Goal-conditioned agents typically discard most of the information in their trajectories. All-goals learning instead uses each transition for off-policy learning with respect to every goal, but doing so through naive relabelling is computationally infeasible. LEO's joint output makes all-goals updates efficient and parallel. It significantly outperforms other methods on goal-conditioned Craftax, is competitive with existing baselines on continuous control environments, and achieves a speed-up of more than 250x over all-goals relabelling.
Key facts
- LEO jointly outputs values and actions for every goal at once.
- It enables efficient, parallel all-goals updates with a single network pass.
- The method significantly outperforms others on goal-conditioned Craftax.
- It is competitive with existing baselines on continuous control environments.
- LEO achieves a >250x speed-up compared to all-goals relabelling.
- All-goals learning uses each transition for off-policy learning with respect to every goal.
- Naive relabelling is computationally infeasible.
- Goal-conditioned reinforcement learning agents typically discard most information from trajectories.
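The all-goals idea in the list above can be illustrated with a minimal sketch. It is not LEO's implementation: a tabular array stands in for the network, goals are assumed to be "reach state g" on a toy five-state chain, and the names `all_goals_td_update`, `step`, and `train` are hypothetical. What it shows is one transition updating the value estimates for every goal in a single vectorized step, rather than one relabelled update per goal.

```python
import numpy as np


def all_goals_td_update(Q, s, a, s_next, gamma=0.9, lr=0.5):
    """One off-policy Q-learning step for every goal at once.

    Q has shape [n_states, n_goals, n_actions]. Reading Q[s] stands in for a
    single forward pass that returns values for all goals and actions.
    Assumption: goal g means "reach state g", so the reward for goal g is 1
    when s_next == g and the episode for that goal ends there.
    """
    n_goals = Q.shape[1]
    reached = (np.arange(n_goals) == s_next).astype(float)  # reward per goal
    bootstrap = Q[s_next].max(axis=1)                       # greedy value per goal
    target = reached + gamma * (1.0 - reached) * bootstrap
    Q[s, :, a] += lr * (target - Q[s, :, a])                # all goals in one step
    return Q


def step(s, a, n_states):
    """Toy chain dynamics: action 1 moves right, action 0 moves left."""
    return int(np.clip(s + (1 if a == 1 else -1), 0, n_states - 1))


def train(n_states=5, n_steps=5000, seed=0):
    """Collect transitions with a uniformly random policy and learn all goals."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_states, 2))
    s = int(rng.integers(n_states))
    for _ in range(n_steps):
        a = int(rng.integers(2))
        s_next = step(s, a, n_states)
        all_goals_td_update(Q, s, a, s_next)
        s = s_next
    return Q


Q = train()
# Greedy action from state 0 toward goal 4, and from state 4 toward goal 0.
print(int(np.argmax(Q[0, 4])), int(np.argmax(Q[4, 0])))
```

The behaviour policy here is random and never conditions on a goal, yet every goal's values improve from the same data, which is the off-policy property the list describes. A naive relabelling baseline would instead run one separate update per goal for each transition.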