ARTFEED — Contemporary Art Intelligence

Behavioral Task Sampling Improves Zero-Shot Offline RL

other · 2026-04-30

A recent arXiv paper (2604.25496) proposes a technique for improving zero-shot offline reinforcement learning: deriving task vectors directly from the offline dataset rather than sampling them at random. Conventionally, task-conditioned policies are trained on randomly drawn task vectors that define reward functions linear in learned state features; the authors argue that this random sampling limits generalization to unseen tasks. Their extraction procedure is simple to drop into existing algorithms and improves zero-shot performance by an average of 8% across benchmark environments and baselines.
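To make the contrast concrete, below is a minimal sketch of random versus behavioral task sampling, assuming the standard zero-shot setup in which the reward for task vector z is linear in learned state features, r_z(s) = phi(s) · z. The function names and the feature-averaging heuristic are illustrative assumptions, not the paper's exact procedure.

```python
# Hypothetical sketch of behavioral task sampling for zero-shot offline RL.
# Assumes rewards linear in learned state features: r_z(s) = phi(s) . z.
import numpy as np

rng = np.random.default_rng(0)

def sample_task_random(phi_dim):
    """Baseline: draw a task vector uniformly on the unit sphere."""
    z = rng.normal(size=phi_dim)
    return z / np.linalg.norm(z)

def sample_task_behavioral(dataset_features, traj_len=50):
    """Illustrative behavioral variant: derive a task vector from the
    offline dataset by averaging learned features over a randomly chosen
    sub-trajectory, so the induced reward r_z(s) = phi(s) . z is high on
    states the logged behavior actually visited."""
    start = rng.integers(0, len(dataset_features) - traj_len)
    segment = dataset_features[start:start + traj_len]
    z = segment.mean(axis=0)
    return z / np.linalg.norm(z)

# Toy stand-in for phi(s) evaluated on every state in the offline dataset.
phi_dim = 32
dataset_features = rng.normal(size=(10_000, phi_dim))

z_rand = sample_task_random(phi_dim)
z_behav = sample_task_behavioral(dataset_features)

# During training, a task-conditioned policy pi(a | s, z) would be
# optimized against the relabeled reward r_z(s) = phi(s) . z.
rewards = dataset_features @ z_behav
print(rewards.shape)  # (10000,)
```

The key difference is the source of z: the baseline draws it from a fixed prior, while the behavioral variant ties it to feature statistics of trajectories that actually appear in the dataset, which is the intuition behind the paper's reported generalization gains.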

Key facts

  • arXiv paper 2604.25496
  • Improves zero-shot offline RL
  • Proposes behavioral task sampling
  • Extracts task vectors from offline dataset
  • Integrates into existing algorithms
  • Improves zero-shot performance by 8% on average
  • Tested across multiple benchmark environments
  • Addresses limitation of random task vector sampling

Entities

Institutions

  • arXiv
