Target-Aligned Generation Boosts Cross-Domain Offline RL
Researchers have introduced Target-aligned Coverage Expansion (TCE), a framework that addresses distributional mismatch in cross-domain offline reinforcement learning when the source and target environments are not aligned. TCE uses a dual score-based generative model to produce transitions consistent with the target domain, and it chooses between directly incorporating target-near transitions and expanding coverage through generation. In the reported experiments, TCE outperforms leading baselines across a range of cross-domain settings.
Key facts
- Cross-domain offline RL adapts a policy from a source domain to a target domain using pre-collected datasets.
- Environment dynamics may differ between source and target domains.
- The key challenge is reducing distributional mismatch when target data is limited.
- The TCE framework uses a dual score-based generative model for target-aligned generation.
- TCE decides between directly incorporating target-near transitions and expanding coverage through generation.
- TCE consistently outperforms state-of-the-art cross-domain offline RL baselines.
- Extensive experiments across diverse cross-domain environments were conducted.
- The approach is guided by theoretical analysis.
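
The choice between direct incorporation and coverage expansion can be illustrated with a minimal sketch. This is not the authors' implementation: the summary does not specify how "target-near" is measured or how generation is performed. The sketch assumes that each transition is a flat feature vector, that nearness is the Euclidean distance to the closest target transition, and that a hypothetical `threshold` separates near from far. The function names are illustrative, and `placeholder_generate` is a simple noise-based stand-in, not a score-based generative model.

```python
import numpy as np


def split_by_target_proximity(source, target, threshold):
    """Split source transitions by distance to the nearest target transition.

    Assumption: Euclidean distance and a fixed threshold stand in for
    whatever nearness criterion TCE actually uses.
    """
    # Pairwise distances, shape (n_source, n_target).
    diffs = source[:, None, :] - target[None, :, :]
    nearest = np.linalg.norm(diffs, axis=-1).min(axis=1)
    near_mask = nearest <= threshold
    return source[near_mask], source[~near_mask]


def placeholder_generate(target, n_samples, noise_scale, rng):
    """Stand-in generator: resample target transitions and add Gaussian noise.

    Assumption: this only marks where a target-aligned generative model
    would be called. It is NOT a score-based model.
    """
    idx = rng.integers(0, len(target), size=n_samples)
    noise = noise_scale * rng.standard_normal((n_samples, target.shape[1]))
    return target[idx] + noise


def build_training_set(source, target, threshold, noise_scale=0.05, seed=0):
    """Keep target-near source transitions; cover the rest by generation."""
    rng = np.random.default_rng(seed)
    near, far = split_by_target_proximity(source, target, threshold)
    # One generated sample per discarded source transition (an arbitrary choice).
    generated = placeholder_generate(target, len(far), noise_scale, rng)
    dataset = np.concatenate([target, near, generated], axis=0)
    return dataset, len(near), len(far)
```

Calling `build_training_set(source, target, threshold=0.5)` returns the combined dataset together with the number of source transitions kept directly and the number replaced by generated samples. In a real system the threshold and the generator would be the parts that carry the method's substance.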
Entities
Institutions
- arXiv