Sim-to-Real Gap in Sequential Decision Planning

publication · 2026-05-22

A new paper on arXiv (2605.21458) studies how planners should combine cheap but biased simulators with costly but unbiased real experiments in sequential decision problems. The authors decompose the simulator's value error into a calibration-deployment shift, which is identifiable via randomization, and a parametric residual, which further interaction cannot reduce. They show that the value gap between the simulator-optimal policy and the true optimum splits into a local component, on states the deployed policy visits, and a reachability component, on states it does not visit. The reachability component remains bounded away from zero under passive learning. The proposed method, Fisher-SEP, addresses this gap.
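
To make the reachability idea concrete, here is a minimal toy sketch in Python. It is not the paper's setup and it is not Fisher-SEP: the two-state problem, the reward numbers, and the names TRUE_REWARD, SIM_REWARD, greedy, passive_learning, and value_gap are all assumptions made up for illustration. A "passive" learner here means one that only ever collects real data by running its current greedy policy, so it can correct the simulator only on states that policy reaches.

```python
# Toy illustration only: hypothetical numbers and names, not taken from the paper.
# One decision: go to state "A" or state "B" and collect that state's reward.

TRUE_REWARD = {"A": 0.8, "B": 1.5}  # real world (assumed); B is actually better
SIM_REWARD = {"A": 1.0, "B": 0.5}   # biased simulator (assumed); it prefers A


def greedy(model):
    """Return the state the model believes has the highest reward."""
    return max(model, key=model.get)


def passive_learning(model, true_reward, rounds=50):
    """Deploy the greedy policy and correct the model only where it lands.

    Real observations are noise-free here to keep the sketch short.
    """
    model = dict(model)  # work on a copy, leave the simulator untouched
    for _ in range(rounds):
        state = greedy(model)
        model[state] = true_reward[state]
    return model


def value_gap(model, true_reward):
    """True value of the best state minus true value of the model-greedy state."""
    return max(true_reward.values()) - true_reward[greedy(model)]


learned = passive_learning(SIM_REWARD, TRUE_REWARD)
visited = greedy(learned)

# Model error on the state the deployed policy visits.
local_error = abs(learned[visited] - TRUE_REWARD[visited])

# Largest model error on states the deployed policy never visits.
unvisited_error = max(
    abs(learned[s] - TRUE_REWARD[s]) for s in TRUE_REWARD if s != visited
)

print("visited state:", visited)
print("local error:", local_error)
print("unvisited error:", unvisited_error)
print("value gap:", round(value_gap(learned, TRUE_REWARD), 3))
```

In this sketch the model becomes exact on the visited state, yet the learner keeps choosing it, so the error on the unvisited state and the resulting value gap never shrink however many rounds are run. A single deliberate visit to the other state would correct the model, which is the kind of active data collection that passive learning by definition does not perform.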

Key facts

arXiv paper 2605.21458
Studies sim-to-real gap in sequential decision planning
Decomposes simulator error into calibration-deployment shift and parametric residual
Value gap splits into local and reachability components
Reachability component stays bounded away from zero under passive learning
Proposes Fisher-SEP method
Simulator is cheap but biased; real experiments are unbiased but costly
Published on arXiv
