Sim-to-Real Gap in Sequential Decision Planning
A new paper on arXiv (2605.21458) studies how planners should combine cheap but biased simulators with costly real-world experiments in sequential decision problems. The authors decompose the simulator's value error into two parts: a calibration-deployment shift, which is identifiable via randomization, and a parametric residual, which further interaction cannot reduce. They then show that the value gap between the simulator-optimal policy and the true optimum splits into a local component, arising on states the deployed policy visits, and a reachability component, arising on states it does not visit. Under passive learning, the reachability component remains bounded away from zero. The authors propose a method, Fisher-SEP, to address this gap.
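Why the reachability component survives passive learning can be seen in a toy example. The sketch below is a hypothetical illustration, not the paper's construction or Fisher-SEP. It assumes a deterministic two-step problem where the simulator is wrong only on a state that the simulator-optimal policy never reaches. "Passive learning" is modeled here, as an assumption, as a loop that deploys the current model-optimal policy, recalibrates the model only on the (state, action) pairs it visited, and replans. All names (`TRANSITIONS`, `TRUE_REWARD`, `SIM_REWARD`, `passive_learning`, `gap_after_probe`) are invented for this sketch.

```python
# Hypothetical toy problem (not from the paper): from "start", action "safe"
# ends the episode; action "detour" moves to "far", where "collect" ends it.
TRANSITIONS = {
    ("start", "safe"): None,      # None marks episode termination
    ("start", "detour"): "far",
    ("far", "collect"): None,
}
TRUE_REWARD = {("start", "safe"): 1.0, ("start", "detour"): 0.0, ("far", "collect"): 2.0}
# Biased simulator: correct on visited pairs (zero local error), wrong only at "far".
SIM_REWARD = {("start", "safe"): 1.0, ("start", "detour"): 0.0, ("far", "collect"): 0.5}


def actions(state):
    return [a for (s, a) in TRANSITIONS if s == state]


def value(state, reward):
    """Optimal value of `state` under a reward table (recursive dynamic programming)."""
    if state is None:
        return 0.0
    return max(reward[(state, a)] + value(TRANSITIONS[(state, a)], reward) for a in actions(state))


def greedy_policy(reward):
    """Map each state to the action that is optimal under `reward`."""
    states = {s for (s, _) in TRANSITIONS}
    return {
        s: max(actions(s), key=lambda a: reward[(s, a)] + value(TRANSITIONS[(s, a)], reward))
        for s in states
    }


def policy_value(state, policy, reward):
    """Value of following `policy` from `state` under `reward`."""
    if state is None:
        return 0.0
    a = policy[state]
    return reward[(state, a)] + policy_value(TRANSITIONS[(state, a)], policy, reward)


def rollout(policy):
    """One real episode: the (state, action) pairs visited and their true rewards."""
    state, visited = "start", []
    while state is not None:
        a = policy[state]
        visited.append(((state, a), TRUE_REWARD[(state, a)]))
        state = TRANSITIONS[(state, a)]
    return visited


def passive_learning(episodes):
    """Deploy, recalibrate on visited pairs only, replan; return the true gap per episode."""
    model = dict(SIM_REWARD)
    optimum = value("start", TRUE_REWARD)
    gaps = []
    for _ in range(episodes):
        for pair, r in rollout(greedy_policy(model)):
            model[pair] = r  # deterministic rewards: one observation fixes the entry
        gaps.append(optimum - policy_value("start", greedy_policy(model), TRUE_REWARD))
    return gaps


def gap_after_probe():
    """Spend one real episode deliberately visiting the unreached state, then replan."""
    model = dict(SIM_REWARD)
    probe = {"start": "detour", "far": "collect"}
    for pair, r in rollout(probe):
        model[pair] = r
    return value("start", TRUE_REWARD) - policy_value("start", greedy_policy(model), TRUE_REWARD)


print(passive_learning(5))  # [1.0, 1.0, 1.0, 1.0, 1.0]
print(gap_after_probe())    # 0.0
```

In this toy the local component is zero by construction, so the whole gap of 1.0 is reachability: the deployed policy always takes "safe", the error at "far" is never observed, and the gap does not shrink no matter how many passive episodes are run. A single deliberate visit to "far" removes it. This only illustrates why some form of directed real experimentation is needed; it makes no claim about how Fisher-SEP chooses its experiments.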
Key facts
- arXiv paper 2605.21458
- Studies sim-to-real gap in sequential decision planning
- Decomposes simulator error into calibration-deployment shift and parametric residual
- Value gap splits into local and reachability components
- Reachability component stays bounded away from zero under passive learning
- Proposes Fisher-SEP method
- Simulator is cheap but biased; real experiments are unbiased but costly
- Published on arXiv
Entities
Institutions
- arXiv