Offline Policy Evaluation via Discounted Liveness Formulation
A new framework for offline policy evaluation in robotic manipulation addresses the challenges of sparse rewards and finite-horizon truncation bias. The method uses a liveness-based Bellman operator to recast evaluation as a task-completion problem, yielding a conservative fixed-point value function that is robust to rollout truncation. The theoretical analysis includes contraction guarantees for the operator. The work is published on arXiv (2605.11479).
Key facts
- Policy evaluation is fundamental for robotic policy development.
- Sparse rewards and non-monotonic task progression challenge evaluation.
- Finite-length rollouts introduce truncation bias.
- Proposed framework uses a liveness-based Bellman operator.
- Formulation yields a conservative fixed-point value function.
- Theoretical properties include contraction guarantees.
- Published on arXiv with ID 2605.11479.
- Announce type is cross.
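The ideas above can be sketched in a small tabular example. The operator, state space, and constants below are illustrative assumptions, not the paper's actual formulation: "liveness" value is taken to mean the discounted probability of eventually completing the task under the policy, completed states are pinned at value 1, and a zero initialization plays the role of the conservative bootstrap for truncated rollouts.

```python
import numpy as np

# Hypothetical tabular sketch of a liveness-style Bellman backup
# (illustrative only; the paper's exact operator is not reproduced here).
# States 0..3; state 3 is the absorbing task-success state. P is the
# transition matrix induced by the evaluated policy.

gamma = 0.9
success = np.array([0.0, 0.0, 0.0, 1.0])   # indicator of task completion
P = np.array([
    [0.1, 0.9, 0.0, 0.0],
    [0.0, 0.2, 0.8, 0.0],
    [0.1, 0.0, 0.2, 0.7],
    [0.0, 0.0, 0.0, 1.0],                  # success state is absorbing
])

def liveness_backup(V):
    # Completed states are worth 1; elsewhere, discount the expected
    # continuation value. Starting from V = 0, every iterate is a
    # lower bound, so truncation only makes the estimate conservative.
    return success + gamma * (1.0 - success) * (P @ V)

V = np.zeros(4)                             # conservative initialization
for _ in range(500):                        # gamma-contraction => convergence
    V_new = liveness_backup(V)
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new

print(np.round(V, 3))
```

Because the backup is a gamma-contraction in the sup norm, fixed-point iteration converges to a unique value function, and monotone convergence from zero mirrors the conservative fixed point described in the summary.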
Entities
Institutions
- arXiv