Offline Policy Evaluation via Discounted Liveness Formulation
A new framework for offline policy evaluation in robotic manipulation addresses the challenges of sparse rewards and finite-horizon truncation bias. The method uses a liveness-based Bellman operator to recast evaluation as a task-completion problem, yielding a conservative fixed-point value function that is robust to rollout truncation. The theoretical analysis includes contraction guarantees for the operator. The work is published on arXiv (2605.11479).
Key facts
- Policy evaluation is fundamental for robotic policy development.
- Sparse rewards and non-monotonic task progression challenge evaluation.
- Finite-length rollouts introduce truncation bias.
- Proposed framework uses a liveness-based Bellman operator.
- Formulation yields a conservative fixed-point value function.
- Theoretical properties include contraction guarantees.
- Published on arXiv with ID 2605.11479.
- Announce type is cross.
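The ideas above can be sketched in a small tabular example. The operator, state space, and constants below are illustrative assumptions, not the paper's actual formulation: "liveness" value is taken to mean the discounted probability of eventually completing the task under the policy, completed states are pinned at value 1, and a zero initialization plays the role of the conservative bootstrap for truncated rollouts.

```python
import numpy as np

# Hypothetical tabular sketch of a liveness-style Bellman backup
# (illustrative only; the paper's exact operator is not reproduced here).
# States 0..3; state 3 is the absorbing task-success state. P is the
# transition matrix induced by the evaluated policy.

gamma = 0.9
success = np.array([0.0, 0.0, 0.0, 1.0])   # indicator of task completion
P = np.array([
    [0.1, 0.9, 0.0, 0.0],
    [0.0, 0.2, 0.8, 0.0],
    [0.1, 0.0, 0.2, 0.7],
    [0.0, 0.0, 0.0, 1.0],                  # success state is absorbing
])

def liveness_backup(V):
    # Completed states are worth 1; elsewhere, discount the expected
    # continuation value. Starting from V = 0, every iterate is a
    # lower bound, so truncation only makes the estimate conservative.
    return success + gamma * (1.0 - success) * (P @ V)

V = np.zeros(4)                             # conservative initialization
for _ in range(500):                        # gamma-contraction => convergence
    V_new = liveness_backup(V)
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new

print(np.round(V, 3))
```

Because the backup is a gamma-contraction in the sup norm, fixed-point iteration converges to a unique value function, and monotone convergence from zero mirrors the conservative fixed point described in the summary.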
Entities
Institutions
- arXiv