ARTFEED — Contemporary Art Intelligence

Distributional RL Extended to Partially Observable Markov Decision Processes

other · 2026-05-07

A recent arXiv study extends Distributional Reinforcement Learning (DistRL) to Partially Observable Markov Decision Processes (POMDPs). The authors define distributional Bellman operators suited to partial observability and prove their convergence under the supremum p-Wasserstein metric. They introduce psi-vectors, a finite representation of return distributions that generalizes the alpha-vectors used in classical POMDP solvers, and develop Distributional Point-Based Value Iteration (DPBVI), which incorporates psi-vectors into a standard point-based backup. The work is motivated by advances in world-model methods, where learned latent models stand in for beliefs and support planning. The paper is available as arXiv:2505.06518v3.
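
In belief-MDP form, such an operator plausibly acts on return distributions indexed by belief-action pairs. The LaTeX reconstruction below follows the standard DistRL contraction argument; the notation (tau for the Bayesian belief update, \overline{W}_p for the supremum p-Wasserstein metric) is assumed here for illustration and is not the paper's exact statement.

  % Distributional Bellman operator over beliefs (notation assumed)
  (\mathcal{T}^{\pi} Z)(b, a) \overset{D}{=} R(b, a) + \gamma\, Z(b', A'),
  \qquad o \sim P(\cdot \mid b, a), \quad b' = \tau(b, a, o), \quad A' \sim \pi(\cdot \mid b')

  % A gamma-contraction in the supremum p-Wasserstein metric gives convergence
  \overline{W}_p\bigl(\mathcal{T}^{\pi} Z_1, \mathcal{T}^{\pi} Z_2\bigr) \le \gamma\, \overline{W}_p(Z_1, Z_2),
  \qquad \overline{W}_p(Z_1, Z_2) := \sup_{b, a} W_p\bigl(Z_1(b, a), Z_2(b, a)\bigr)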

Key facts

  • The paper extends Distributional Reinforcement Learning to POMDPs.
  • New distributional Bellman operators are introduced for partial observability.
  • Convergence is proven under the supremum p-Wasserstein metric.
  • A finite representation via psi-vectors generalizes alpha-vectors.
  • DPBVI integrates psi-vectors into point-based backup (see the code sketch after this list).
  • The work is motivated by world model approaches.
  • The paper is on arXiv with ID 2505.06518v3.
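
To make the psi-vector idea concrete, here is a minimal Python sketch of what a finite distributional representation and one point-based backup could look like. Everything below is an illustrative assumption rather than the paper's implementation: the toy POMDP, the fixed categorical atom support with a C51-style projection, and the helper names project and dpbvi_backup are all hypothetical.

  import numpy as np

  rng = np.random.default_rng(0)

  # Toy POMDP (2 states, 2 actions, 2 observations), invented for the demo.
  nS, nA, nO = 2, 2, 2
  T = rng.dirichlet(np.ones(nS), size=(nS, nA))   # T[s, a, s'] transition probs
  O = rng.dirichlet(np.ones(nO), size=(nS, nA))   # O[s', a, o] observation probs
  R = rng.uniform(0.0, 1.0, size=(nS, nA))        # R[s, a] expected reward
  gamma = 0.95

  # Fixed categorical support for return distributions (C51-style atoms).
  nZ = 51
  z = np.linspace(0.0, 1.0 / (1.0 - gamma), nZ)
  dz = z[1] - z[0]

  def project(atoms, probs):
      # Project masses `probs` at locations `atoms` back onto the grid z.
      out = np.zeros(nZ)
      pos = (np.clip(atoms, z[0], z[-1]) - z[0]) / dz
      lo = np.floor(pos).astype(int)
      hi = np.minimum(lo + 1, nZ - 1)
      w = pos - lo                                # fractional mass to upper atom
      np.add.at(out, lo, probs * (1.0 - w))
      np.add.at(out, hi, probs * w)
      return out

  def dpbvi_backup(b, Psi):
      # One point-based distributional backup at belief b. A psi-vector has
      # shape (nS, nZ): one categorical return distribution per hidden state,
      # so the value distribution at belief b is the mixture b @ psi.
      best_psi, best_val = None, -np.inf
      for a in range(nA):
          # Per observation, pick the successor psi-vector that is best in
          # expectation at the updated belief tau(b, a, o).
          chosen = []
          for o in range(nO):
              bp = (b @ T[:, a, :]) * O[:, a, o]
              bp = bp / bp.sum() if bp.sum() > 0 else bp
              chosen.append(Psi[int(np.argmax([(bp @ p) @ z for p in Psi]))])
          # Distributional analogue of the alpha-vector backup: mix successor
          # distributions over (s', o), shift by the reward, project onto z.
          psi_a = np.zeros((nS, nZ))
          for s in range(nS):
              mix = np.zeros(nZ)
              for o in range(nO):
                  mix += (T[s, a, :] * O[:, a, o]) @ chosen[o]
              psi_a[s] = project(R[s, a] + gamma * z, mix)
          val = (b @ psi_a) @ z                   # expected return at b
          if val > best_val:
              best_val, best_psi = val, psi_a
      return best_psi

  # Example sweep: start from a single point-mass psi-vector and repeatedly
  # back up at a few sampled belief points.
  psi0 = np.zeros((nS, nZ))
  psi0[:, 0] = 1.0
  Psi = [psi0]
  beliefs = [np.array([0.5, 0.5]), np.array([0.9, 0.1])]
  for _ in range(50):
      Psi = [dpbvi_backup(b, Psi) for b in beliefs]

Note that taking the expectation of a psi-vector over the atoms (psi @ z) collapses it to an ordinary alpha-vector, which is the sense in which psi-vectors generalize them.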

Entities

Institutions

  • arXiv

Sources

  • arXiv:2505.06518v3