NFDRL: Parameter-Efficient Distributional RL with Normalizing Flows
NFDRL is a new approach to distributional reinforcement learning (DistRL) that models return distributions with continuous normalizing flows, giving a compact parameter footprint that does not grow with the effective resolution of the modeled distribution. Categorical methods such as C51 need a number of atoms (and hence parameters) that scales linearly with resolution, and quantile methods represent densities as piecewise-constant approximations; NFDRL instead yields a smooth density with dynamic, adaptive support for returns. Training uses a Cramér-inspired, geometry-aware distance defined over probability masses. The method is detailed in arXiv:2505.04310.
Key facts
- NFDRL models return distributions using continuous normalizing flows
- Parameter count does not grow with effective resolution
- Cramér-inspired geometry-aware distance used for training
- Outperforms categorical and quantile baselines in parameter efficiency
- Dynamic adaptive support for returns
- arXiv:2505.04310
- Distributional RL models the full return distribution rather than only its expected value
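The parameter-efficiency claim above follows from the change-of-variables idea behind flows: a small set of flow parameters defines a density that can be evaluated at any return value, so resolution is free. Below is a minimal hypothetical sketch using a one-layer affine flow (the paper uses continuous normalizing flows, i.e. an ODE-defined transform, but the change-of-variables mechanics are the same).

```python
import numpy as np

def flow_log_density(x, mu, log_sigma):
    """Log-density of a return distribution modeled by an affine flow
    x = mu + exp(log_sigma) * z, with base noise z ~ N(0, 1).
    Change of variables: log p_X(x) = log p_Z(f_inv(x)) + log |d f_inv / dx|.
    Only two parameters, yet the density is defined at every real x."""
    z = (x - mu) / np.exp(log_sigma)             # invert the flow
    log_base = -0.5 * (z ** 2 + np.log(2 * np.pi))  # standard normal log-pdf
    return log_base - log_sigma                  # log-det of the inverse Jacobian

# The density can be queried on an arbitrarily fine grid without adding
# parameters; a Riemann sum confirms it integrates to ~1.
grid = np.linspace(-10.0, 10.0, 4001)
dens = np.exp(flow_log_density(grid, 0.5, np.log(2.0)))
mass = dens.sum() * (grid[1] - grid[0])
```

By contrast, a C51-style categorical head needs one logit per atom, so doubling the resolution of the support doubles that head's parameter count.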