Reinforcement Learning with Markov Risk Measures and Multipattern Approximation

other · 2026-05-04

This work proposes a new class of Markov coherent risk measures, called mini-batch measures, for risk-averse finite-horizon Markov decision problems. It also introduces multipattern risk-averse problems, which generalize linear systems. Both ideas are applied in a feature-based Q-learning method with multipattern Q-factor approximation, which achieves a high-probability regret bound of O(H^2 N^H sqrt(K)), where H is the horizon, N is the mini-batch size, and K is the number of episodes. An economical version of the Q-learning method is also introduced, which streamlines the policy evaluation step. The theoretical results are illustrated on a stochastic assignment problem and a short-horizon multi-armed bandit problem.
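The summary does not spell out how mini-batch measures are constructed. Purely as a generic illustration of what a coherent risk measure evaluated on a sample of N costs can look like, the sketch below uses empirical conditional value-at-risk (CVaR). This is a swapped-in standard example, not the paper's definition; the function name `empirical_cvar` and the parameter `alpha` are assumptions made for this illustration.

```python
def empirical_cvar(costs, alpha):
    """Empirical CVaR of a batch of costs at level alpha in [0, 1).

    Illustrative stand-in for a coherent risk measure applied to a
    mini-batch of N sampled costs; NOT the paper's mini-batch measure.
    Uses the Rockafellar-Uryasev form
        CVaR_alpha(Z) = min_t { t + E[(Z - t)_+] / (1 - alpha) },
    whose minimum over an empirical distribution is attained at a sample.
    """
    if not costs:
        raise ValueError("costs must be non-empty")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    n = len(costs)

    def objective(t):
        # Average excess of the costs over the threshold t.
        excess = sum(max(z - t, 0.0) for z in costs) / n
        return t + excess / (1.0 - alpha)

    # The objective is convex and piecewise linear with breakpoints at
    # the samples, so checking every sample value finds the minimum.
    return min(objective(t) for t in costs)


batch = [1.0, 2.0, 3.0, 4.0]             # hypothetical mini-batch, N = 4
risk_neutral = empirical_cvar(batch, 0.0)  # alpha = 0 gives the plain mean
risk_averse = empirical_cvar(batch, 0.5)   # mean of the worst half
```

With `alpha = 0` the measure reduces to the sample mean, and larger `alpha` puts more weight on the highest costs, which is the sense in which such a measure is risk-averse.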

Key facts

Introduces mini-batch Markov coherent risk measures.
Defines multipattern risk-averse problems generalizing linear systems.
Proposes feature-based Q-learning with multipattern Q-factor approximation.
Proves regret bound O(H^2 N^H sqrt(K)).
Proposes an economical version of the Q-learning method that streamlines policy evaluation.
Illustrated on stochastic assignment problem.
Illustrated on short-horizon multi-armed bandit problem.
H is horizon, N is mini-batch size, K is number of episodes.
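To make the scaling of the stated bound concrete, the following minimal sketch evaluates the expression H^2 N^H sqrt(K) with the hidden constant of the O-notation set to 1. That constant and the function name `regret_bound_order` are assumptions for illustration only; the numbers show growth rates, not actual regret values.

```python
import math


def regret_bound_order(H, N, K):
    """Value of H^2 * N^H * sqrt(K), i.e. the bound with constant 1.

    H: horizon, N: mini-batch size, K: number of episodes.
    The true constant in the O(.) bound is unknown here.
    """
    if H < 1 or N < 1 or K < 0:
        raise ValueError("expected H >= 1, N >= 1, K >= 0")
    return (H ** 2) * (N ** H) * math.sqrt(K)


# Quadrupling the number of episodes only doubles the bound ...
ratio_in_K = regret_bound_order(3, 2, 400) / regret_bound_order(3, 2, 100)

# ... whereas extending the horizon by one step multiplies it by
# N * ((H + 1) / H) ** 2, so the dependence on H is exponential.
ratio_in_H = regret_bound_order(4, 2, 100) / regret_bound_order(3, 2, 100)

# The bound divided by K behaves like 1 / sqrt(K), so the per-episode
# average shrinks as the number of episodes grows.
avg_small_K = regret_bound_order(3, 2, 100) / 100
avg_large_K = regret_bound_order(3, 2, 10_000) / 10_000
```

The sqrt(K) factor means the bound is sublinear in the number of episodes, while the N^H factor suggests why the illustrations use short horizons.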
