ARTFEED — Contemporary Art Intelligence

Reinforcement Learning with Markov Risk Measures and Multipattern Approximation

other · 2026-05-04

A new class of Markov coherent risk measures, called mini-batch measures, has been proposed for risk-averse finite-horizon Markov Decision Problems. The work also defines multipattern risk-averse problems, a class that generalizes linear systems. Both concepts are used in a feature-based Q-learning method with multipattern Q-factor approximation, for which the authors prove a high-probability regret bound of O(H^2 N^H sqrt(K)), where H is the horizon, N is the mini-batch size, and K is the number of episodes. An economical version of the Q-learning method is also proposed; it streamlines the policy evaluation phase. The theoretical results are illustrated on a stochastic assignment problem and a short-horizon multi-armed bandit problem.
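For readers unfamiliar with coherent risk measures, the sketch below evaluates one standard example, the first-order mean-upper-semideviation, on a small sample of N costs. It is an assumed stand-in chosen for illustration only: this summary does not specify how the mini-batch measures are constructed, and the function name and the `kappa` parameter are hypothetical.

```python
def mean_upper_semideviation(costs, kappa=0.5):
    """Empirical mean-upper-semideviation risk of a sample of costs.

    rho(Z) = E[Z] + kappa * E[max(Z - E[Z], 0)], with 0 <= kappa <= 1.
    This is a well-known coherent risk measure; it is NOT claimed to be
    the mini-batch measure from the summarized work.
    """
    if not costs:
        raise ValueError("costs must be non-empty")
    if not 0.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie in [0, 1] for coherence")
    n = len(costs)
    mean = sum(costs) / n
    # Penalize only outcomes that are worse (costlier) than the mean.
    upper_dev = sum(max(c - mean, 0.0) for c in costs) / n
    return mean + kappa * upper_dev


# A batch of N = 4 sampled costs (illustrative numbers).
batch = [1.0, 2.0, 3.0, 6.0]
risk_neutral = mean_upper_semideviation(batch, kappa=0.0)  # plain average
risk_averse = mean_upper_semideviation(batch, kappa=0.5)   # adds a penalty
```

With `kappa = 0` the measure reduces to the expected cost; larger `kappa` puts more weight on above-average costs, which is the sense in which the evaluation is risk-averse.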

Key facts

  • Introduces mini-batch Markov coherent risk measures.
  • Defines multipattern risk-averse problems generalizing linear systems.
  • Proposes feature-based Q-learning with multipattern Q-factor approximation.
  • Proves regret bound O(H^2 N^H sqrt(K)).
  • Proposes economical Q-learning version streamlining policy evaluation.
  • Illustrated on stochastic assignment problem.
  • Illustrated on short-horizon multi-armed bandit problem.
  • H is horizon, N is mini-batch size, K is number of episodes.
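To get a feel for how the stated bound O(H^2 N^H sqrt(K)) scales, the sketch below evaluates the leading term with all constants and logarithmic factors dropped. The function name is hypothetical and the numbers are illustrative, not results from the paper.

```python
import math


def regret_bound_scale(H, N, K):
    """Leading term H^2 * N^H * sqrt(K) of the stated regret bound.

    Constants and any logarithmic factors hidden by the O-notation
    are omitted, so only ratios between values are meaningful.
    """
    if H < 1 or N < 1 or K < 1:
        raise ValueError("H, N and K must be positive integers")
    return (H ** 2) * (N ** H) * math.sqrt(K)


base = regret_bound_scale(H=2, N=3, K=4)      # 4 * 9 * 2 = 72.0
more_eps = regret_bound_scale(H=2, N=3, K=16)  # 4 * 9 * 4 = 144.0
longer = regret_bound_scale(H=3, N=3, K=4)     # 9 * 27 * 2 = 486.0

# Per-episode average of the bound shrinks like 1 / sqrt(K).
avg_base = base / 4
avg_more = more_eps / 16
```

Quadrupling K only doubles the bound, so the average regret per episode decays as 1/sqrt(K). The N^H factor grows exponentially in the horizon, which is consistent with the paper illustrating its results on short-horizon problems.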

Entities

Institutions

  • arXiv
