Transformers Can Implement In-Context Reinforcement Learning, Study Shows
A new study posted on arXiv (2605.05755) shows that transformers can perform in-context reinforcement learning (ICRL): they can execute learning algorithms on trajectory data presented in context, without updating their parameters. The authors prove that a linear self-attention transformer block can, under specific parameter constructions, implement policy-improvement methods such as semi-gradient SARSA and actor-critic. They also design a teacher-mimicking training procedure, analyze the resulting gradient-flow dynamics, and establish the first convergence guarantee in the ICRL literature: given sufficient richness in the distribution of training MDPs, gradient flow converges locally and exponentially to an optimal parameter manifold consistent with the target RL update. Training experiments on randomly generated tabular MDPs corroborate the theory, with learned models recovering the parameter structure of the explicit constructions.
Key facts
- Paper on arXiv (2605.05755) shows transformers can implement in-context reinforcement learning
- Linear self-attention block can implement policy-improvement methods like semi-gradient SARSA and actor-critic
- First convergence guarantee in ICRL literature established
- Teacher-mimicking training procedure designed
- Gradient-flow dynamics analyzed
- Convergence to optimal parameter manifold under suitable conditions
- Empirical validation on randomly generated tabular MDPs
- Learned models recover parameter structure of explicit constructions
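To make the policy-improvement method concrete, here is a minimal sketch of the semi-gradient SARSA update with linear function approximation — the kind of update the paper shows a linear self-attention block can express. The toy MDP, feature map, and all names below are illustrative, not taken from the paper.

```python
import numpy as np

# Semi-gradient SARSA with linear function approximation:
# Q(s, a) = w^T phi(s, a). The "semi-gradient" part: the TD target
# r + gamma * Q(s', a') is treated as a constant, so only the gradient
# of the current estimate Q(s, a) (which is just phi(s, a)) is used.

def semi_gradient_sarsa_step(w, phi_sa, r, phi_next_sa, alpha, gamma):
    """One semi-gradient SARSA update on the weight vector w."""
    td_error = r + gamma * phi_next_sa @ w - phi_sa @ w
    return w + alpha * td_error * phi_sa

# Toy 2-state, 2-action MDP with one-hot state-action features
# (hypothetical setup for illustration only).
n_states, n_actions = 2, 2

def phi(s, a):
    v = np.zeros(n_states * n_actions)
    v[s * n_actions + a] = 1.0
    return v

rng = np.random.default_rng(0)
w = np.zeros(n_states * n_actions)
s, a = 0, 0
for _ in range(500):
    r = 1.0 if (s, a) == (1, 1) else 0.0   # reward only at (s=1, a=1)
    s_next = int(rng.integers(n_states))
    a_next = int(rng.integers(n_actions))  # random behaviour policy
    w = semi_gradient_sarsa_step(w, phi(s, a), r, phi(s_next, a_next),
                                 alpha=0.1, gamma=0.9)
    s, a = s_next, a_next
```

The paper's point is that this per-step update can be realized by a forward pass of a suitably parameterized linear self-attention layer over the in-context trajectory, rather than by explicit weight updates as above.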