Twice Sequential Monte Carlo Tree Search Improves RL

other · 2026-05-23

Researchers introduce Twice Sequential Monte Carlo Tree Search (TSMCTS), a model-based reinforcement learning method that, used as a policy improvement operator, outperforms both a Sequential Monte Carlo (SMC) baseline and a modern version of Monte Carlo Tree Search (MCTS). TSMCTS addresses the variance and path degeneracy problems of SMC and scales better with increased search depth while remaining GPU-friendly. In tests across discrete and continuous environments, the method scaled favorably with sequential compute and reduced estimator variance.
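
Path degeneracy, one of the SMC problems named above, can be illustrated with a toy simulation. The sketch below is not TSMCTS and is not taken from the paper: it is a generic multinomial-resampling loop, the function name `surviving_root_ancestors` is hypothetical, and the particle weights are assumed uniform so that any loss of ancestry comes from resampling alone. It tracks how many distinct step-0 particles still have descendants as the search gets deeper.

```python
import random


def surviving_root_ancestors(num_particles, depth, seed=0):
    """Count distinct step-0 ancestors after each resampling step.

    Assumption: weights are uniform, so resampling alone drives the collapse.
    """
    rng = random.Random(seed)
    # roots[i] is the index of the step-0 particle that particle i descends from.
    roots = list(range(num_particles))
    history = [len(set(roots))]
    for _ in range(depth):
        # Multinomial resampling: each new particle picks a parent uniformly.
        parents = [rng.randrange(num_particles) for _ in range(num_particles)]
        # Each child inherits the root ancestor of its chosen parent.
        roots = [roots[p] for p in parents]
        history.append(len(set(roots)))
    return history


if __name__ == "__main__":
    counts = surviving_root_ancestors(num_particles=16, depth=64)
    print(counts)
```

The count of surviving root ancestors can only shrink from one step to the next, so with deeper search fewer distinct first actions remain represented at the root. That is the effect the summary says TSMCTS mitigates.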

Key facts

TSMCTS outperforms the SMC baseline and a modern MCTS as a policy improvement operator
Addresses variance and path degeneracy in SMC
Scales favorably with sequential compute
Retains parallelization properties of SMC
Tested across discrete and continuous environments
Reduces estimator variance
Mitigates effects of path degeneracy
SMC is easier to parallelize and better suited to GPU acceleration than MCTS

Entities

—

Sources

arXiv cs.AI — 2026-05-23