Twice Sequential Monte Carlo Tree Search Improves RL
Researchers introduce Twice Sequential Monte Carlo Tree Search (TSMCTS), a model-based reinforcement learning method that, used as a policy improvement operator, outperforms both a Sequential Monte Carlo (SMC) baseline and a modern version of Monte Carlo Tree Search (MCTS). TSMCTS targets two weaknesses of SMC-based search, high estimator variance and path degeneracy, so that it scales better with increased search depth while remaining GPU-friendly. In tests across discrete and continuous environments, the method scaled favorably with sequential compute and reduced estimator variance.
Key facts
- TSMCTS outperforms both the SMC baseline and a modern MCTS variant as a policy improvement operator
- Reduces estimator variance and mitigates the effects of path degeneracy, two weaknesses of SMC
- Scales favorably with sequential compute
- Retains the parallelization properties of SMC
- Tested across discrete and continuous environments
- SMC is easier to parallelize and better suited to GPU acceleration than MCTS
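
Path degeneracy is easier to see in a toy example. The sketch below is not TSMCTS and not the SMC baseline from the work summarized here. It is a minimal, generic particle loop with multinomial resampling, where the function name `count_surviving_roots` and the uniform random weights are assumptions made purely for illustration. Each particle remembers which first-step particle it descends from, and the code counts how many distinct first-step ancestors remain as the search gets deeper.

```python
import random


def count_surviving_roots(num_particles, depth, seed=0):
    """Toy SMC loop: count distinct first-step ancestors left after each step."""
    rng = random.Random(seed)
    # roots[i] is the index of the first-step particle that particle i descends from.
    roots = list(range(num_particles))
    survivors = []
    for _ in range(depth):
        # Hypothetical importance weights (an assumption for this sketch);
        # a real planner would derive them from rewards or value estimates.
        weights = [rng.uniform(0.1, 1.0) for _ in range(num_particles)]
        # Multinomial resampling: draw particles in proportion to their weights.
        roots = rng.choices(roots, weights=weights, k=num_particles)
        survivors.append(len(set(roots)))
    return survivors


if __name__ == "__main__":
    # One count per search step; the counts can never increase.
    print(count_surviving_roots(num_particles=16, depth=32))
```

Because resampling only copies existing particles, the number of distinct first-step ancestors can stay the same or shrink, but never grow. In a planning setting this would mean that, at large depths, the estimate at the root may end up resting on very few first actions, which is one way to read the degeneracy problem the summary refers to.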