Shallow RL Agents Master Schnapsen Card Game
A recent arXiv preprint (2605.17162) examines whether shallow neural network agents can master the card game Schnapsen and compete against RdeepBot, a strong search-based baseline that combines Monte Carlo sampling with lookahead search. The study proceeds through experiments of increasing complexity. It first evaluates a supervised learning agent (MLPBot) trained on replay data, then a reinforcement learning agent (RLBot) that uses the same shallow architecture but is trained with asynchronous Monte Carlo updates and experience replay. The findings indicate that supervised imitation does not generalize well enough to beat RdeepBot, while reinforcement learning yields significantly stronger agents. The best results come from combining the learned value function with deeper lookahead; in that configuration RLBot achieves statistically significantly higher win rates against the strong baseline.
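To make the training idea concrete, here is a minimal sketch of Monte Carlo value updates with experience replay on a shallow network. It is not the preprint's implementation: the toy episode generator `play_toy_episode`, the one-feature state, the network size, and all hyperparameters are assumptions chosen for illustration, and the loop is single-process rather than asynchronous. Every state visited in an episode is stored with the final game result as its target, and the network is then fitted on random minibatches drawn from the buffer.

```python
import math
import random
from collections import deque


class ShallowValueNet:
    """One hidden tanh layer; sigmoid output read as an estimated win probability."""

    def __init__(self, n_in, n_hidden, rng):
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        z = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2
        return h, 1.0 / (1.0 + math.exp(-z))

    def predict(self, x):
        return self.forward(x)[1]

    def update(self, x, target, lr):
        """One SGD step on the cross-entropy between prediction and target."""
        h, p = self.forward(x)
        dz = p - target  # gradient of the loss w.r.t. the pre-sigmoid output
        for j, hj in enumerate(h):
            dh = dz * self.w2[j] * (1.0 - hj * hj)  # uses w2[j] before it changes
            self.w2[j] -= lr * dz * hj
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * dh * xi
            self.b1[j] -= lr * dh
        self.b2 -= lr * dz


def play_toy_episode(rng):
    """Hypothetical stand-in for a game: a drifting 'lead' feature and a win/loss result."""
    lead = rng.uniform(-1.0, 1.0)
    states = []
    for _ in range(5):
        states.append([lead])
        lead = max(-1.0, min(1.0, lead + rng.uniform(-0.2, 0.2)))
    outcome = 1.0 if lead > 0 else 0.0
    return states, outcome


def train_from_replay(net, buffer, rng, batch_size=16, lr=0.1):
    """Fit the network on one random minibatch of stored (state, outcome) pairs."""
    batch = rng.sample(list(buffer), min(batch_size, len(buffer)))
    for state, outcome in batch:
        net.update(state, outcome, lr)


rng = random.Random(0)
net = ShallowValueNet(n_in=1, n_hidden=4, rng=rng)
buffer = deque(maxlen=2000)  # oldest experience is dropped once the buffer is full

for _ in range(300):
    states, outcome = play_toy_episode(rng)
    # Monte Carlo target: every visited state is labeled with the final result.
    buffer.extend((s, outcome) for s in states)
    train_from_replay(net, buffer, rng)

print(net.predict([0.9]), net.predict([-0.9]))
```

In this toy setting the network should assign a higher value to a large positive lead than to a large negative one. Sampling from the buffer rather than training only on the latest game decorrelates consecutive updates, which is the usual motivation for experience replay.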
Key facts
- arXiv:2605.17162
- Shallow neural network agents investigated for Schnapsen
- Baseline: RdeepBot (Monte Carlo sampling + lookahead search)
- Supervised agent: MLPBot trained on replay data
- Reinforcement learning agent: RLBot with same shallow architecture
- RLBot trained via asynchronous Monte Carlo updates and experience replay
- Supervised imitation fails to beat strong RdeepBot opponents
- RLBot achieves statistically significantly higher win rates against RdeepBot when its learned value function is combined with deeper lookahead
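As an illustration of what a statistically significant win rate can mean in practice, the sketch below runs a one-sided exact binomial test against the null hypothesis that both agents are equally strong. The choice of test is an assumption (this summary does not say which test the preprint uses), and the game counts in the usage lines are made-up placeholders, not results from the study.

```python
from math import comb


def binomial_p_value(wins, games, p_null=0.5):
    """One-sided exact binomial test: P(X >= wins) if the true win probability is p_null."""
    return sum(
        comb(games, k) * p_null**k * (1 - p_null) ** (games - k)
        for k in range(wins, games + 1)
    )


# Placeholder numbers for illustration only (not taken from the preprint).
example_wins, example_games = 60, 100
p_value = binomial_p_value(example_wins, example_games)
print(f"win rate {example_wins / example_games:.2f}, one-sided p-value {p_value:.4f}")
```

A small p-value means a win count this high would be unlikely if the two agents were evenly matched. With few games, even a clearly better-than-even win rate may not reach significance, which is why match length matters when comparing agents.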
Entities
Institutions
- arXiv