Shallow RL Agents Master Schnapsen Card Game

other · 2026-05-20

A recent arXiv preprint (2605.17162) examines whether shallow neural network agents can learn to play the card game Schnapsen well enough to compete with RdeepBot, a strong search-based baseline that combines Monte Carlo sampling with lookahead search. The experiments increase in complexity step by step. The study first evaluates a supervised learning agent (MLPBot) trained on replay data, then a reinforcement learning agent (RLBot) that uses the same shallow architecture but is trained with asynchronous Monte Carlo updates and experience replay. The results indicate that supervised imitation does not generalize well enough to beat RdeepBot, while reinforcement learning produces significantly stronger agents. Performance is best when the learned value function is combined with deeper lookahead; in that setting RLBot achieves statistically significantly higher win rates against the strong baseline.
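The training recipe named in the summary (Monte Carlo updates plus experience replay on a shallow network) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the preprint: the class `ShallowValueNet`, the helpers `store_episode` and `train_from_replay`, the layer size, the learning rate, and the two-feature toy states. The sketch is single-threaded, so it leaves out the asynchronous part, and it trains on made-up states rather than real Schnapsen positions.

```python
import math
import random
from collections import deque


class ShallowValueNet:
    """Hypothetical net: one tanh hidden layer, sigmoid output (win probability)."""

    def __init__(self, n_inputs, n_hidden=8, lr=0.1, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 0.5) for _ in range(n_hidden)]
                   for _ in range(n_inputs)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.gauss(0.0, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0
        self.lr = lr

    def _forward(self, x):
        hidden = [
            math.tanh(sum(x[i] * self.w1[i][j] for i in range(len(x))) + self.b1[j])
            for j in range(len(self.b1))
        ]
        logit = sum(h * w for h, w in zip(hidden, self.w2)) + self.b2
        return hidden, 1.0 / (1.0 + math.exp(-logit))

    def predict(self, x):
        return self._forward(x)[1]

    def update(self, x, target):
        """One SGD step on cross-entropy loss toward the Monte Carlo target."""
        hidden, p = self._forward(x)
        err = p - target  # d(loss)/d(logit) for sigmoid + cross-entropy
        for j, h in enumerate(hidden):
            grad_pre = err * self.w2[j] * (1.0 - h * h)  # backprop through tanh
            self.w2[j] -= self.lr * err * h
            self.b1[j] -= self.lr * grad_pre
            for i in range(len(x)):
                self.w1[i][j] -= self.lr * grad_pre * x[i]
        self.b2 -= self.lr * err


def store_episode(buffer, states, won):
    """Monte Carlo labelling: every state of a finished game gets its final outcome."""
    target = 1.0 if won else 0.0
    for state in states:
        buffer.append((state, target))


def train_from_replay(net, buffer, batch_size, rng):
    """Experience replay: update on a random sample of stored (state, outcome) pairs."""
    batch = rng.sample(list(buffer), min(batch_size, len(buffer)))
    for state, target in batch:
        net.update(state, target)


def toy_demo():
    """Toy data only: one state that always wins and one that always loses."""
    rng = random.Random(0)
    net = ShallowValueNet(n_inputs=2)
    buffer = deque(maxlen=1000)  # oldest experience is dropped automatically
    win_state, lose_state = [1.0, 0.0], [0.0, 1.0]
    for _ in range(300):
        store_episode(buffer, [win_state], won=True)
        store_episode(buffer, [lose_state], won=False)
        train_from_replay(net, buffer, 16, rng)
    return net.predict(win_state), net.predict(lose_state)
```

Calling `toy_demo()` returns the network's estimated win probability for the always-winning and the always-losing toy state. The point of the sketch is the data flow: targets come from complete game outcomes rather than bootstrapped estimates, and updates are drawn from a buffer of past games rather than only the most recent one.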

Key facts

arXiv:2605.17162
Shallow neural network agents investigated for Schnapsen
Baseline: RdeepBot (Monte Carlo sampling + lookahead search)
Supervised agent: MLPBot trained on replay data
Reinforcement learning agent: RLBot with same shallow architecture
RLBot trained via asynchronous Monte Carlo updates and experience replay
Supervised imitation fails to beat strong RdeepBot opponents
RLBot achieves statistically significantly higher win rates when combined with deeper lookahead
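The last point, pairing a learned value function with deeper lookahead, can be sketched generically as a depth-limited search that asks the value function for an estimate only where the search stops. This is an illustrative assumption, not the preprint's search procedure: the functions `negamax` and `best_move`, the perfect-information toy game (take 1 or 2 stones, taking the last stone wins), and the placeholder `flat_value` are all hypothetical. Schnapsen involves hidden cards, which this sketch ignores.

```python
def negamax(state, depth, value_fn, legal_moves, apply_move, terminal_value):
    """Value of `state` for the player to move, searching `depth` plies ahead."""
    outcome = terminal_value(state)
    if outcome is not None:
        return outcome          # exact result once the game is over
    if depth == 0:
        return value_fn(state)  # learned estimate at the search horizon
    return max(
        -negamax(apply_move(state, m), depth - 1, value_fn,
                 legal_moves, apply_move, terminal_value)
        for m in legal_moves(state)
    )


def best_move(state, depth, value_fn, legal_moves, apply_move, terminal_value):
    """Pick the move that leaves the opponent in the worst position (depth >= 1)."""
    return max(
        legal_moves(state),
        key=lambda m: -negamax(apply_move(state, m), depth - 1, value_fn,
                               legal_moves, apply_move, terminal_value),
    )


# Toy stand-in game (not Schnapsen): the state is the number of stones left.
def legal_moves(stones):
    return [m for m in (1, 2) if m <= stones]


def apply_move(stones, m):
    return stones - m


def terminal_value(stones):
    # With no stones left, the player to move has already lost.
    return -1.0 if stones == 0 else None


def flat_value(stones):
    # Placeholder for a trained network; a win probability p would map to 2p - 1.
    return 0.0


def choose(stones, depth):
    """Convenience wrapper around best_move for the toy game."""
    return best_move(stones, depth, flat_value,
                     legal_moves, apply_move, terminal_value)
```

In this toy game, `choose(5, 1)` cannot tell the two moves apart because the placeholder estimate is the same everywhere, while `choose(5, 3)` searches far enough to find the move that leaves the opponent with three stones. A better value function reduces how deep the search has to go, and a deeper search compensates for a weaker value function, which is one plausible reading of why the combination helps.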
