StratFormer: Transformer-Based Opponent Modeling in Imperfect-Information Games

ai-technology · 2026-04-30

A new AI system called StratFormer uses a transformer architecture to model and exploit opponents in imperfect-information games. It is trained with a two-phase curriculum: the first phase trains an opponent-modeling head to identify behavioral patterns while the agent plays a game-theory optimal (GTO) policy, and the second shifts toward best-response (BR) exploitation under a per-opponent regularization schedule. The architecture features dual-turn tokens and bucket-rate features that encode opponent tendencies across five strategic contexts. The system was tested on Leduc Hold'em, a small poker variant, against six opponent archetypes at two strength levels; reported exploitability ranged from 0.15 to 1.26 big blinds (BB) per hand.
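To make the idea of bucket-rate features concrete, here is a minimal Python sketch of one way such features could be computed: smoothed action frequencies for one opponent, grouped by strategic context. It is an illustration under stated assumptions, not StratFormer's implementation. The source says only that there are five contexts, so the context labels, the fold/call/raise action set, the additive smoothing, and the function name `bucket_rates` are all hypothetical.

```python
from collections import Counter

# Hypothetical context labels: the source states only that there are five.
CONTEXTS = ("ctx_0", "ctx_1", "ctx_2", "ctx_3", "ctx_4")
# Assumed action set (fold / call / raise), as commonly used for Leduc Hold'em.
ACTIONS = ("fold", "call", "raise")


def bucket_rates(observations, prior=1.0):
    """Return smoothed action frequencies per context as a flat list.

    observations: iterable of (context, action) pairs seen for one opponent.
    prior: additive smoothing count (an assumption of this sketch), so that
           contexts with no observations yield a uniform rate.
    """
    counts = {c: Counter() for c in CONTEXTS}
    for context, action in observations:
        counts[context][action] += 1

    features = []
    for c in CONTEXTS:
        total = sum(counts[c].values()) + prior * len(ACTIONS)
        features.extend((counts[c][a] + prior) / total for a in ACTIONS)
    return features
```

With five contexts and three actions this yields a 15-dimensional vector per opponent. How StratFormer actually defines its buckets and feeds them to the transformer is not described in this summary.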

Key facts

StratFormer is a transformer-based meta-agent for opponent modeling and exploitation.
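It uses a two-phase curriculum: it first trains an opponent-modeling head while playing a GTO policy, then shifts toward BR exploitation under a per-opponent regularization schedule.
Dual-turn tokens and bucket-rate features encode opponent tendencies across five strategic contexts.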
Tested on Leduc Hold'em with six opponent archetypes at two strength levels.
Exploitability ranged from 0.15 to 1.26 BB per hand.
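The second phase of the curriculum, a shift toward exploitation held in check by a regularization schedule, can be sketched roughly as follows. This is a guess at the general shape, not the paper's method. It assumes the regularizer is a KL penalty that anchors the policy to GTO and that the penalty weight anneals linearly toward a per-opponent floor. The function names, the linear schedule, and the `floor` parameter are all hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same actions.

    Assumes q[i] > 0 wherever p[i] > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def reg_weight(step, total_steps, floor):
    """Linearly anneal the GTO-anchor weight from 1.0 down to `floor`.

    `floor` is a hypothetical per-opponent value: a higher floor keeps the
    policy closer to GTO against that opponent.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return 1.0 - frac * (1.0 - floor)


def phase_two_loss(policy, gto_policy, expected_value, step, total_steps, floor):
    """Negative expected value plus a weighted penalty for leaving GTO."""
    lam = reg_weight(step, total_steps, floor)
    return -expected_value + lam * kl_divergence(policy, gto_policy)
```

The point of a penalty like this is the trade-off the summary describes: deviating from GTO can win more against a predictable opponent, but it makes the agent itself more exploitable, so the anchor weight limits how far the policy drifts.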

Sources

arXiv cs.AI — 2026-04-29