ARTFEED — Contemporary Art Intelligence

STRATAGEM AI Framework Enhances Reasoning Transfer in Language Models Through Game-Based Learning

ai-technology · 2026-04-22

A recent research paper, "Stratagem: Learning Transferable Reasoning via Trajectory-Modulated Game Self-Play," proposes a method for improving general reasoning in language models through game self-play. Games are a useful training environment because they demand strategic planning, probabilistic inference, and flexible decision-making. Traditional self-play methods, however, reward only final game results, so they cannot distinguish transferable reasoning from game-specific strategies. The paper targets two obstacles to reasoning transfer: domain specificity, where learned behaviors are tied to game semantics, and contextual stasis, where unchanging game scenarios hinder the development of advanced reasoning. The STRATAGEM framework uses a Reasoning Transferability Coefficient to reinforce trajectories that display abstract, domain-independent reasoning, and a Reasoning Evolution Reward to encourage adaptive reasoning growth. The authors report experiments in mathematical reasoning, general reasoning, and code generation in support of the framework. The paper is available on arXiv under the identifier arXiv:2604.17696v1.
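To make the contrast with outcome-only self-play concrete, here is a toy sketch of a trajectory-modulated learning signal. It is not the paper's formulation, which this summary does not give. The `Trajectory` fields, the `modulated_return` function, the `beta` weight, and the way the terms are combined are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    """One self-play game, reduced to three hypothetical scores."""
    outcome: float          # terminal game result, e.g. +1.0 for a win, -1.0 for a loss
    transferability: float  # assumed score in [0, 1]: how domain-agnostic the reasoning looks
    evolution: float        # assumed score in [0, 1]: how much the reasoning developed in-game


def outcome_only_return(traj: Trajectory) -> float:
    """Baseline signal: only the final game result counts."""
    return traj.outcome


def modulated_return(traj: Trajectory, beta: float = 0.5) -> float:
    """Illustrative signal: scale the outcome by the transferability score
    and add a weighted evolution bonus. The combination rule is an assumption."""
    if not 0.0 <= traj.transferability <= 1.0:
        raise ValueError("transferability must lie in [0, 1]")
    if not 0.0 <= traj.evolution <= 1.0:
        raise ValueError("evolution must lie in [0, 1]")
    return traj.transferability * traj.outcome + beta * traj.evolution


# Two winning games: one won with abstract reasoning, one with a game-specific trick.
abstract_win = Trajectory(outcome=1.0, transferability=0.9, evolution=0.4)
trick_win = Trajectory(outcome=1.0, transferability=0.2, evolution=0.4)
```

Under the outcome-only baseline the two wins receive the same signal, while the modulated version ranks `abstract_win` above `trick_win`. That separation is the behavior the summary attributes to the framework; how the paper actually computes and applies its two terms may differ.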

Key facts

  • The research paper "Stratagem: Learning Transferable Reasoning via Trajectory-Modulated Game Self-Play" was published on arXiv
  • The paper introduces the STRATAGEM framework for developing general reasoning capabilities in language models
  • STRATAGEM addresses domain specificity and contextual stasis as barriers to reasoning transfer
  • The framework uses a Reasoning Transferability Coefficient to reinforce domain-agnostic reasoning
  • A Reasoning Evolution Reward incentivizes adaptive reasoning development
  • Games provide a training environment that demands strategic planning, probabilistic inference, and flexible decision-making
  • Existing self-play approaches rely solely on terminal game outcomes and cannot distinguish transferable reasoning from game-specific strategies
  • Experiments were conducted across mathematical reasoning, general reasoning, and code generation benchmarks

Entities

Institutions

  • arXiv
