MiniMax-M2: 229.9B-Parameter Mixture-of-Experts Model with Agentic RL

ai-technology · 2026-05-27

The MiniMax-M2 series is a family of Mixture-of-Experts language models designed for agentic deployment. The flagship model, M2, has 229.9 billion total parameters, of which only 9.8 billion (about 4.3%) are activated per token. The series rests on three elements. First, agent-driven data pipelines generate verifiable trajectories at scale for agentic coding and cowork, grounded in an executable workspace and scored with an artifact-aligned reward. Second, Forge, a scalable agent-native RL system, supports long-horizon trajectories through windowed-FIFO scheduling, prefix-tree merging, and inference optimization, and decouples training, inference, and agents so that both white-box and black-box agents can be used. Third, the latest checkpoint, M2.7, takes an initial step toward self-evolution by autonomously debugging its own training runs and modifying its own code.
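The summary names prefix-tree merging without describing it. The sketch below shows one plausible reading, not Forge's actual implementation: trajectories that share a token prefix are stored once in a trie, so shared positions are not duplicated. The names `TrieNode`, `merge_trajectories`, and `count_nodes`, and the toy token IDs, are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    # Hypothetical node type: maps a token id to the next node.
    children: dict = field(default_factory=dict)
    count: int = 0  # number of trajectories passing through this node


def merge_trajectories(trajectories):
    """Insert each token sequence into one shared prefix tree."""
    root = TrieNode()
    for tokens in trajectories:
        node = root
        for tok in tokens:
            node = node.children.setdefault(tok, TrieNode())
            node.count += 1
    return root


def count_nodes(node):
    """Count the unique token positions stored below `node`."""
    return sum(1 + count_nodes(child) for child in node.children.values())


# Toy trajectories (assumed data) that share the prefix [1, 2].
trajectories = [
    [1, 2, 3, 4],
    [1, 2, 3, 5, 6],
    [1, 2, 7],
]
tree = merge_trajectories(trajectories)
flat_tokens = sum(len(t) for t in trajectories)  # positions if stored separately
merged_tokens = count_nodes(tree)                # positions after merging prefixes
```

In this toy case the three trajectories hold 12 token positions when stored separately and 7 after merging. A real system would presumably use such a structure to avoid recomputing shared prefixes, but the summary does not say how.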

Key facts

The MiniMax-M2 series is a family of Mixture-of-Experts language models.
Flagship M2 has 229.9B total parameters with 9.8B activated per token.
Designed end-to-end for agentic deployment.
Agent-driven data pipelines produce verifiable trajectories for coding and cowork.
Forge is a scalable agent-native RL system.
Forge includes windowed-FIFO scheduling, prefix-tree merging, and inference optimization.
Training-inference-agent decoupling supports white-box and black-box agents.
M2.7 checkpoint autonomously debugs training runs and modifies its own code.
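
Windowed-FIFO scheduling is listed above without a definition. The sketch below illustrates one possible interpretation, stated as an assumption: rollouts sit in a queue in submission order, and the trainer may collect finished rollouts only from the oldest `window` entries, which limits how far short rollouts can jump ahead of long ones. The function `take_ready` and the dictionary fields `id` and `done` are hypothetical.

```python
from collections import deque


def take_ready(queue, window, batch_size):
    """Pop up to batch_size finished items from the first `window` queue slots."""
    taken = []
    kept = deque()
    # Only the oldest `window` entries are eligible in this pass.
    for _ in range(min(window, len(queue))):
        item = queue.popleft()
        if item["done"] and len(taken) < batch_size:
            taken.append(item["id"])
        else:
            kept.append(item)
    # Put the skipped head items back in their original order.
    queue.extendleft(reversed(kept))
    return taken


# Toy queue (assumed data): rollouts 0 and 3 are still running.
status = [False, True, True, False, True, True]
queue = deque({"id": i, "done": d} for i, d in enumerate(status))

first = take_ready(queue, window=3, batch_size=4)
remaining = [item["id"] for item in queue]
```

With a window of 3, the first call collects rollouts 1 and 2 and leaves 0, 3, 4, 5 queued. Rollouts 4 and 5 are finished but stay put because they sat outside the window. A strict FIFO would instead block on rollout 0, and an unrestricted scheduler would take every finished rollout at once.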

Entities

—

Sources

arXiv cs.AI — 2026-05-27