MiniMax-M2: 229.9B-Parameter Mixture-of-Experts Model with Agentic RL
The MiniMax-M2 series is a family of Mixture-of-Experts (MoE) language models designed end to end for agentic deployment. The flagship model, M2, has 229.9B total parameters, of which 9.8B are activated per token. The series rests on three components. First, agent-driven data pipelines produce large volumes of verifiable trajectories for agentic coding and cowork, grounded in an executable workspace and scored with an artifact-aligned reward. Second, Forge is a scalable, agent-native RL system built for long-horizon trajectories; it combines windowed-FIFO scheduling, prefix-tree merging, and inference optimization, and it decouples training, inference, and agents so that both white-box and black-box agents are supported. Third, the latest checkpoint, M2.7, takes an early step toward self-evolution: it autonomously debugs its own training runs and modifies its own code.
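The summary does not say how Forge implements prefix-tree merging. As a minimal sketch, assuming the idea is that agent trajectories sharing a common token prefix (for example the same system prompt and early turns) are stored once in a tree rather than once per trajectory, the Python below builds such a tree and counts how many token positions remain after merging. All names (`TrieNode`, `merge_trajectories`, `count_nodes`) are hypothetical. The sketch shows only the deduplication; a real trainer would also need attention masks that respect the tree structure, which is omitted here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class TrieNode:
    # Children keyed by the next token id (hypothetical structure).
    children: Dict[int, "TrieNode"] = field(default_factory=dict)
    # Number of trajectories whose token sequence passes through this node.
    count: int = 0


def merge_trajectories(trajectories: Sequence[Sequence[int]]) -> TrieNode:
    """Insert each token sequence into a prefix tree so shared prefixes are stored once."""
    root = TrieNode()
    for traj in trajectories:
        node = root
        for tok in traj:
            node = node.children.setdefault(tok, TrieNode())
            node.count += 1
    return root


def count_nodes(root: TrieNode) -> int:
    """Count distinct token positions in the tree, excluding the empty root."""
    total = 0
    stack: List[TrieNode] = list(root.children.values())
    while stack:
        node = stack.pop()
        total += 1
        stack.extend(node.children.values())
    return total


if __name__ == "__main__":
    # Three toy trajectories that share the prefix 1, 2 (and 1, 2, 3 for two of them).
    trajs = [[1, 2, 3, 4], [1, 2, 3, 5, 6], [1, 2, 7]]
    flat = sum(len(t) for t in trajs)
    merged = count_nodes(merge_trajectories(trajs))
    print(flat, merged)  # 12 7
```

For these toy inputs, 12 flat token positions collapse to 7 tree nodes, which is the kind of saving prefix sharing can offer when many rollouts branch from the same context.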
Key facts
- MiniMax-M2 series is a family of Mixture-of-Experts language models.
- Flagship M2 has 229.9B total parameters with 9.8B activated per token.
- Designed end-to-end for agentic deployment.
- Agent-driven data pipelines produce verifiable trajectories for coding and cowork.
- Forge is a scalable agent-native RL system.
- Forge includes windowed-FIFO scheduling, prefix-tree merging, and inference optimization.
- Training-inference-agent decoupling supports white-box and black-box agents.
- M2.7 checkpoint autonomously debugs training runs and modifies its own code.
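The summary names windowed-FIFO scheduling without defining it. One plausible reading, which is an assumption and not a description of Forge, is a middle ground between strict FIFO (always wait for the oldest rollout, so one long trajectory stalls the batch) and fully greedy consumption (take whatever finishes first, which biases training toward short trajectories): a finished rollout may be consumed only if it sits among the `window` oldest outstanding rollouts. The class `WindowedFIFO` and its methods are hypothetical.

```python
from collections import deque
from typing import Deque, List, Set


class WindowedFIFO:
    """Hypothetical windowed-FIFO buffer: finished rollouts are consumed only
    from the `window` oldest outstanding slots, preserving rough submission order."""

    def __init__(self, window: int) -> None:
        if window < 1:
            raise ValueError("window must be >= 1")
        self.window = window
        self.pending: Deque[int] = deque()  # rollout ids in submission order
        self.done: Set[int] = set()         # ids whose rollout has finished

    def submit(self, rollout_id: int) -> None:
        self.pending.append(rollout_id)

    def complete(self, rollout_id: int) -> None:
        self.done.add(rollout_id)

    def pop_ready(self, max_items: int) -> List[int]:
        """Return up to max_items finished ids lying within the head window."""
        taken: List[int] = []
        i = 0
        # The bound is recomputed each step, so the window slides as items leave.
        while i < min(self.window, len(self.pending)) and len(taken) < max_items:
            rid = self.pending[i]
            if rid in self.done:
                del self.pending[i]  # the next item shifts into index i
                self.done.discard(rid)
                taken.append(rid)
            else:
                i += 1
        return taken


if __name__ == "__main__":
    q = WindowedFIFO(window=2)
    for rid in range(5):
        q.submit(rid)
    q.complete(3)
    q.complete(1)
    print(q.pop_ready(4))  # [1]: rollout 3 is finished but outside the window
    q.complete(0)
    print(q.pop_ready(4))  # [0, 3]: the window slides once 0 and 1 are gone
```

With `window=2`, rollout 3 finishes early but is held back until older rollouts drain, while rollout 1 is consumed immediately. Setting `window=1` recovers strict FIFO, and a window at least as large as the queue recovers greedy consumption.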