SU-01: A 30B-A3B Model Achieves Gold-Medal-Level Olympiad Reasoning via Simple Scaling
The new model SU-01 reaches gold-medal-level performance on problems from the International Mathematical Olympiad (IMO) and the International Physics Olympiad (IPhO) using a simple, unified recipe. The model is built on a 30B-A3B architecture, supervised fine-tuned (SFT) on roughly 340K trajectories of under 8K tokens each, and then trained for 200 reinforcement learning (RL) steps. The recipe combines a reverse-perplexity curriculum during SFT to encourage thorough proof search and self-verification, a two-stage RL pipeline that moves from verifiable rewards to proof-level RL, and test-time scaling to further boost solving performance. The work is described in arXiv paper 2605.13301, 'Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling,' and marks a notable step in automated mathematical and scientific problem solving.
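The paper does not spell out the curriculum mechanics here, but a reverse-perplexity curriculum can be sketched as follows: a minimal illustration assuming each SFT trajectory carries a precomputed perplexity under the base model, and that "reverse" means presenting higher-perplexity (harder-to-predict) trajectories first. The `Trajectory` class and the scoring are hypothetical stand-ins, not SU-01's actual data pipeline.

```python
from dataclasses import dataclass

@dataclass
class Trajectory:
    text: str
    perplexity: float  # assumed: precomputed under the base model

def reverse_perplexity_order(trajs: list[Trajectory]) -> list[Trajectory]:
    # Hypothetical reading of the curriculum: order SFT data from
    # highest to lowest perplexity, so harder trajectories come first.
    return sorted(trajs, key=lambda t: t.perplexity, reverse=True)

data = [Trajectory("a", 2.1), Trajectory("b", 9.7), Trajectory("c", 5.3)]
curriculum = reverse_perplexity_order(data)
print([t.text for t in curriculum])  # → ['b', 'c', 'a']
```

In an actual SFT run, the ordered trajectories would then be fed to the trainer in curriculum order rather than shuffled uniformly.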
Key facts
- SU-01 achieves gold-medal-level performance on IMO and IPhO problems.
- Model uses a 30B-A3B backbone.
- Trained on around 340K sub-8K-token trajectories.
- Training involved 200 RL steps.
- Recipe includes reverse-perplexity curriculum for SFT.
- Two-stage RL pipeline: verifiable rewards then proof-level RL.
- Test-time scaling is used to boost performance.
- Paper published on arXiv with ID 2605.13301.
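Test-time scaling, the last item above, is commonly realized as best-of-n sampling: draw several candidate solutions and keep the one a scorer prefers. The sketch below is a generic illustration of that pattern, not SU-01's specific method; `generate` and `score` are hypothetical stand-ins for model sampling and a verifier or reward model.

```python
import random

def generate(problem: str, seed: int) -> str:
    # Hypothetical stand-in for sampling one candidate solution.
    rng = random.Random(seed)
    return f"solution-{rng.randint(0, 9)}"

def score(problem: str, candidate: str) -> float:
    # Hypothetical stand-in for a verifier / reward-model score.
    return float(candidate.split("-")[1])

def best_of_n(problem: str, n: int = 8) -> str:
    # Test-time scaling: sample n candidates, return the highest-scoring one.
    candidates = [generate(problem, seed=i) for i in range(n)]
    return max(candidates, key=lambda c: score(problem, c))

print(best_of_n("IMO problem 1", n=4))
```

Spending more compute at inference (larger n, or longer reasoning traces) is what "test-time scaling" refers to in the summary above.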
Entities
Institutions
- arXiv