ZAYA1-8B: A 700M Active Parameter Reasoning MoE Model
ZAYA1-8B is a mixture-of-experts (MoE) reasoning model from Zyphra with 700 million active parameters out of 8 billion total, built on their MoE++ framework. It was trained from scratch on an AMD compute platform, with reasoning data incorporated from the start via an answer-preserving trimming method. On math and coding benchmarks it matches or exceeds DeepSeek-R1-0528, and it remains a strong contender against larger open-weight reasoning models. Post-training is a four-stage reinforcement learning cascade: a reasoning warmup on math and puzzles, a 400-task RLVE-Gym curriculum, math and code RL with test-time compute traces, and behavioral RL for chat and instruction following.
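The 700M-active/8B-total split reflects the standard sparse-MoE arrangement: a router picks a few experts per token, so only that subset of weights participates in each forward pass. Below is a minimal PyTorch sketch of generic top-k routing; the layer sizes, expert count, and top-k value are illustrative and are not ZAYA1's MoE++ internals.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    """Illustrative sparse MoE layer (not Zyphra's MoE++): each token is
    routed to k of n_experts, so the per-token active parameter count is a
    small fraction of the layer's total."""

    def __init__(self, d_model=512, d_ff=2048, n_experts=16, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        gates = F.softmax(self.router(x), dim=-1)   # (tokens, n_experts)
        weights, idx = gates.topk(self.k, dim=-1)   # route each token to k experts
        weights = weights / weights.sum(-1, keepdim=True)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e            # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

layer = TopKMoELayer()
per_expert = sum(p.numel() for p in layer.experts[0].parameters())
total = sum(p.numel() for p in layer.parameters())
active = total - (16 - layer.k) * per_expert  # router + the k routed experts
print(f"total={total:,}  active per token={active:,}")
```

With these toy dimensions, roughly an eighth of the layer's parameters run per token, the same qualitative ratio as ZAYA1-8B's 700M active out of 8B total.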
Key facts
- ZAYA1-8B has 700M active and 8B total parameters.
- Built on Zyphra's MoE++ architecture.
- Pretraining, midtraining, and SFT performed on an AMD compute platform.
- Matches or exceeds DeepSeek-R1-0528 on math and coding benchmarks.
- Trained from scratch for reasoning with answer-preserving trimming (see the sketch after this list).
- Post-training uses a four-stage RL cascade.
- Includes 400-task RLVE-Gym curriculum.
- Uses synthetic code environments from competitive programming references.
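The source does not spell out the answer-preserving trimming procedure, only that it shapes the reasoning data. Below is a minimal sketch of the general idea the name suggests, assuming trimming means shortening a reasoning trace to a length budget while always keeping the final answer verbatim; the function name and the head/tail step-budget heuristic are hypothetical.

```python
def trim_trace(steps, answer, max_steps):
    """Hypothetical answer-preserving trim: shorten a chain-of-thought to a
    step budget, keeping the earliest steps (problem setup) and the latest
    steps (conclusion). The final answer is passed through untouched."""
    if len(steps) <= max_steps:
        return steps, answer
    head = max_steps // 2
    tail = max_steps - head
    return steps[:head] + steps[-tail:], answer  # the answer is never trimmed

steps = [f"step {i}" for i in range(1, 11)]
kept, ans = trim_trace(steps, answer="42", max_steps=4)
print(kept, ans)  # ['step 1', 'step 2', 'step 9', 'step 10'] 42
```

Whatever the actual heuristic, the defining property is the invariant in the last return: the trace may shrink, but the answer the trace supports is preserved exactly.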
Entities
Institutions
- Zyphra
- AMD