IBM Releases Granite 4.1 LLMs with Multi-Stage Training Pipeline
The Granite team at IBM has unveiled Granite 4.1, a series of dense decoder-only LLMs in 3B, 8B, and 30B parameter configurations, trained from scratch on roughly 15 trillion tokens. The models use a five-phase pre-training approach that moves from broad web-scale data to carefully curated high-quality content, ultimately extending the context length to 512K tokens. After pre-training, they undergo supervised fine-tuning on 4.1 million high-quality samples, with an LLM-as-Judge framework used for quality filtering. A multi-stage reinforcement learning process built on on-policy GRPO with the DAPO loss further improves performance in math, coding, instruction following, and chat. Notably, the 8B instruct model matches or exceeds the earlier Granite 4.0-H-Small (a 32B-A9B mixture-of-experts model) despite its simpler dense architecture. All models are released under the Apache 2.0 license. Training ran on an NVIDIA GB200 NVL72 cluster at CoreWeave using thousands of GPUs. The models support 12 languages and are designed for enterprise use, targeting predictable latency and lower operating costs.
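For readers unfamiliar with the RL recipe mentioned above, the following is a minimal sketch of a GRPO-style objective with DAPO's token-level averaging and asymmetric ("clip-higher") clipping; the function names, tensor shapes, and epsilon values are illustrative assumptions, not IBM's actual training code:

```python
import torch

def group_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """GRPO: standardize each sample's reward within its sampling group,
    so no learned value function is needed."""
    return (rewards - rewards.mean()) / (rewards.std() + 1e-6)

def dapo_loss(logp_new, logp_old, advantages, mask,
              eps_low=0.2, eps_high=0.28):
    """Token-level clipped surrogate with asymmetric clipping bounds.

    Shapes: logp_new, logp_old, mask are (batch, seq_len);
    advantages is (batch,); mask is 1 on response tokens, 0 on padding.
    """
    ratio = torch.exp(logp_new - logp_old)          # per-token importance ratio
    adv = advantages.unsqueeze(1)                   # broadcast over tokens
    unclipped = ratio * adv
    clipped = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high) * adv
    per_token = -torch.minimum(unclipped, clipped)  # pessimistic (clipped) bound
    # DAPO averages over all valid tokens in the batch rather than
    # averaging per sequence first, so long responses are not down-weighted.
    return (per_token * mask).sum() / mask.sum()
```

DAPO also discards sampling groups whose rewards are all identical, since their group-standardized advantages are zero and contribute no gradient.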
Key facts
- Granite 4.1 includes 3B, 8B, and 30B dense decoder-only LLMs.
- Trained on ~15 trillion tokens across five pre-training phases.
- Long-context extension up to 512K tokens via staged process.
- SFT uses 4.1M high-quality samples curated with an LLM-as-Judge framework (see the sketch after this list).
- Multi-stage RL pipeline uses GRPO with DAPO loss.
- 8B model matches or outperforms 32B-A9B MoE predecessor.
- Released under Apache 2.0 license.
- Trained on NVIDIA GB200 NVL72 cluster on CoreWeave.
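As a rough illustration of the LLM-as-Judge curation mentioned above, the sketch below filters candidate SFT samples by a judge score; the `judge` callable, `SFTSample` type, and 0.8 threshold are hypothetical placeholders rather than details of IBM's pipeline:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class SFTSample:
    prompt: str
    response: str

def filter_with_judge(samples: List[SFTSample],
                      judge: Callable[[str, str], float],
                      threshold: float = 0.8) -> List[SFTSample]:
    """Keep only samples whose judge score clears the quality bar.

    `judge` stands in for any scoring call, e.g. an LLM prompted to rate a
    (prompt, response) pair for correctness and helpfulness on [0, 1].
    """
    return [s for s in samples if judge(s.prompt, s.response) >= threshold]
```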
Entities
Institutions
- IBM
- CoreWeave
- Hugging Face