Compute Aligned Training Boosts LLM Test-Time Performance
A new training paradigm called Compute Aligned Training (CAT) addresses the misalignment between standard post-training methods and test-time inference strategies for Large Language Models (LLMs). Standard approaches like Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) optimize the likelihood of individual samples under a base policy, which ignores test-time procedures that aggregate or filter many outputs. CAT instead conceptualizes inference strategies as operators on the base policy and derives loss functions that maximize performance when those strategies are applied at test time. The authors instantiate these loss functions for both SFT and RL across common test-time strategies and provide empirical evidence that CAT substantially improves test-time scaling over standard training. The paper is available on arXiv.
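The "inference strategies as operators" view can be sketched concretely. The snippet below is a hypothetical illustration, not the paper's implementation: `base_policy`, the scoring scheme, and all names are stand-ins. It shows how a test-time strategy like best-of-n maps one policy to a new, strategy-induced policy.

```python
import random

# Hypothetical sketch (all names illustrative, not from the paper):
# a test-time strategy such as best-of-n viewed as an operator that
# maps a base policy to the new policy it induces.

def base_policy(prompt, rng):
    """Toy stand-in for an LLM: returns one candidate with a scalar score."""
    return {"text": f"answer-{rng.randint(0, 9)}", "score": rng.random()}

def best_of_n(policy, n):
    """Operator: lifts a base policy into its induced best-of-n policy."""
    def induced_policy(prompt, rng):
        candidates = [policy(prompt, rng) for _ in range(n)]
        return max(candidates, key=lambda c: c["score"])
    return induced_policy

# The induced policy has the same interface as the base policy,
# so further operators could be stacked on top of it.
bon_policy = best_of_n(base_policy, n=8)
best = bon_policy("2+2=?", random.Random(0))
```

Because the operator returns an object with the same interface as the base policy, training objectives can be stated against the induced policy rather than the base one, which is the alignment CAT targets.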
Key facts
- Compute Aligned Training (CAT) aligns training objectives with test-time strategies.
- Standard post-training paradigms (SFT and RL) optimize individual-sample likelihood, which is misaligned with test-time procedures that aggregate or filter outputs.
- CAT conceptualizes inference strategies as operators on the base policy.
- New loss functions are derived for SFT and RL across common test-time strategies.
- Empirical evidence shows CAT substantially improves test-time scaling over standard training.
- The paper is available on arXiv.
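The misalignment in the key facts above can be made concrete with a toy calculation. This is an illustrative sketch, not the paper's loss: if a policy solves a problem with per-sample probability p, best-of-n succeeds with probability 1 - (1 - p)^n, so two policies with identical per-sample quality can diverge sharply under a test-time strategy.

```python
# Illustrative sketch (not the paper's actual loss functions): per-sample
# likelihood and best-of-n success can rank policies differently.

def pass_at_n(per_problem_p, n):
    """Average probability that at least one of n i.i.d. samples succeeds."""
    return sum(1.0 - (1.0 - p) ** n for p in per_problem_p) / len(per_problem_p)

# Policy A: moderate success probability on every problem.
policy_a = [0.5, 0.5, 0.5, 0.5]
# Policy B: perfect on half the problems, hopeless on the rest.
policy_b = [1.0, 1.0, 0.0, 0.0]

# Identical under a per-sample (n=1) criterion:
assert pass_at_n(policy_a, 1) == pass_at_n(policy_b, 1) == 0.5

# Under best-of-4, A pulls ahead (0.9375 vs 0.5); an objective that only
# maximizes per-sample likelihood cannot see this gap.
gap = pass_at_n(policy_a, 4) - pass_at_n(policy_b, 4)
```

This is the kind of gap a training objective aligned with the test-time strategy is designed to exploit, whereas standard SFT/RL objectives are blind to it.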
Entities
Institutions
- arXiv