GRACE: Gradient-Aligned Method for Efficient Reasoning Data Curation
Researchers have introduced GRACE, a technique for selecting high-quality reasoning data for post-training. Unlike existing pipelines, which treat every step of a sample as equally valuable, GRACE treats a reasoning trace as a sequence of optimization events. It scores each step on two criteria: how well the step aligns with the answer-oriented gradient direction, and how consistent it is with the preceding steps. The step-level scores are then aggregated into a single sample-level value, using only the model's internal optimization signals rather than external reward models or step annotations. To keep scoring cheap, a representation-level gradient proxy estimates step alignment in a single forward pass. The method has been applied to post-train Qwen3-VL-2B-Instruct on the MMathCoT-1M dataset.
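The step-scoring idea can be illustrated with a minimal sketch. This is not the authors' implementation: the summary gives no formulas, so the cosine-similarity alignment, the comparison against only the immediately preceding step, the equal weighting of the two terms, and the mean aggregation are all assumptions, as are the names `cosine`, `score_steps`, and `sample_value`. The per-step direction vectors are taken as given inputs; how GRACE derives them from its representation-level proxy is not shown here.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def score_steps(step_dirs, answer_dir):
    """Score each step by alignment with answer_dir and consistency with the prior step.

    ASSUMPTION: both terms are cosine similarities averaged with equal weight;
    the source does not state how the two criteria are combined.
    """
    scores = []
    for i, direction in enumerate(step_dirs):
        alignment = cosine(direction, answer_dir)
        # ASSUMPTION: the first step has no predecessor, so it counts as consistent.
        consistency = cosine(direction, step_dirs[i - 1]) if i > 0 else 1.0
        scores.append(0.5 * (alignment + consistency))
    return scores


def sample_value(step_scores):
    """Aggregate step scores into one sample-level value (mean; an assumption)."""
    return sum(step_scores) / len(step_scores) if step_scores else 0.0
```

Under these assumptions, a trace whose last step turns orthogonal to both the answer direction and the step before it is scored lower than a trace whose steps all point the same way, which is the behaviour the description suggests.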
Key facts
- GRACE scores each step in a reasoning trace by gradient alignment and consistency.
- No external reward models or step annotations are required.
- A representation-level gradient proxy estimates step alignment in one forward pass.
- Method applied to post-train Qwen3-VL-2B-Instruct on MMathCoT-1M.
- Existing pipelines treat all steps as equally valuable.
- Step-level scores are aggregated into a sample-level value.
- GRACE uses only the model's internal optimization signals.
- The approach is designed for efficient subset selection.
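Once every sample has a value, subset selection can be as simple as keeping the highest-valued samples under a fixed budget. The sketch below assumes plain top-k ranking, which the summary does not confirm; the function name `select_subset`, the `budget` parameter, and the dictionary input format are hypothetical.

```python
def select_subset(sample_values, budget):
    """Return the ids of the `budget` highest-valued samples, best first.

    ASSUMPTION: plain top-k ranking; the source only says the approach is
    designed for efficient subset selection.
    """
    ranked = sorted(sample_values, key=sample_values.get, reverse=True)
    return ranked[:max(budget, 0)]


# Hypothetical sample ids and values, for illustration only.
values = {"sample_a": 0.2, "sample_b": 0.9, "sample_c": 0.5}
chosen = select_subset(values, budget=2)
```

Here `chosen` holds the two samples with the largest values, in descending order of value.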