GRACE: Gradient-Aligned Method for Efficient Reasoning Data Curation

other · 2026-05-14

Researchers have introduced GRACE, a technique for selecting high-quality reasoning data for post-training. Unlike existing pipelines, which treat every step of a sample as equally valuable, GRACE treats a reasoning trace as a sequence of optimization events. It scores each step on two criteria: how well the step aligns with the answer-oriented direction, and how consistent it is with the preceding steps. The step-level scores are then aggregated into a value for the whole sample, relying on the model's internal optimization signals rather than external reward models or step annotations. For efficiency, GRACE uses a representation-level gradient proxy that estimates step alignment in a single forward pass. The method has been applied to post-train Qwen3-VL-2B-Instruct on the MMathCoT-1M dataset.
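
The step scoring described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the summary gives no exact formulas, so the cosine-based alignment, the running-sum consistency term, the `weight` mixing parameter, and the mean aggregation are all assumptions chosen for illustration. The per-step direction vectors are assumed to come from some gradient proxy that has already been computed.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def score_steps(step_dirs, answer_dir, weight=0.5):
    """Score each reasoning step (hypothetical formula, not from the paper).

    step_dirs:  one direction vector per step (assumed precomputed proxy).
    answer_dir: the answer-oriented direction vector.
    weight:     assumed mix between alignment and consistency.
    """
    scores = []
    running = [0.0] * len(answer_dir)  # sum of earlier step directions
    for index, direction in enumerate(step_dirs):
        alignment = cosine(direction, answer_dir)
        # The first step has no predecessors, so its consistency is neutral.
        consistency = cosine(direction, running) if index > 0 else 0.0
        scores.append(weight * alignment + (1.0 - weight) * consistency)
        running = [r + d for r, d in zip(running, direction)]
    return scores


def sample_value(step_scores):
    """Aggregate step scores into one sample value (mean is an assumption)."""
    if not step_scores:
        return 0.0
    return sum(step_scores) / len(step_scores)


# Usage: two steps pointing toward the answer, then one pointing away.
steps = [[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]
scores = score_steps(steps, answer_dir=[1.0, 0.0])
print(scores, sample_value(scores))
```

In this toy setup a step that reverses direction is penalized on both criteria, which pulls down the value of the whole sample.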

Key facts

GRACE scores each step in a reasoning trace by gradient alignment and consistency.
No external reward models or step annotations are required.
A representation-level gradient proxy estimates step alignment in one forward pass.
Method applied to post-train Qwen3-VL-2B-Instruct on MMathCoT-1M.
Existing pipelines treat all steps as equally valuable.
Step-level scores are aggregated into a sample-level value.
GRACE uses only the model's internal optimization signals.
The approach is designed for efficient subset selection.
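
As a rough illustration of the subset-selection step, the sketch below keeps the highest-valued samples under a fixed budget. The summary does not say how GRACE turns sample values into a subset, so the top-k ranking, the `select_subset` name, and the `budget` parameter are assumptions made for this example.

```python
def select_subset(sample_values, budget):
    """Return the ids of the `budget` highest-valued samples, best first.

    sample_values: dict mapping a sample id to its aggregated value.
    budget:        number of samples to keep (hypothetical parameter).
    Ties are broken by sample id so the result is deterministic.
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    ranked = sorted(sample_values.items(), key=lambda item: (-item[1], item[0]))
    return [sample_id for sample_id, _ in ranked[:budget]]


# Usage with made-up values for four samples.
values = {"a": 0.2, "b": 0.9, "c": 0.5, "d": 0.9}
print(select_subset(values, 2))
```

A plain ranking like this ignores diversity among the selected samples; whether GRACE adds such a constraint is not stated in the summary.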

Sources

arXiv cs.AI — 2026-05-14