One-Step-Train: Efficient Data Selection for Multimodal Models
Researchers have introduced One-Step-Train (OST), an approach that reframes data selection for large multimodal models (LMMs) as a ranking problem over incremental optimization utility. Unlike prior methods such as LLM-as-a-Judge, which are computationally expensive and hard to interpret, OST scores the added value of each candidate sample via a simulated single-step update on a lightweight proxy. Experiments on the Qwen series across multimodal mathematical reasoning benchmarks show that OST achieves Pareto-optimal efficiency: training on the top-50 subset cuts training costs by 43% and total time by 17%, while outperforming the LLM-as-a-Judge baseline by 1.8 points, effectively addressing the quality-quantity trade-off in synthetic data.
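The core idea, as described, is to score each candidate by how much a single simulated gradient step on that sample alone would help a small proxy model, then keep the highest-utility samples. Below is a minimal sketch of that idea under stated assumptions; names such as `proxy_model`, `val_batch`, and `lr`, as well as the use of validation-loss reduction as the utility signal, are illustrative choices and not the paper's actual implementation.

```python
# Hypothetical sketch of one-step utility scoring (not the paper's code):
# utility of a sample = validation loss before minus after one simulated
# SGD step on that sample alone, taken on a lightweight proxy model.
import copy
import torch

def one_step_utility(proxy_model, sample, val_batch, loss_fn, lr=1e-4):
    """Return the validation-loss drop from one simulated update on `sample`."""
    model = copy.deepcopy(proxy_model)  # keep the shared proxy untouched
    model.train()

    # Simulate a single-step update on this one sample.
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    opt.zero_grad()
    loss_fn(model(sample["inputs"]), sample["targets"]).backward()
    opt.step()

    # Measure how much that step helped on a held-out validation batch.
    model.eval()
    with torch.no_grad():
        before = loss_fn(proxy_model(val_batch["inputs"]), val_batch["targets"])
        after = loss_fn(model(val_batch["inputs"]), val_batch["targets"])
    return (before - after).item()  # higher = more useful sample

def select_top_k(proxy_model, candidates, val_batch, loss_fn, k):
    """Rank candidates by one-step utility and return the top-k subset."""
    scored = [(one_step_utility(proxy_model, s, val_batch, loss_fn), s)
              for s in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:k]]
```

Because each score comes from one gradient step on a small proxy rather than a full training run or an LLM judge call, the ranking stays cheap and directly interpretable as expected loss improvement.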
Key facts
- OST reformulates data selection as incremental optimization utility ranking
- OST uses simulated single-step update on lightweight proxy
- Experiments on Qwen series across multimodal mathematical reasoning benchmarks
- Top-50 subset reduces training costs by 43%
- Total time consumption reduced by 17%
- Outperforms LLM-as-a-Judge baseline by 1.8 points
- Addresses quality-quantity trade-off in synthetic data
- Proposed framework is interpretable and computationally efficient