AcquisitionSynthesis: AI Data Generation via Acquisition Functions
Researchers propose AcquisitionSynthesis, a method that uses acquisition functions from active learning as reward models to train language models to generate higher-quality synthetic data. The approach addresses a common limitation of existing data generation techniques: they do not quantitatively measure how generated samples affect downstream learners. Acquisition functions provide interpretable, model-centric signals of a sample's informativeness and influence. The work is published on arXiv (2605.13149).
Key facts
- AcquisitionSynthesis uses acquisition functions as reward models.
- It trains language models to generate higher-quality synthetic data.
- Existing methods rely on rejection sampling or larger models.
- Acquisition functions measure informativeness and influence.
- The approach provides interpretable, model-centric signals.
- The paper is on arXiv with ID 2605.13149.
- Data quality is a critical bottleneck for competitive models.
- The method is inspired by the active learning literature.
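To make the idea of an acquisition function acting as a reward more concrete, here is a minimal sketch. It assumes predictive entropy, a classic active-learning acquisition function, as the scoring rule; the summary above does not say which acquisition functions the paper actually uses, and the names `predictive_entropy`, `score_candidates`, and the example probabilities are hypothetical, chosen only for illustration.

```python
import math
from typing import List, Sequence


def predictive_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a learner's predictive distribution.

    Higher entropy means the downstream learner is less certain about the
    sample, which this sketch treats as "more informative".
    """
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def score_candidates(learner_probs: Sequence[Sequence[float]]) -> List[float]:
    """Return one scalar reward per candidate synthetic sample."""
    return [predictive_entropy(p) for p in learner_probs]


# Hypothetical learner outputs for three generated samples (3-class task).
learner_probs = [
    [0.98, 0.01, 0.01],  # learner is confident: low reward
    [0.50, 0.30, 0.20],  # learner is unsure: higher reward
    [1 / 3, 1 / 3, 1 / 3],  # maximal uncertainty: highest reward
]

rewards = score_candidates(learner_probs)

# Index of the candidate the acquisition function ranks as most informative.
best = max(range(len(rewards)), key=rewards.__getitem__)
```

In a training loop, scores like these could plausibly serve as the scalar reward passed to a policy-optimization step for the generator. That wiring is an assumption of this sketch, not a detail confirmed by the summary.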
Entities
Institutions
- arXiv