PODS: A New Framework for Dynamic Data Volume Scheduling in Model Training
A new paper on arXiv (2605.14773) introduces PODS (Plug-and-play Oscillatory Data-volume Scheduling), a framework that dynamically adjusts how much data is selected during model training. Existing data selection methods focus on which samples to choose but hold the selection ratio fixed, so the selected data volume stays constant throughout training. The authors show that training on selected data induces an implicit regularization effect whose strength depends on the instantaneous selection ratio: lower ratios amplify the regularization, while higher ratios preserve data coverage and optimization fidelity. PODS is a lightweight module that oscillates the data volume over time, improving training efficiency without requiring new sample-scoring metrics. The work revisits data selection from an optimization perspective, highlighting a trade-off between regularization and fidelity.
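To make the idea of an oscillating data volume concrete, here is a minimal sketch of a selection-ratio schedule. The sinusoidal shape, the function name `oscillating_ratio`, and the parameters `target_ratio`, `amplitude`, `cycles`, and `min_ratio` are all assumptions made for illustration; the summary does not specify the schedule PODS actually uses.

```python
import math


def oscillating_ratio(step, total_steps, target_ratio=0.5,
                      amplitude=0.2, cycles=4, min_ratio=0.05):
    """Hypothetical schedule: the selection ratio swings around target_ratio.

    Illustrative only, not the schedule from the paper. Over whole cycles
    the ratio averages out to target_ratio, so the overall data budget is
    roughly the same as with a fixed ratio.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Phase advances through `cycles` full sine periods over training.
    phase = 2.0 * math.pi * cycles * step / total_steps
    ratio = target_ratio + amplitude * math.sin(phase)
    # Clamp so that at least a small fraction, and at most all data, is kept.
    return min(1.0, max(min_ratio, ratio))


# Ratios over a 100-step run; low phases lean on regularization,
# high phases lean on data coverage.
ratios = [oscillating_ratio(s, 100) for s in range(100)]
```

In this sketch the low-ratio phases correspond to stronger selection-induced regularization and the high-ratio phases to broader data coverage, which is the trade-off the paper describes.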
Key facts
- Paper published on arXiv with ID 2605.14773
- PODS stands for Plug-and-play Oscillatory Data-volume Scheduling
- Existing methods hold the selected data volume at a fixed target ratio throughout training
- Selected-data training induces an implicit regularization effect modulated by the instantaneous selection ratio
- Lower ratios amplify selection-induced regularization
- Higher ratios preserve data coverage and optimization fidelity
- PODS serves as a lightweight module that does not introduce new sample-scoring metrics
- The work revisits data selection from an optimization perspective
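The plug-and-play point in the facts above can be sketched as follows: per-sample scores come from whatever selection method is already in use, and only the fraction kept changes from step to step. The helper `select_by_ratio`, the keep-the-highest-scores rule, and the example scores are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import math


def select_by_ratio(scores, ratio):
    """Keep the top `ratio` fraction of samples by score (hypothetical helper).

    `scores` is assumed to come from an existing sample-scoring method;
    no new metric is computed here. Returns the kept indices in ascending order.
    """
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    keep = max(1, math.ceil(ratio * len(scores)))
    # Rank sample indices from highest to lowest score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])


# Made-up scores for eight samples, standing in for an existing scorer.
scores = [0.9, 0.1, 0.4, 0.8, 0.3, 0.7, 0.2, 0.6]

# The same scores yield different data volumes as the ratio changes.
low_phase = select_by_ratio(scores, 0.25)   # fewer samples kept
high_phase = select_by_ratio(scores, 0.75)  # more samples kept
```

Because the ratio is an input rather than a constant, a scheduler can change it between steps without touching the scoring code.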
Entities
Institutions
- arXiv