AutoScale: Closed-Loop Data Mixture for Real-Synthetic Co-Training in Autonomous Driving
A new paper on arXiv (2605.21372) proposes AutoScale, a closed-loop data engine that dynamically optimizes the mixture of real and synthetic driving data for end-to-end autonomous driving models. The authors argue that naively adding all available synthetic data is inefficient and introduces distribution shift. Instead, AutoScale iteratively adjusts the training data mixture based on evaluation feedback, aiming to maximize model performance under practical training budgets. The work addresses an under-explored problem, namely how to choose the data mixture for real-synthetic co-training: synthetic data is nearly unlimited, whereas real-world data is expensive to annotate and scene-biased.
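The summary above does not spell out the paper's algorithm, so the following is only a minimal sketch of the general idea of a feedback-driven mixture search, not AutoScale itself. It assumes a single knob (the synthetic fraction of a fixed training budget) and uses plain hill climbing. The names `toy_evaluate` and `closed_loop_mixture` are hypothetical, and the toy score function merely stands in for "train under the budget, then evaluate".

```python
from typing import Callable, List, Tuple


def toy_evaluate(synthetic_ratio: float) -> float:
    """Hypothetical stand-in for training a model and scoring it.

    Assumption for illustration only: the score peaks when 30% of the
    budget is synthetic data. A real pipeline would train and evaluate
    a driving model here.
    """
    return 1.0 - (synthetic_ratio - 0.3) ** 2


def closed_loop_mixture(
    evaluate_fn: Callable[[float], float],
    init_ratio: float = 0.5,
    step: float = 0.2,
    rounds: int = 20,
    min_step: float = 1e-3,
) -> Tuple[float, float, List[Tuple[float, float]]]:
    """Hill-climb the synthetic fraction using evaluation feedback."""
    ratio = init_ratio
    best = evaluate_fn(ratio)
    history = [(ratio, best)]
    for _ in range(rounds):
        if step < min_step:
            break
        improved = False
        # Probe one step below and one step above the current mixture.
        for candidate in (ratio - step, ratio + step):
            candidate = min(1.0, max(0.0, candidate))  # keep a valid fraction
            score = evaluate_fn(candidate)
            if score > best:
                ratio, best, improved = candidate, score, True
        if not improved:
            step /= 2  # no neighbour helped: search more finely
        history.append((ratio, best))
    return ratio, best, history


ratio, score, history = closed_loop_mixture(toy_evaluate)
print(f"synthetic fraction: {ratio:.3f}, score: {score:.4f}")
```

Each call to `evaluate_fn` represents a full train-and-evaluate cycle, so in practice the number of rounds would be bounded by the training budget rather than by a step-size threshold.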
Key facts
- Paper ID: arXiv:2605.21372
- Title: Closed Loop Dynamic Driving Data Mixture for Real-Synthetic Co-Training
- Proposes AutoScale, a fully automated closed-loop data engine
- Addresses data scaling for end-to-end autonomous driving
- Real-world data is expensive to annotate and scene-biased
- Naive synthetic data inclusion leads to distribution shifts
- Optimizes data mixture iteratively via evaluation feedback
- Targets model performance under practical training budgets
Entities
Institutions
- arXiv