Training-Free Flow Matching Boosts Diversity Without Losing Quality
A training-free, inference-time control mechanism substantially improves diversity in flow-based text-to-image models while maintaining image quality. Flow-based models follow deterministic trajectories, so exploring diverse modes is expensive under a limited sampling budget, and existing remedies typically require retraining or degrade quality. The method addresses this in two ways: a feature-space objective encourages lateral spread among trajectories, and a time-scheduled stochastic perturbation reintroduces uncertainty. The perturbation is projected orthogonal to the generation flow, a geometric constraint that lets it add variation without degrading image details or prompt fidelity. According to the authors' theoretical analysis, the design monotonically increases a volume measure, which supports the diversity gains. The work is detailed in arXiv:2510.09060v2.
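To make the orthogonal projection concrete, here is a minimal NumPy sketch of a single sampling step with a projected stochastic kick. It is an illustration under stated assumptions, not the paper's implementation: the function names, the linear schedule `noise_scale`, the `sigma_max` value, the Euler update, and the convention that t=0 is pure noise and t=1 is the final image are all hypothetical choices made for this example.

```python
import numpy as np

def project_orthogonal(noise, velocity, eps=1e-12):
    """Remove from `noise` its component along `velocity`, per sample."""
    # Flatten all non-batch axes so each inner product is taken per sample.
    n = noise.reshape(noise.shape[0], -1)
    v = velocity.reshape(velocity.shape[0], -1)
    coeff = (n * v).sum(axis=1, keepdims=True) / ((v * v).sum(axis=1, keepdims=True) + eps)
    return (n - coeff * v).reshape(noise.shape)

def noise_scale(t, sigma_max=0.5):
    """Hypothetical schedule: strongest at t=0, vanishing at t=1."""
    return sigma_max * (1.0 - t)

def perturbed_euler_step(x, velocity, t, dt, rng):
    """One Euler step along the flow plus an orthogonally projected kick."""
    noise = rng.standard_normal(x.shape)
    kick = project_orthogonal(noise, velocity)
    return x + dt * velocity + np.sqrt(dt) * noise_scale(t) * kick

# Toy batch of 4 latents; `v` stands in for the model's predicted velocity.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 3, 8, 8))
v = rng.standard_normal((4, 3, 8, 8))
kick = project_orthogonal(rng.standard_normal(x.shape), v)
dots = (kick * v).reshape(4, -1).sum(axis=1)  # per-sample <kick, v>
```

Because each kick has no component along its sample's velocity, it moves the sample sideways relative to the flow without speeding up or slowing down progress along it, which is the intuition behind adding variation while leaving the generation direction intact.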
Key facts
- Training-free inference-time control mechanism for flow-based text-to-image models
- Enhances diversity without degrading image fidelity
- Encourages lateral spread among trajectories via feature-space objective
- Reintroduces uncertainty through time-scheduled stochastic perturbation
- Perturbation is projected orthogonal to generation flow
- Geometric constraint preserves image details and prompt fidelity
- Shown theoretically to monotonically increase a volume measure
- Addresses limitation of deterministic trajectories in flow-based models
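The summary does not say which volume measure is used. As a rough illustration of the idea, the sketch below uses one common choice, the log-determinant of a regularized Gram matrix of feature vectors. That choice, the `log_volume` name, the `ridge` value, and the random vectors standing in for image features are all assumptions made for this example.

```python
import numpy as np

def log_volume(features, ridge=1e-3):
    """Log-determinant of the ridge-regularized Gram matrix of centered features."""
    f = features - features.mean(axis=0, keepdims=True)
    gram = f @ f.T + ridge * np.eye(f.shape[0])
    _, logdet = np.linalg.slogdet(gram)
    return logdet

# Random vectors stand in for features of 6 generated images.
rng = np.random.default_rng(0)
base = rng.standard_normal((1, 16))
offsets = rng.standard_normal((6, 16))
collapsed = base + 0.01 * offsets  # samples nearly identical in feature space
spread = base + 1.0 * offsets      # same directions, pushed further apart

vol_collapsed = log_volume(collapsed)
vol_spread = log_volume(spread)
```

Pushing the same samples further apart in feature space raises this score, which is the sense in which lateral spread among trajectories corresponds to a larger volume.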
Entities
Institutions
- arXiv