FullFlow: Upgrading Text-to-Image Models for Bidirectional Generation
FullFlow is a parameter-efficient method that upgrades a pretrained rectified-flow text-to-image model into a bidirectional vision-language generator. It trains only LoRA adapters and lightweight text heads: images are still generated by a continuous flow, while text is generated through a discrete insertion process. Because image and text each have their own timestep, a single backbone supports text-to-image, image-to-text, joint sampling, and partial-text prediction.
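The decoupled-timestep idea can be illustrated with a toy sketch. This is not FullFlow's implementation. It assumes the common rectified-flow convention in which t = 0 is Gaussian noise and t = 1 is clean data, and it substitutes a simple random-deletion process for the discrete text-insertion process: each token survives with probability t_text, so t_text = 0 is an empty caption and generation would run by inserting tokens. The function names (`corrupt_image`, `corrupt_text`, `timesteps_for_mode`) and the mode-to-timestep mapping are hypothetical.

```python
import numpy as np

def corrupt_image(x1, t_img, rng):
    """Rectified-flow interpolation for an image latent.

    Assumed convention: t=0 is pure Gaussian noise, t=1 is clean data.
    Returns the noisy latent x_t and the velocity target (x1 - x0) that a
    flow model would be trained to predict at that point.
    """
    x0 = rng.standard_normal(x1.shape)       # noise endpoint of the path
    xt = (1.0 - t_img) * x0 + t_img * x1     # straight line from noise to data
    return xt, x1 - x0                       # constant velocity along the line

def corrupt_text(tokens, t_txt, rng):
    """Toy deletion process standing in for discrete text insertion.

    Each token survives independently with probability t_txt, so t_txt=0
    gives an empty sequence and t_txt=1 the full caption. A model trained on
    such pairs would generate text by inserting the missing tokens.
    """
    keep = rng.random(len(tokens)) < t_txt
    return [tok for tok, k in zip(tokens, keep) if k]

def timesteps_for_mode(mode, rng):
    """Hypothetical mapping from task to (t_img, t_txt) for one training draw.

    The conditioning modality is held clean (t=1); the generated modality
    gets a random timestep in [0, 1).
    """
    if mode == "text_to_image":
        return rng.random(), 1.0
    if mode == "image_to_text":
        return 1.0, rng.random()
    if mode == "joint":
        return rng.random(), rng.random()
    raise ValueError(f"unknown mode: {mode}")

rng = np.random.default_rng(0)
latent = rng.standard_normal((4, 8, 8))            # stand-in image latent
caption = "a red bicycle leaning on a wall".split()

t_img, t_txt = timesteps_for_mode("joint", rng)
xt, v = corrupt_image(latent, t_img, rng)
partial = corrupt_text(caption, t_txt, rng)
print(f"t_img={t_img:.2f} t_txt={t_txt:.2f} kept={partial}")
```

In this toy picture, partial-text prediction needs no extra machinery: the text simply starts from an observed subset of caption tokens rather than from the empty sequence, which corresponds to a text timestep strictly between 0 and 1.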
Key facts
- FullFlow upgrades text-to-image models to bidirectional vision-language generation.
- It uses LoRA adapters and lightweight text heads.
- Images are still generated by continuous flow; text is generated by a discrete insertion process.
- Separate timesteps for image and text enable multiple generation modes.
- The method is parameter-efficient, avoiding large-scale retraining.
- It works with rectified-flow text-to-image models.
- The approach preserves the strong image prior of the original model.
- FullFlow enables text-to-image, image-to-text, joint sampling, and partial-text prediction.
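The parameter-efficiency and prior-preservation points follow from how LoRA adapters work in general. The sketch below is a generic LoRA layer in NumPy, not FullFlow's code; the layer width (1024), rank `r=16`, and scaling `alpha` are illustrative assumptions, and the text heads are not shown. The pretrained weight `W` stays frozen and only the low-rank factors `A` and `B` would be trained. Because `B` starts at zero, the adapted layer initially reproduces the pretrained layer exactly, so fine-tuning starts from the original model's behavior.

```python
import numpy as np

class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update (generic LoRA).

    y = x @ W.T + (alpha / r) * (x @ A.T) @ B.T
    Only A (r x d_in) and B (d_out x r) would be trained; W stays frozen.
    """

    def __init__(self, W, r=16, alpha=16.0, rng=None):
        if rng is None:
            rng = np.random.default_rng(0)
        d_out, d_in = W.shape
        self.W = W                                               # frozen pretrained weight
        self.A = rng.standard_normal((r, d_in)) / np.sqrt(d_in)  # small random init
        self.B = np.zeros((d_out, r))                            # zero init: no change at step 0
        self.scale = alpha / r

    def __call__(self, x):
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_params(self):
        return self.A.size + self.B.size

d = 1024
rng = np.random.default_rng(0)
W = rng.standard_normal((d, d)) / np.sqrt(d)
layer = LoRALinear(W, r=16, rng=rng)

x = rng.standard_normal((2, d))
print(np.allclose(layer(x), x @ W.T))    # True: zero-initialized B leaves the output unchanged
print(layer.trainable_params(), W.size)  # 32768 1048576 (about 3% of one full matrix)
```

Since `W` is never modified, dropping the adapter recovers the original layer exactly.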