Open-Source Pipeline for Illumination Control in Diffusion Models
A fully open-source, reproducible pipeline for controlling illumination in diffusion models has been released. Its data engine converts well-lit images into supervised training triplets, each consisting of a poorly lit input image, a natural-language lighting instruction, and the well-lit target image. A diffusion model fine-tuned on this dataset outperforms the SD 1.5, SDXL, and FLUX.1-dev baselines in perceptual similarity, structural similarity, and identity preservation. Built entirely from open-source tools and publicly available data, the pipeline fills the gap left by proprietary alternatives that either require heavy control inputs such as depth maps or do not release code and data.
Key facts
- Pipeline is fully open-source and reproducible.
- Data engine creates training triplets from well-lit images.
- Triplets consist of poorly-illuminated input, lighting instruction, and well-illuminated output.
- Finetuned diffusion model outperforms SD 1.5, SDXL, and FLUX.1-dev.
- Improvements in perceptual similarity, structural similarity, and identity preservation.
- Built with open-source tools and publicly available data.
- Addresses limitations of closed-source models requiring heavy control inputs.
- Paper available on arXiv with identifier 2604.24877.
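The triplet construction described above can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's actual data engine: it synthetically underexposes a well-lit image (the degradation function, gain, and gamma values are assumptions for demonstration) and pairs it with a lighting instruction and the original image as the supervision target.

```python
import numpy as np

def degrade_exposure(image, gain=0.35, gamma=2.2):
    """Synthetically underexpose a well-lit image with values in [0, 1].

    gain and gamma are illustrative defaults, not settings from the paper.
    """
    return np.clip(image * gain, 0.0, 1.0) ** gamma

def make_triplet(well_lit, instruction="brighten the scene with soft natural light"):
    """Build one supervised training triplet: (degraded input, lighting directive, target)."""
    return {
        "input": degrade_exposure(well_lit),   # poorly lit input image
        "instruction": instruction,            # natural-language lighting directive
        "target": well_lit,                    # well-lit output image
    }

# Example: a tiny uniform "well-lit" image stands in for a real photo.
well_lit = np.full((4, 4, 3), 0.8, dtype=np.float32)
triplet = make_triplet(well_lit)
```

A real data engine would draw from a large corpus of well-lit photographs and vary the degradations and instructions; the fine-tuning stage then trains the diffusion model to map (input, instruction) pairs to the target.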