CROP: AI Image Cropping via Compositional Reasoning
Researchers propose CROP, a method for aesthetic image cropping that reformulates the task as multimodal reasoning. Unlike saliency-based or retrieval-based approaches, CROP activates a vision-language model's analytical capabilities so that it reasons about composition the way a professional photographer would. It targets two limitations of prior work: saliency-based methods struggle with compositional trade-offs in complex scenes, and retrieval-based methods lack adaptive reasoning for unique scenes. The approach aims to align automated cropping with human expert preferences.
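To make "cropping as multimodal reasoning" concrete, here is a minimal Python sketch of what the input and output of such a task could look like. It is not CROP's implementation: the prompt wording, the JSON reply format with a normalized "box" field, and the names build_prompt, parse_crop, and Crop are all hypothetical, chosen only for illustration. The sketch assumes a vision-language model replies with free-text reasoning followed by one JSON object.

```python
import json
import re
from dataclasses import dataclass


@dataclass
class Crop:
    """Pixel-space crop rectangle (hypothetical helper type)."""
    left: int
    top: int
    right: int
    bottom: int


# Hypothetical prompt: asks for reasoning first, then a machine-readable box.
PROMPT_TEMPLATE = (
    "You are a professional photographer. The image is {width}x{height} pixels. "
    "Explain the compositional trade-offs, then end with a JSON object "
    '{{"box": [x0, y0, x1, y1]}} using coordinates normalized to [0, 1].'
)


def build_prompt(width: int, height: int) -> str:
    """Fill the (assumed) prompt template with the image size."""
    return PROMPT_TEMPLATE.format(width=width, height=height)


def parse_crop(response: str, width: int, height: int) -> Crop:
    """Extract the last flat JSON object from the reply and convert it to pixels."""
    candidates = re.findall(r"\{[^{}]*\}", response)
    if not candidates:
        raise ValueError("no JSON object found in response")
    box = json.loads(candidates[-1])["box"]
    # Clamp each coordinate to [0, 1] so the crop stays inside the image.
    x0, y0, x1, y1 = (min(max(float(v), 0.0), 1.0) for v in box)
    if x1 <= x0 or y1 <= y0:
        raise ValueError("degenerate crop box")
    return Crop(round(x0 * width), round(y0 * height),
                round(x1 * width), round(y1 * height))


# Example with a made-up model reply; no real model is called here.
reply = (
    "The subject sits left of center, so trimming the empty right edge "
    'improves balance. {"box": [0.1, 0.2, 0.9, 1.2]}'
)
crop = parse_crop(reply, width=1000, height=500)
```

Keeping the reasoning as free text and the box as a small structured suffix means the rationale stays readable while the crop itself can be validated and clamped before it is applied.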
Key facts
- CROP stands for Compositional Reasoning and Optimizing Preference
- Method reformulates aesthetic cropping as a multimodal reasoning task
- Activates a vision-language model's (VLM's) analytical and comprehension capabilities for aesthetics
- Addresses limitations of saliency-based and retrieval-based methods
- Saliency-based methods struggle with compositional trade-offs in complex scenes
- Retrieval-based methods lack adaptive reasoning for unique scenes
- Aims to align automated cropping with human expert results
- Published on arXiv with ID 2605.12545
Entities
Institutions
- arXiv