CROP: AI Image Cropping via Compositional Reasoning

ai-technology · 2026-05-14

Researchers propose CROP, a method for aesthetic image cropping that reformulates the task as multimodal reasoning. Unlike saliency-based or retrieval-based approaches, CROP draws on a vision-language model's (VLM's) analytical capabilities to reason about a scene the way a professional photographer would. It targets two weaknesses of prior work: saliency-based methods struggle with compositional trade-offs in complex scenes, and retrieval-based methods lack adaptive reasoning for unique scenes. The goal is to align automated cropping with the preferences of human experts.

Key facts

CROP stands for Compositional Reasoning and Optimizing Preference
Reformulates aesthetic cropping as a multimodal reasoning task
Activates a vision-language model's (VLM's) analytical and comprehension capabilities in aesthetics
Addresses limitations of saliency-based and retrieval-based methods
Saliency-based methods struggle with compositional trade-offs in complex scenes
Retrieval-based methods lack adaptive reasoning for unique scenes
Aims to align automated cropping with human expert results
Published on arXiv with ID 2605.12545
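
To make the saliency limitation noted above concrete, here is a minimal sketch of a generic saliency-maximizing crop baseline. It is not CROP and is not taken from the paper. The function name `best_saliency_crop`, the toy saliency grid, and the fixed window size are all assumptions chosen for illustration.

```python
def best_saliency_crop(saliency, crop_h, crop_w):
    """Return (top, left, score) of the crop_h x crop_w window
    with the highest total saliency (hypothetical baseline)."""
    rows, cols = len(saliency), len(saliency[0])
    if crop_h > rows or crop_w > cols:
        raise ValueError("crop window is larger than the saliency map")

    # Integral image: integral[r][c] holds the sum of saliency[0:r][0:c].
    integral = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            integral[r + 1][c + 1] = (
                saliency[r][c]
                + integral[r][c + 1]
                + integral[r + 1][c]
                - integral[r][c]
            )

    best = None
    for top in range(rows - crop_h + 1):
        for left in range(cols - crop_w + 1):
            bottom, right = top + crop_h, left + crop_w
            # Window sum from four integral-image lookups.
            score = (
                integral[bottom][right]
                - integral[top][right]
                - integral[bottom][left]
                + integral[top][left]
            )
            # Strict comparison keeps the first window on ties.
            if best is None or score > best[2]:
                best = (top, left, score)
    return best


# Toy scene (assumed values): a strong subject on the left (9)
# and a taller, weaker subject on the right (5 + 5).
toy_map = [
    [0, 0, 0, 0, 0, 0],
    [0, 9, 0, 0, 5, 0],
    [0, 0, 0, 0, 5, 0],
    [0, 0, 0, 0, 0, 0],
]

result = best_saliency_crop(toy_map, 2, 2)
print(result)  # (1, 3, 10.0)
```

In this toy scene the baseline keeps the right-hand subject and drops the left-hand one entirely, because it only sums saliency and has no notion of how the two subjects relate. That is the kind of compositional trade-off the summary says saliency-based methods handle poorly.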
