ARTFEED — Contemporary Art Intelligence

CROP: AI Image Cropping via Compositional Reasoning

ai-technology · 2026-05-14

Researchers propose CROP, a method for aesthetic image cropping that reformulates the task as multimodal reasoning. Unlike saliency-based or retrieval-based approaches, CROP draws on a vision-language model's (VLM's) aesthetic analysis and comprehension capabilities so that it reasons about a crop the way a professional photographer would. This targets two weaknesses of prior methods: saliency-based approaches struggle with compositional trade-offs in complex scenes, and retrieval-based approaches lack the adaptive reasoning needed for unique scenes. The goal is to bring automated crops closer to those chosen by human experts.

Key facts

  • CROP stands for Compositional Reasoning and Optimizing Preference
  • Reformulates aesthetic cropping as a multimodal reasoning task
  • Activates a vision-language model's (VLM's) aesthetic analysis and comprehension capabilities
  • Addresses limitations of saliency-based and retrieval-based methods
  • Saliency-based methods struggle with compositional trade-offs in complex scenes
  • Retrieval-based methods lack adaptive reasoning for unique scenes
  • Aims to align automated crops with those chosen by human experts
  • Published on arXiv with ID 2605.12545

Entities

Institutions

  • arXiv

Sources