ARTFEED — Contemporary Art Intelligence

DistortBench: New Benchmark Tests VLMs on Image Distortion Perception

ai-technology · 2026-04-24

Researchers have released DistortBench, a diagnostic benchmark that tests whether vision-language models (VLMs) can detect image distortions without a reference image. The benchmark comprises 13,500 multiple-choice questions spanning 27 distortion types, six perceptual categories, and five severity levels. Of the 27 distortions, 25 are calibrated against the KADID-10k dataset, while two additional rotation distortions use monotonic angle-based levels. The evaluation covered 18 VLMs: 17 open-weight models from five families and one proprietary model. The top-performing model reached only 61.9% accuracy, short of the human majority-vote baseline of 65.7% (average individual human accuracy: 60.2%), underscoring persistent gaps in VLMs' low-level perceptual understanding.

Key facts

  • DistortBench contains 13,500 four-choice questions.
  • The benchmark covers 27 distortion types, six perceptual categories, and five severity levels.
  • 25 distortions are calibrated using KADID-10k.
  • Two rotation distortions use monotonic angle-based levels.
  • 18 VLMs were evaluated, including 17 open-weight models from five families and one proprietary model.
  • Best model accuracy: 61.9%.
  • Human majority-vote baseline: 65.7%.
  • Average individual human accuracy: 60.2%.
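To put the reported numbers in context, the scoring of a four-choice benchmark like this reduces to simple exact-match accuracy. The sketch below is a minimal, hypothetical illustration (the item format and field names are assumptions, not DistortBench's actual schema); it also checks that a random guesser on four-choice questions lands near the 25% chance floor, well below the reported 61.9% best-model and 65.7% human scores.

```python
import random

# Figures reported in the article.
TOTAL_QUESTIONS = 13_500   # 27 distortion types x 5 severity levels, multiple choice
NUM_CHOICES = "ABCD"       # four-choice format

def accuracy(predictions, answers):
    """Fraction of questions where the predicted choice exactly matches the key."""
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)

# Baseline sanity check: a uniform random guesser should score ~25%
# on four-choice questions, far below the best model's 61.9%.
random.seed(0)
answer_key = [random.choice(NUM_CHOICES) for _ in range(TOTAL_QUESTIONS)]
guesses = [random.choice(NUM_CHOICES) for _ in range(TOTAL_QUESTIONS)]
chance_score = accuracy(guesses, answer_key)
```

With 13,500 questions the standard error of a random guesser is under 0.4 percentage points, so the gap between chance (~25%) and the reported model and human scores is far outside noise.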

Entities

Institutions

  • arXiv
