DistortBench: New Benchmark Tests VLMs on Image Distortion Perception
Researchers have introduced DistortBench, a diagnostic benchmark that tests whether vision-language models (VLMs) can perceive image distortions without access to a reference image. The benchmark comprises 13,500 four-choice questions spanning 27 distortion types, six perceptual categories, and five severity levels. Of the 27 distortions, 25 are calibrated against the KADID-10k dataset, while the two rotation distortions use monotonic angle-based severity levels. The evaluation covered 18 VLMs: 17 open-weight models from five families plus one proprietary model. Even the best model reached only 61.9% accuracy, below the human majority-vote baseline of 65.7% (average individual human accuracy: 60.2%), underscoring how much low-level perceptual understanding remains a challenge for VLMs.
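The question count decomposes cleanly: 27 distortion types times five severity levels gives 135 type-severity cells, so 13,500 questions works out to 100 questions per cell if the benchmark is balanced. A minimal sketch of that arithmetic, assuming an even split (the split itself is not stated above):

```python
# Sketch of DistortBench's question-space arithmetic. The even split of
# questions across type-severity cells is an illustrative assumption;
# the summary states only the totals.

NUM_DISTORTION_TYPES = 27   # 25 KADID-10k-calibrated + 2 rotation distortions
NUM_SEVERITY_LEVELS = 5
TOTAL_QUESTIONS = 13_500

cells = NUM_DISTORTION_TYPES * NUM_SEVERITY_LEVELS   # 135 type-severity cells
per_cell = TOTAL_QUESTIONS // cells                  # 100 questions per cell if balanced

print(f"{cells} cells x {per_cell} questions = {cells * per_cell} total")
```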
Key facts
- DistortBench contains 13,500 four-choice questions.
- The benchmark covers 27 distortion types, six perceptual categories, and five severity levels.
- 25 distortions are calibrated using KADID-10k.
- Two rotation distortions use monotonic angle-based levels.
- 18 VLMs were evaluated, including 17 open-weight models from five families and one proprietary model.
- Best model accuracy: 61.9%.
- Human majority-vote baseline: 65.7%.
- Average individual human accuracy: 60.2%.
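A hedged sketch of how the accuracy figures above could be scored: per-model accuracy on four-choice questions, plus a human majority-vote baseline aggregated across annotators. The function names, data layout, and toy example are illustrative assumptions, not DistortBench's actual evaluation code.

```python
from collections import Counter

def accuracy(predictions, gold):
    """Fraction of questions answered with the correct option letter."""
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)

def majority_vote(per_annotator_answers):
    """Collapse several annotators' answers into one answer per question."""
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(*per_annotator_answers)]

# Toy example: three questions, one model, three human annotators.
gold = ["A", "C", "B"]
model = ["A", "B", "B"]
humans = [["A", "C", "B"], ["A", "C", "D"], ["B", "C", "B"]]

print(accuracy(model, gold))                   # ~0.67 for the model
print(accuracy(majority_vote(humans), gold))   # 1.0 for the majority vote
```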
Entities
- arXiv (preprint repository)