ARTFEED — Contemporary Art Intelligence

VL-LCM: A New Metric for Evaluating Vision-Language Logical Consistency in MLLMs Without Ground-Truth Annotations

ai-technology · 2026-05-09

Researchers propose the Vision-Language Logical Consistency Metric (VL-LCM) to evaluate multimodal large language models (MLLMs) for logical consistency without requiring ground-truth annotations. Grounded in basic principles of logic, the metric assesses both sufficient and necessary cause-effect relations in vision-language tasks, and it is applied to traditional MC-VQA tests as well as the recent NaturalBench tests. Systematic experiments on the MMMU and NaturalBench benchmarks cover 11 open-source MLLMs from 4 frontier model families. The findings reveal that while recent MLLMs show significant progress in accuracy, their logical consistency lags behind. The study also examines the correlation between VL-LCM and ground-truth metrics, the reliability of VL-LCM, and related aspects.
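The article does not reproduce the paper's formula, but the core idea — scoring a model against its own answers to logically related questions, with no ground-truth labels — can be sketched in a few lines. The sketch below is a simplified illustration under assumed yes/no probes, not the authors' implementation; all names (`Probe`, `consistency_rate`) are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Probe:
    """One logical probe: the model's answers to a premise question and to a
    question that premise logically entails (names are illustrative)."""
    premise_answer: bool   # model's yes/no answer to the premise question
    entailed_answer: bool  # model's yes/no answer to the entailed question

def consistency_rate(probes: list[Probe]) -> float:
    """Fraction of probes where the model respects the entailment:
    affirming the premise while denying the entailed question is a violation.
    Only the model's own answers are compared -- no ground truth needed."""
    if not probes:
        return 0.0
    consistent = sum(
        1 for p in probes
        if not (p.premise_answer and not p.entailed_answer)
    )
    return consistent / len(probes)

# Toy run: one of four probes violates the entailment.
probes = [
    Probe(True, True),    # consistent
    Probe(True, False),   # violation: premise affirmed, entailment denied
    Probe(False, False),  # vacuously consistent
    Probe(False, True),   # vacuously consistent
]
print(consistency_rate(probes))  # → 0.75
```

A label-free score like this can be computed on any unannotated image-question pool, which is what distinguishes the approach from accuracy-style metrics that require answer keys.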

Key facts

  • VL-LCM evaluates vision-language logical consistency without ground-truth annotations.
  • Metric is based on sufficient and necessary cause-effect relations.
  • Applied to MC-VQA and NaturalBench tests.
  • Tested on 11 open-source MLLMs from 4 frontier families.
  • Evaluated on MMMU and NaturalBench benchmarks.
  • Recent MLLMs show accuracy progress but logical consistency lags.
  • Study includes correlations with ground-truth metrics and reliability analysis.
  • Published on arXiv with ID 2605.06201.

Entities

Institutions

  • arXiv

Sources