ARTFEED — Contemporary Art Intelligence

DIQ-H Benchmark Tests VLM Robustness Under Adversarial Visual Conditions

ai-technology · 2026-04-30

The DIQ-H (Degraded Image Quality Leading to Hallucinations) benchmark evaluates Vision-Language Models (VLMs) under adversarial visual conditions across continuous frame sequences. It simulates real-world degradations such as motion blur, sensor noise, and compression artifacts, and measures how these distortions produce persistent hallucinations and misaligned outputs over time. By explicitly modeling error propagation, DIQ-H addresses a gap left by existing benchmarks, which focus on static or curated inputs and overlook value misalignment and failures of cumulative reasoning. This matters for embodied AI and for safety-critical applications such as robotics and autonomous systems.
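
The source names the three distortion types but does not describe DIQ-H's degradation pipeline, so the following Python sketch is illustrative only: the blur length, noise level, JPEG quality, and frame-wise application to a sequence are all assumptions, not the benchmark's actual settings.

```python
# Illustrative degradation pipeline; all parameters are assumptions,
# not the settings used by DIQ-H.
import io

import numpy as np
from PIL import Image


def motion_blur(img: Image.Image, length: int = 9) -> Image.Image:
    # Horizontal motion blur: average `length` horizontally shifted
    # copies of the frame (with wraparound at the image edge).
    arr = np.asarray(img).astype(np.float32)
    out = np.zeros_like(arr)
    for dx in range(length):
        out += np.roll(arr, dx, axis=1)
    return Image.fromarray((out / length).astype(np.uint8))


def sensor_noise(img: Image.Image, sigma: float = 12.0) -> Image.Image:
    # Additive Gaussian noise, a common stand-in for sensor read noise.
    arr = np.asarray(img).astype(np.float32)
    arr += np.random.normal(0.0, sigma, arr.shape)
    return Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))


def compression_artifacts(img: Image.Image, quality: int = 10) -> Image.Image:
    # Round-trip the frame through low-quality JPEG encoding.
    buf = io.BytesIO()
    img.convert("RGB").save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).convert("RGB")


def degrade_sequence(frames):
    # Apply all three degradations to every frame of a sequence.
    for frame in frames:
        yield compression_artifacts(sensor_noise(motion_blur(frame)))
```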

Key facts

  • DIQ-H is the first benchmark to evaluate VLMs under adversarial visual conditions in continuous sequences.
  • It simulates motion blur, sensor noise, and compression artifacts.
  • The benchmark measures persistent errors and misaligned outputs over time.
  • It explicitly models error propagation (a toy persistence metric is sketched after this list).
  • Existing benchmarks neglect adversarial conditions, value misalignment, and error propagation.
  • VLMs are essential for embodied AI and safety-critical applications.
  • The work is published on arXiv with ID 2512.03992.
  • The arXiv announcement type is replace-cross (a revised version of a cross-listed submission).
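
The source does not define how DIQ-H quantifies error propagation. As a rough illustration only, assuming per-frame binary hallucination labels, a toy persistence metric could look like this (the function name and the metric itself are hypothetical):

```python
def persistence_rate(hallucinated: list[bool]) -> float:
    """Toy proxy for error propagation: the fraction of frames, from the
    first hallucination onward, in which the error is still present.
    Hypothetical; the real DIQ-H metrics are not described in the source."""
    if True not in hallucinated:
        return 0.0
    tail = hallucinated[hallucinated.index(True):]
    return sum(tail) / len(tail)


# Example: the error appears at frame 1 and persists in 3 of the 4
# remaining frames, giving a rate of 0.75.
print(persistence_rate([False, True, True, False, True]))
```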

Entities

Institutions

  • arXiv

Sources

  • arXiv:2512.03992