Explainability and Fairness in VLMs for Wellbeing Assessment
A new arXiv preprint (2604.23786) investigates fairness and explainability in Vision-Language Models (VLMs) for wellbeing assessment and depression prediction. The researchers evaluated models on a laboratory dataset (AFAR-BSFT) and a naturalistic dataset (E-DAIC), finding large performance disparities between models: Phi3.5-Vision achieved 80.4% accuracy on E-DAIC, while Qwen2-VL scored only 33.9%. Both models tended to over-predict depression on AFAR-BSFT, raising concerns about diagnostic reliability and demographic fairness. The work highlights the under-explored intersection of Explainable AI (XAI) and multimodal foundation models in clinical mental health monitoring.
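The reported numbers are standard binary-classification statistics. As a minimal sketch, accuracy and the over-prediction tendency described above can be checked by comparing the model's predicted positive rate against the ground-truth prevalence; the arrays and label encoding here are hypothetical illustrations, not data from the paper:

```python
import numpy as np

def evaluate_binary(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """Accuracy and an over-prediction check for a binary depression
    label (1 = depressed, 0 = not depressed)."""
    accuracy = float((y_true == y_pred).mean())
    # A model "over-predicts" depression when it flags the positive
    # class more often than it occurs in the ground truth.
    predicted_positive_rate = float(y_pred.mean())
    prevalence = float(y_true.mean())
    return {
        "accuracy": accuracy,
        "predicted_positive_rate": predicted_positive_rate,
        "prevalence": prevalence,
        "over_predicts": predicted_positive_rate > prevalence,
    }

# Hypothetical predictions, for illustration only.
y_true = np.array([0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 1, 1, 1, 0])
print(evaluate_binary(y_true, y_pred))
```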
Key facts
- Study investigates fairness and explainability in VLMs for wellbeing assessment
- Evaluated on laboratory (AFAR-BSFT) and naturalistic (E-DAIC) datasets
- Phi3.5-Vision achieved 80.4% accuracy on E-DAIC
- Qwen2-VL achieved 33.9% accuracy on E-DAIC
- Both models over-predicted depression on AFAR-BSFT
- Application of XAI to VLMs for depression prediction is under-explored
- Preprint posted on arXiv (2604.23786)
- Concerns about transparency and bias in clinical deployment of VLMs (see the fairness sketch below)
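Demographic fairness of the kind the study probes is commonly quantified with group-wise error-rate gaps. Below is a minimal equalized-odds-style sketch, assuming per-sample demographic group labels that this summary does not actually provide; all data shown are hypothetical:

```python
import numpy as np

def group_rate_gap(y_true, y_pred, groups) -> dict:
    """Largest gaps in false positive rate (FPR) and true positive
    rate (TPR) across demographic groups."""
    fpr, tpr = {}, {}
    for g in np.unique(groups):
        mask = groups == g
        neg = mask & (y_true == 0)  # negatives in group g
        pos = mask & (y_true == 1)  # positives in group g
        fpr[g] = float(y_pred[neg].mean()) if neg.any() else float("nan")
        tpr[g] = float(y_pred[pos].mean()) if pos.any() else float("nan")
    return {
        "fpr_by_group": fpr,
        "tpr_by_group": tpr,
        "fpr_gap": max(fpr.values()) - min(fpr.values()),
        "tpr_gap": max(tpr.values()) - min(tpr.values()),
    }

# Hypothetical labels, predictions, and group assignments.
y_true = np.array([0, 1, 0, 1, 0, 1, 0, 1])
y_pred = np.array([1, 1, 0, 1, 1, 0, 0, 1])
groups = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])
print(group_rate_gap(y_true, y_pred, groups))
```

Large gaps between groups on either rate would indicate the kind of demographic unfairness the study warns about, even when overall accuracy looks acceptable.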
Entities
Venues
- arXiv (preprint repository)