Explainability and Fairness in VLMs for Wellbeing Assessment
A new arXiv preprint (2604.23786) investigates fairness and explainability in Vision-Language Models (VLMs) for wellbeing assessment and depression prediction. The researchers evaluated models on a laboratory dataset (AFAR-BSFT) and a naturalistic dataset (E-DAIC), finding large performance disparities between models: Phi3.5-Vision achieved 80.4% accuracy on E-DAIC, while Qwen2-VL scored only 33.9%. Both models tended to over-predict depression on AFAR-BSFT, raising concerns about diagnostic reliability and demographic fairness. The work highlights the under-explored intersection of Explainable AI (XAI) and multimodal foundation models in clinical mental health monitoring.
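The reported numbers are standard binary-classification statistics. As a minimal sketch, accuracy and the over-prediction tendency described above can be checked by comparing the model's predicted positive rate against the ground-truth prevalence; the arrays and label encoding here are hypothetical illustrations, not data from the paper:

```python
import numpy as np

def evaluate_binary(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """Accuracy and an over-prediction check for a binary depression
    label (1 = depressed, 0 = not depressed)."""
    accuracy = float((y_true == y_pred).mean())
    # A model "over-predicts" depression when it flags the positive
    # class more often than it occurs in the ground truth.
    predicted_positive_rate = float(y_pred.mean())
    prevalence = float(y_true.mean())
    return {
        "accuracy": accuracy,
        "predicted_positive_rate": predicted_positive_rate,
        "prevalence": prevalence,
        "over_predicts": predicted_positive_rate > prevalence,
    }

# Hypothetical predictions, for illustration only.
y_true = np.array([0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 1, 1, 1, 0])
print(evaluate_binary(y_true, y_pred))
```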
Key facts
- Study investigates fairness and explainability in VLMs for wellbeing assessment
- Evaluated on laboratory (AFAR-BSFT) and naturalistic (E-DAIC) datasets
- Phi3.5-Vision achieved 80.4% accuracy on E-DAIC
- Qwen2-VL achieved 33.9% accuracy on E-DAIC
- Both models over-predicted depression on AFAR-BSFT
- Application of XAI to VLMs for depression prediction is under-explored
- Preprint posted on arXiv (2604.23786)
- Concerns about transparency and bias in clinical deployment of VLMs (see the fairness sketch below)
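Demographic fairness of the kind the study probes is commonly quantified with group-wise error-rate gaps. Below is a minimal equalized-odds-style sketch, assuming per-sample demographic group labels that this summary does not actually provide; all data shown are hypothetical:

```python
import numpy as np

def group_rate_gap(y_true, y_pred, groups) -> dict:
    """Largest gaps in false positive rate (FPR) and true positive
    rate (TPR) across demographic groups."""
    fpr, tpr = {}, {}
    for g in np.unique(groups):
        mask = groups == g
        neg = mask & (y_true == 0)  # negatives in group g
        pos = mask & (y_true == 1)  # positives in group g
        fpr[g] = float(y_pred[neg].mean()) if neg.any() else float("nan")
        tpr[g] = float(y_pred[pos].mean()) if pos.any() else float("nan")
    return {
        "fpr_by_group": fpr,
        "tpr_by_group": tpr,
        "fpr_gap": max(fpr.values()) - min(fpr.values()),
        "tpr_gap": max(tpr.values()) - min(tpr.values()),
    }

# Hypothetical labels, predictions, and group assignments.
y_true = np.array([0, 1, 0, 1, 0, 1, 0, 1])
y_pred = np.array([1, 1, 0, 1, 1, 0, 0, 1])
groups = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])
print(group_rate_gap(y_true, y_pred, groups))
```

Large gaps between groups on either rate would indicate the kind of demographic unfairness the study warns about, even when overall accuracy looks acceptable.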
Entities
Venues
- arXiv (preprint repository)