ARTFEED — Contemporary Art Intelligence

MLLM Feedback on Science Drawings Shows Modal Decoupling Failures

ai-technology · 2026-05-01

A recent study available on arXiv (2604.26957) indicates that multimodal large language models (MLLMs) such as GPT-5.1 produce feedback on students' hand-drawn scientific representations that, while seemingly educationally sound, frequently contradicts the actual visual content. Researchers examined 150 drawings from middle school students engaged in a kinetic molecular theory unit, covering five modeling tasks and three levels of competence, and generated 300 feedback instances. The analysis revealed grounding failures characteristic of modal decoupling: the model's assertions were not anchored in the specific visual objects, attributes, and relationships depicted in the drawings. This raises significant concerns about the dependability of MLLMs for automated assessment in science education.
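The modal-decoupling failure described above can be made concrete with a minimal grounding check: if feedback text refers to visual elements that the drawing does not contain, the feedback is decoupled from the image. The sketch below assumes the drawing has been annotated in terms of the three channels the study names (objects, attributes, relationships); all names and data are hypothetical illustrations, not the study's actual pipeline.

```python
# Hypothetical sketch: flag feedback claims not grounded in a drawing's
# annotated content. Not the study's method; for illustration only.
from dataclasses import dataclass, field


@dataclass
class DrawingAnnotation:
    """Structured description of a student drawing (assumed schema)."""
    objects: set[str]                                    # e.g. {"particle", "container"}
    attributes: dict[str, set[str]] = field(default_factory=dict)   # object -> attributes
    relations: set[tuple[str, str, str]] = field(default_factory=set)  # (subject, relation, object)


def ungrounded_mentions(feedback_entities: list[str],
                        annotation: DrawingAnnotation) -> list[str]:
    """Return entities the feedback mentions that never appear in the drawing.

    A non-empty result is evidence of modal decoupling: the text asserts
    visual content that the image does not contain.
    """
    return [e for e in feedback_entities if e not in annotation.objects]


# Example: the drawing shows particles in a container, but the feedback
# praises motion arrows that are absent from the sketch.
drawing = DrawingAnnotation(objects={"particle", "container"})
claimed = ["particle", "arrow"]   # entities extracted from the feedback text
print(ungrounded_mentions(claimed, drawing))   # ['arrow']
```

A real verifier would also need to check attribute and relation claims (e.g. "the particles are evenly spaced") against the annotation, which is substantially harder than the object-presence check sketched here.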

Key facts

  • Study published on arXiv with ID 2604.26957
  • Focuses on MLLM-generated feedback for hand-drawn scientific models
  • Analyzed 150 middle school drawings from a kinetic molecular theory unit
  • Drawings spanned five modeling tasks and three competence levels
  • Generated 300 feedback instances using GPT-5.1
  • Feedback exhibited grounding failures consistent with modal decoupling
  • Outputs were pedagogically plausible in form but contradicted the drawings
  • Information encoded through visual objects, attributes, and relationships

Entities

Institutions

  • arXiv

Sources