ARTFEED — Contemporary Art Intelligence

MLLM Feedback on Science Drawings Shows Modal Decoupling Failures

ai-technology · 2026-05-01

A recent study available on arXiv (2604.26957) indicates that multimodal large language models (MLLMs) such as GPT-5.1 produce feedback on students' hand-drawn scientific representations that, while seemingly educationally sound, frequently contradicts the actual visual content. Researchers examined 150 drawings from middle school students engaged in a kinetic molecular theory unit, covering five modeling tasks and three levels of competence, and generated 300 feedback instances. The analysis revealed grounding failures characteristic of modal decoupling: the model's assertions were not anchored in the specific visual objects, attributes, and relationships depicted in the drawings. This raises significant concerns about the dependability of MLLMs for automated assessment in science education.
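The modal-decoupling failure described above can be made concrete with a minimal grounding check: if feedback text refers to visual elements that the drawing does not contain, the feedback is decoupled from the image. The sketch below assumes the drawing has been annotated in terms of the three channels the study names (objects, attributes, relationships); all names and data are hypothetical illustrations, not the study's actual pipeline.

```python
# Hypothetical sketch: flag feedback claims not grounded in a drawing's
# annotated content. Not the study's method; for illustration only.
from dataclasses import dataclass, field


@dataclass
class DrawingAnnotation:
    """Structured description of a student drawing (assumed schema)."""
    objects: set[str]                                    # e.g. {"particle", "container"}
    attributes: dict[str, set[str]] = field(default_factory=dict)   # object -> attributes
    relations: set[tuple[str, str, str]] = field(default_factory=set)  # (subject, relation, object)


def ungrounded_mentions(feedback_entities: list[str],
                        annotation: DrawingAnnotation) -> list[str]:
    """Return entities the feedback mentions that never appear in the drawing.

    A non-empty result is evidence of modal decoupling: the text asserts
    visual content that the image does not contain.
    """
    return [e for e in feedback_entities if e not in annotation.objects]


# Example: the drawing shows particles in a container, but the feedback
# praises motion arrows that are absent from the sketch.
drawing = DrawingAnnotation(objects={"particle", "container"})
claimed = ["particle", "arrow"]   # entities extracted from the feedback text
print(ungrounded_mentions(claimed, drawing))   # ['arrow']
```

A real verifier would also need to check attribute and relation claims (e.g. "the particles are evenly spaced") against the annotation, which is substantially harder than the object-presence check sketched here.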

Key facts

  • Study published on arXiv with ID 2604.26957
  • Focuses on MLLM-generated feedback for hand-drawn scientific models
  • Analyzed 150 middle school drawings from a kinetic molecular theory unit
  • Drawings spanned five modeling tasks and three competence levels
  • Generated 300 feedback instances using GPT-5.1
  • Feedback exhibited grounding failures consistent with modal decoupling
  • Outputs were pedagogically plausible in form but contradicted the drawings
  • Information encoded through visual objects, attributes, and relationships

Entities

Institutions

  • arXiv

Sources