ARTFEED — Contemporary Art Intelligence

LLMs Struggle with Multimodal Physics Problems

ai-technology · 2026-05-07

A study posted on arXiv evaluates three large language models (Claude, Gemini, and ChatGPT) on multimodal physics problems drawn from the OpenStax database. All models achieved 96% accuracy on text-only problems, but performance dropped substantially on multimodal tasks. The authors develop an empirical error taxonomy and test a structured dialogue intervention aimed at the models' multimodal processing limitations.

Key facts

  • Study evaluates LLMs on multimodal physics problems
  • Models tested: Claude, Gemini, ChatGPT
  • Problems from OpenStax database
  • 96% accuracy on text-only problems
  • Performance declined on multimodal problems
  • Empirical error taxonomy developed
  • Structured multimodal dialogue intervention tested
  • arXiv paper ID: 2605.04131

Entities

Institutions

  • OpenStax
  • arXiv