ARTFEED — Contemporary Art Intelligence

LLM Tutoring Agents Fail at Distinguishing Suboptimal from Incorrect Solutions

ai-technology · 2026-05-18

A recent study on arXiv (2605.16207) evaluates seven large language model (LLM) feedback agents for propositional logic tutoring, scoring them against knowledge-graph-derived ground truth across 10,836 solution-feedback pairs. The models were near-ceiling on optimal steps, but they over-rejected valid yet suboptimal reasoning and over-validated incorrect solutions, the two cases where adaptive tutoring matters most. The failures persisted across models regardless of solution context, which suggests architectural rather than informational limits. Even when a model diagnosed a solution accurately, that diagnosis did not reliably produce pedagogically actionable feedback.
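To make the two failure modes concrete, here is a minimal Python sketch of how over-rejection and over-validation could be tallied against three-way ground-truth labels. It is an illustration only, not the study's evaluation code: the record format, the label names ("optimal", "suboptimal", "incorrect"), and the function name `failure_rates` are all assumptions, and the toy data is invented.

```python
from collections import Counter

# Hypothetical record format (an assumption, not the study's schema):
# (ground_truth_label, agent_accepted), where ground_truth_label is one of
# "optimal", "suboptimal" (valid but not optimal), or "incorrect".


def failure_rates(records):
    """Summarise agent verdicts per ground-truth class.

    Returns the acceptance rate on optimal steps, the share of valid but
    suboptimal steps the agent rejected (over-rejection), and the share of
    incorrect steps the agent accepted (over-validation). A rate is None
    when its class has no records.
    """
    totals = Counter()
    accepted = Counter()
    for label, agent_accepted in records:
        totals[label] += 1
        if agent_accepted:
            accepted[label] += 1

    def ratio(count, total):
        return count / total if total else None

    rejected_suboptimal = totals["suboptimal"] - accepted["suboptimal"]
    return {
        "optimal_accept_rate": ratio(accepted["optimal"], totals["optimal"]),
        "over_rejection_rate": ratio(rejected_suboptimal, totals["suboptimal"]),
        "over_validation_rate": ratio(accepted["incorrect"], totals["incorrect"]),
    }


# Invented toy data, chosen only to exercise the function.
toy_records = (
    [("optimal", True)] * 4
    + [("suboptimal", True)] + [("suboptimal", False)] * 3
    + [("incorrect", True), ("incorrect", False)]
)
print(failure_rates(toy_records))
```

Reporting the three classes separately is the point of the sketch: a single overall accuracy figure would be dominated by the optimal steps and would hide the suboptimal and incorrect cases.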

Key facts

  • Study evaluates seven LLM feedback agents in propositional logic tutoring
  • Uses knowledge-graph-derived ground truth across 10,836 solution-feedback pairs
  • Models near-ceiling on optimal steps but over-reject valid suboptimal reasoning
  • Models over-validate incorrect solutions
  • Failures persist across models regardless of solution context
  • Suggests architectural rather than informational limits
  • Accurate diagnosis does not reliably produce pedagogically actionable feedback
  • Published on arXiv with ID 2605.16207

Entities

Institutions

  • arXiv

Sources