LLM Simulators Need Misconception Faithfulness, Not Just Output Similarity
A new framework (arXiv 2605.12748) evaluates whether large language models (LLMs) acting as simulated students maintain coherent misconceptions during an interaction. The authors propose a misconception-contrastive feedback protocol that compares feedback targeted at the simulator's misconception against misaligned and generic control feedback. They introduce the Selective Flip Score (SFS), which measures how often a simulator changes its answer under targeted feedback versus the controls. The work aims to improve the reliability of LLM-based student simulators for training AI tutors and human educators.
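The summary does not give the exact SFS formula, but the described contrast suggests comparing flip rates across feedback conditions. Below is a minimal hedged sketch, assuming SFS is the answer-flip rate under targeted feedback minus the largest flip rate under either control condition (misaligned or generic); the function names and the aggregation rule are illustrative assumptions, not the paper's definition.

```python
def flip_rate(before, after):
    """Fraction of items on which the simulator changed its answer."""
    if not before:
        return 0.0
    return sum(b != a for b, a in zip(before, after)) / len(before)

def selective_flip_score(initial, after_targeted, after_misaligned, after_generic):
    """Assumed SFS: targeted flip rate minus the strongest control flip rate.

    A faithful simulator with a coherent misconception should flip often
    under targeted feedback but rarely under misaligned or generic feedback,
    yielding a high score.
    """
    targeted = flip_rate(initial, after_targeted)
    control = max(flip_rate(initial, after_misaligned),
                  flip_rate(initial, after_generic))
    return targeted - control

# Toy example: 4 items; 3 flips under targeted feedback (0.75),
# 1 flip under misaligned (0.25), 0 under generic (0.0).
initial          = ["A", "B", "C", "D"]
after_targeted   = ["B", "C", "C", "A"]
after_misaligned = ["A", "B", "C", "A"]
after_generic    = ["A", "B", "C", "D"]

print(selective_flip_score(initial, after_targeted,
                           after_misaligned, after_generic))  # -> 0.5
```

A score near 1 would indicate selective, misconception-driven responsiveness; a score near 0 would mean the simulator flips indiscriminately (or never), regardless of whether the feedback addresses its misconception.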
Key facts
- arXiv paper 2605.12748 introduces a framework for evaluating misconception faithfulness in LLM simulators.
- The framework uses a misconception-contrastive feedback protocol with targeted, misaligned, and generic feedback.
- Selective Flip Score (SFS) quantifies answer flips under targeted feedback.
- LLMs can generate student-like responses but may not behave like students with coherent misconceptions.
- The study focuses on evaluating simulators for training AI tutors and human educators.