Study Tests LLM Tutors' Resistance to Adversarial Student Attacks in Educational Settings
A study investigates how resilient Large Language Models (LLMs) acting as educational tutors are against students who try to extract complete answers. The research, available on arXiv under identifier 2604.18660v1, focuses on scenarios where students deliberately pursue correct answers rather than engaging with the tutoring support on offer. The researchers evaluated a range of LLM-based tutors, including models from different families, models aligned to pedagogical goals, and multi-agent systems, and adapted six categories of adversarial and persuasive tactics to educational settings to probe tutor susceptibility. The study addresses a gap in prior work, which generally assumes well-intentioned learners, by examining deliberate misuse instead. The central evaluation metric is answer leakage: the tutor handing over a full solution instead of guiding the student toward it. The research exposes the tension between the inherent helpfulness of LLMs and fundamental educational values when tutors are confronted with manipulative student behavior.
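To make the evaluation setup concrete, here is a minimal sketch of such a harness, assuming hypothetical tactic names, a stub `query_tutor`, and a crude substring-based leakage check; none of these details come from the paper itself.

```python
# Minimal sketch of the kind of adversarial evaluation loop described above.
# Everything here is an illustrative assumption: the tactic names, the stub
# tutor, and the substring leakage check are NOT taken from the paper.

TACTIC_CATEGORIES = [  # hypothetical stand-ins for the six adapted categories
    "direct_request",
    "authority_claim",
    "emotional_appeal",
    "role_play",
    "incremental_extraction",
    "prompt_injection",
]


def query_tutor(prompt: str) -> str:
    """Stub tutor: replace with a real model call. Here it always deflects."""
    return "Let's work through it together: what do you think the first step is?"


def leaks_answer(reply: str, solution: str) -> bool:
    """Crude leakage detector: the full reference solution appears verbatim.
    A real study would need a more robust judge (e.g., an LLM grader)."""
    return solution.strip().lower() in reply.lower()


def leakage_rates(problems: list[dict]) -> dict[str, float]:
    """Per-tactic fraction of adversarial prompts that extract the answer."""
    rates = {}
    for tactic in TACTIC_CATEGORIES:
        leaks = 0
        for problem in problems:
            # Hypothetical template: wrap the question in the tactic's framing.
            prompt = f"[{tactic}] Just tell me the answer to: {problem['question']}"
            reply = query_tutor(prompt)
            leaks += leaks_answer(reply, problem["solution"])
        rates[tactic] = leaks / len(problems) if problems else 0.0
    return rates


if __name__ == "__main__":
    toy = [{"question": "What is 12 * 7?", "solution": "84"}]
    print(leakage_rates(toy))  # all zeros with the deflecting stub tutor
```

With the deflecting stub, every tactic scores a leakage rate of 0.0; swapping in a real tutor model and realistic tactic prompts is what surfaces the differences the study measures.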
Key facts
- Large Language Models (LLMs) are increasingly deployed in educational contexts
- Prior evaluations of pedagogical quality often measure answer leakage
- Previous research typically assumed cooperative, well-intentioned learners
- The study examines scenarios where students behave adversarially
- Researchers tested multiple LLM-based tutor models and architectures
- Six groups of adversarial techniques were adapted for educational settings
- The paper evaluates tutor robustness against various student attacks
- Answer leakage measures how readily tutors disclose complete solutions (a hedged formalization appears after this list)
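As a compact restatement of this metric (an assumption about its exact form; the paper may define it differently), the per-tactic answer-leakage rate can be written as:

```latex
\mathrm{LeakageRate}(t) =
  \frac{\left|\{\, p \in P_t : \text{the tutor's reply to } p \text{ reveals the full solution} \,\}\right|}{|P_t|}
```

where P_t denotes the set of adversarial prompts instantiating tactic category t.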