LLM-Driven Grading System for K-12 Non-Native English Learners
A new framework adapts large language model (LLM) outputs to the proficiency levels of K-12 English learners in non-native contexts, using China's national curriculum (CSE) as a case study. The system controls lexical complexity through a four-tier grading system, supported by two new resources: graded vocabulary lists and a multi-turn dialogue corpus. The core technical contribution is DDPO (Diversity Driven Policy Optimization), a multi-turn algorithm based on GRPO (Group Relative Policy Optimization) that preserves dialogue diversity while optimizing quality. DDPO achieves low out-of-vocabulary rates and high diversity, improving conversational naturalness and pedagogical effectiveness. The work targets proficiency mismatch, a widespread problem when LLMs are used in education.
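To make the idea of grading lexical complexity concrete, here is a minimal sketch of how an out-of-vocabulary (OOV) rate could be measured against tiered word lists. The word lists, the `TIER_VOCAB` name, the tokenizer, and the assumption that tiers are cumulative (a learner at tier 3 also knows tiers 1 and 2) are all illustrative choices, not details taken from the framework itself.

```python
import re

# Hypothetical toy word lists; the framework's real graded lists are not shown here.
TIER_VOCAB = {
    1: {"i", "you", "like", "a", "the", "cat", "dog", "is", "my"},
    2: {"because", "school", "friend", "play", "happy"},
    3: {"although", "prefer", "environment", "journey"},
    4: {"nevertheless", "significant", "perspective"},
}


def allowed_words(tier):
    """Return the union of all word lists up to and including `tier`.

    Assumes tiers are cumulative, which is an assumption of this sketch.
    """
    allowed = set()
    for level in range(1, tier + 1):
        allowed |= TIER_VOCAB[level]
    return allowed


def oov_rate(text, tier):
    """Fraction of tokens in `text` that fall outside the learner's tier."""
    # Simple lowercase tokenizer; a real system would likely lemmatize first.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    allowed = allowed_words(tier)
    misses = sum(1 for token in tokens if token not in allowed)
    return misses / len(tokens)


if __name__ == "__main__":
    reply = "I like my friend"
    for tier in (1, 2):
        print(f"tier {tier}: OOV rate = {oov_rate(reply, tier):.2f}")
```

A score like this could serve either as an evaluation metric or as one component of a reward signal. Which of the two the framework actually does is not stated in this summary.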
Key facts
- Framework adapts LLM outputs to learner abilities
- Uses China's national curriculum (CSE) as representative case
- Four-tier grading system for lexical complexity
- New resources: graded vocabulary lists and multi-turn dialogue corpus
- Core technical contribution: DDPO algorithm
- DDPO stands for Diversity Driven Policy Optimization
- DDPO is a multi-turn GRPO-based approach
- Achieves low out-of-vocabulary rates and high diversity
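The facts above say that DDPO builds on GRPO and aims to keep dialogue diverse. As a rough illustration only, the sketch below shows the standard GRPO step of normalizing rewards within a group of sampled responses, combined with a simple distinct-n diversity bonus. The bonus, the `diversity_weight` parameter, and all function names are assumptions made for this example. This is not the DDPO objective, whose exact form is not given in this summary.

```python
from statistics import mean, pstdev


def distinct_n(text, n=2):
    """Ratio of unique n-grams to total n-grams (a common diversity proxy)."""
    tokens = text.lower().split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    return len(set(ngrams)) / len(ngrams)


def shaped_rewards(responses, quality_scores, diversity_weight=0.5):
    """Add a diversity bonus to each quality score (illustrative shaping only)."""
    return [
        quality + diversity_weight * distinct_n(response)
        for response, quality in zip(responses, quality_scores)
    ]


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: standardize each reward within its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(reward - mu) / (sigma + eps) for reward in rewards]


if __name__ == "__main__":
    # Two hypothetical responses to the same prompt with equal quality scores.
    group = ["the cat sat down", "a b a b a b"]
    rewards = shaped_rewards(group, [1.0, 1.0])
    for response, adv in zip(group, group_relative_advantages(rewards)):
        print(f"{adv:+.3f}  {response}")
```

With equal quality scores, the less repetitive response ends up with the higher advantage, which is the general effect a diversity-aware objective is meant to produce.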