LLM-Driven Grading System for K-12 Non-Native English Learners

other · 2026-04-27

A new framework adapts large language model (LLM) outputs to the proficiency levels of K-12 English learners in non-native contexts, using China's national curriculum (CSE) as a case study. It controls lexical complexity through a four-tier grading system, supported by two new resources: graded vocabulary lists and a multi-turn dialogue corpus. The core technical contribution is DDPO (Diversity Driven Policy Optimization), a multi-turn, GRPO-based algorithm that preserves dialogue diversity while optimizing quality. DDPO is reported to achieve low out-of-vocabulary rates and high diversity, which the authors link to more natural conversation and greater pedagogical effectiveness. The work targets a widespread problem in educational uses of LLMs: output that does not match the learner's proficiency.
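
The out-of-vocabulary (OOV) rate mentioned above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the tiny word lists are placeholders rather than the paper's graded vocabulary, the tiers are assumed to be cumulative (a learner at tier 3 also knows tiers 1 and 2), and the names `GRADED_VOCAB`, `allowed_words`, and `oov_rate` are hypothetical.

```python
import re

# Hypothetical graded vocabulary: tier number -> words introduced at that tier.
# Placeholder data only; not the vocabulary lists released with the paper.
GRADED_VOCAB = {
    1: {"i", "you", "like", "a", "the", "cat", "dog", "is", "my"},
    2: {"because", "friend", "school", "play", "with"},
    3: {"although", "prefer", "environment"},
    4: {"nevertheless", "sophisticated"},
}


def allowed_words(tier):
    """Return the words assumed known at `tier`: that tier plus all lower ones."""
    allowed = set()
    for level, words in GRADED_VOCAB.items():
        if level <= tier:
            allowed |= words
    return allowed


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def oov_rate(text, tier):
    """Fraction of tokens in `text` that fall outside the vocabulary for `tier`."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    allowed = allowed_words(tier)
    oov_count = sum(1 for token in tokens if token not in allowed)
    return oov_count / len(tokens)


if __name__ == "__main__":
    sentence = "I prefer my dog"
    for tier in (1, 3):
        print(tier, oov_rate(sentence, tier))
```

Under these assumptions, the same sentence scores a higher OOV rate for a tier-1 learner than for a tier-3 learner, which is the kind of signal a grading system could use to reject or penalize a response.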

Key facts

Framework adapts LLM outputs to learner abilities
Uses China's national curriculum (CSE) as a representative case
Four-tier grading system for lexical complexity
New resources: graded vocabulary lists and multi-turn dialogue corpus
Core technical contribution: DDPO algorithm
DDPO stands for Diversity Driven Policy Optimization
DDPO is a multi-turn GRPO-based approach
Achieves low out-of-vocabulary rates and high diversity
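
For context on the last few facts, the sketch below shows two generic building blocks, not DDPO itself. The first is the group-relative advantage that GRPO-style methods compute by normalizing each sampled response's reward against its group. The second is distinct-n, a common diversity metric. How DDPO defines diversity, and how it combines diversity with quality across turns, is not described here, so the function names and the choice of distinct-n are assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group: (r - mean) / (std + eps).

    Uses the population standard deviation; implementations differ on this
    detail, so treat it as an assumption.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def distinct_n(responses, n=2):
    """Share of unique n-grams among all n-grams in a group of responses."""
    ngrams = []
    for response in responses:
        tokens = response.lower().split()
        ngrams.extend(
            tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
        )
    if not ngrams:
        return 0.0
    return len(set(ngrams)) / len(ngrams)


if __name__ == "__main__":
    # Hypothetical quality rewards for three sampled replies to one prompt.
    print(group_relative_advantages([1.0, 2.0, 3.0]))
    print(distinct_n(["a b c", "a b d"], n=2))
```

A group whose responses all receive the same reward yields zero advantages, and a group of near-identical responses yields a low distinct-n score. This is the kind of collapse a diversity-aware objective is meant to discourage.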

Entities

—

Sources

arXiv cs.AI — 2026-04-27