GradingAttack Framework Exposes LLM Grading Vulnerabilities

other · 2026-05-25

Researchers have developed GradingAttack, a framework for identifying security flaws in LLM-based educational grading agents. It uses both token-level and prompt-level attack techniques to covertly alter grading results. Experiments across multiple datasets indicate that both strategies undermine grading agents, with prompt-level attacks achieving higher success rates than token-level attacks. The research raises significant concerns about the reliability of AI-driven grading systems deployed in practice.

Key facts

GradingAttack is a fine-grained adversarial attack framework for LLM grading agents.
It uses token-level and prompt-level attack strategies.
Prompt-level attacks achieve higher success rates than token-level attacks.
Experiments were conducted on multiple datasets.
The framework exposes fundamental weaknesses in current agent deployments.
LLMs are increasingly used for automatic short answer grading (ASAG).
The study focuses on security vulnerabilities of grading agents in the wild.
The paper is available on arXiv with ID 2602.00979.
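
The summary compares attacks by "success rate" without defining it. As a rough illustration only, the sketch below assumes a common convention: the fraction of graded answers whose grade changes after an attack is applied. The function name `attack_success_rate` and the sample grades are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def attack_success_rate(original: Sequence[int], attacked: Sequence[int]) -> float:
    """Fraction of answers whose grade differs after the attack.

    Assumed definition for illustration; the paper may use a different metric.
    """
    if len(original) != len(attacked):
        raise ValueError("grade lists must have the same length")
    if not original:
        return 0.0
    # Count answers where the grader's output changed under attack.
    changed = sum(1 for before, after in zip(original, attacked) if before != after)
    return changed / len(original)


# Made-up grades (1 = correct, 0 = incorrect) for four answers.
grades_before = [1, 0, 1, 1]
grades_after = [1, 1, 0, 1]
rate = attack_success_rate(grades_before, grades_after)
```

Under this assumed definition, a higher value means the attack changed more grading decisions, which is the sense in which one strategy can be said to outperform another.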
