ARTFEED — Contemporary Art Intelligence

GradingAttack Framework Exposes LLM Grading Vulnerabilities

other · 2026-05-25

Researchers have developed GradingAttack, a framework for identifying security flaws in LLM-based educational grading agents. It uses both token-level and prompt-level attack techniques to alter grading results covertly. Experiments across multiple datasets indicate that both techniques undermine grading agents, with prompt-level attacks achieving higher success rates. The research raises significant concerns about the reliability of AI-driven grading systems deployed in practice.
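To make the "success rate" comparison concrete, here is a minimal sketch of how such a rate could be computed. It assumes an attack counts as successful when the grade assigned to an answer changes after the attack. That definition, the function name `attack_success_rate`, and the grades below are illustrative assumptions, not the paper's metric or data.

```python
from typing import Sequence


def attack_success_rate(original: Sequence[int], attacked: Sequence[int]) -> float:
    """Fraction of answers whose grade changed after the attack.

    Assumed definition for illustration; the paper may define success differently.
    """
    if len(original) != len(attacked):
        raise ValueError("grade lists must have the same length")
    if not original:
        return 0.0
    changed = sum(1 for before, after in zip(original, attacked) if before != after)
    return changed / len(original)


# Made-up binary grades (0 = incorrect, 1 = correct); not results from the paper.
clean = [0, 1, 0, 0, 1]
token_level = [0, 1, 1, 0, 1]
prompt_level = [1, 1, 1, 0, 0]

token_rate = attack_success_rate(clean, token_level)
prompt_rate = attack_success_rate(clean, prompt_level)
print(f"token-level: {token_rate:.2f}, prompt-level: {prompt_rate:.2f}")
```

Under this assumed definition, a higher rate means the attack changed more grades on the same set of answers, which is the sense in which one strategy can be said to outperform another.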

Key facts

  • GradingAttack is a fine-grained adversarial attack framework for LLM grading agents.
  • It uses token-level and prompt-level attack strategies.
  • Prompt-level attacks achieve higher success rates.
  • Experiments were conducted on multiple datasets.
  • The framework exposes fundamental weaknesses in current agent deployments.
  • LLMs are increasingly used for automatic short answer grading (ASAG).
  • The study focuses on security vulnerabilities of grading agents in the wild.
  • The paper is available on arXiv with ID 2602.00979.

Entities

Institutions

  • arXiv
