ARTFEED — Contemporary Art Intelligence

Security Enhancement Framework for Medical AI Agents

other · 2026-05-12

A new study proposes ARSM-Agent, a full-link security enhancement framework for adversarially robust large language model agents in medical decision-making. The framework comprises six stages: input risk perception, medical evidence constraint, knowledge consistency verification, decision confidence reweighting, security output control, and adversarial feedback update. Its weighted joint objective combines a decision accuracy loss (weight 0.3), an adversarial robustness loss (0.3), a safety refusal loss (0.2), and a knowledge consistency loss (0.2). ARSM-Agent outperforms four baselines (LLM-Agent, Retrieval-Agent, Filter-Agent, Adv-Train-Agent) under semantic perturbation, prompt injection, drug-name confusion, and false-evidence attacks. The research is published on arXiv (2605.08257).
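The summary gives only the four weights, not the paper's loss formulations; a minimal sketch of how such a weighted joint objective could combine per-component losses (the component loss values here are placeholders, not results from the study):

```python
# Illustrative sketch of the weighted joint objective described above.
# The four weights come from the article; the individual loss values are
# placeholders, since the paper's exact loss definitions are not given here.

WEIGHTS = {
    "decision_accuracy": 0.3,
    "adversarial_robustness": 0.3,
    "safety_refusal": 0.2,
    "knowledge_consistency": 0.2,
}

def joint_objective(losses: dict[str, float]) -> float:
    """Combine per-component losses into a single training objective."""
    assert set(losses) == set(WEIGHTS), "expected one loss per component"
    return sum(WEIGHTS[name] * value for name, value in losses.items())

# The weights sum to 0.3 + 0.3 + 0.2 + 0.2 = 1.0, so equal component
# losses of 1.0 yield a joint loss of 1.0.
example = joint_objective({
    "decision_accuracy": 1.0,
    "adversarial_robustness": 1.0,
    "safety_refusal": 1.0,
    "knowledge_consistency": 1.0,
})
```

Because the weights sum to 1.0, the joint objective is a convex combination of the four losses, which keeps the overall loss on the same scale as its components.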

Key facts

  • ARSM-Agent is a full-link security enhancement framework for medical AI agents.
  • Framework includes six stages: input risk perception, medical evidence constraint, knowledge consistency verification, decision confidence reweighting, security output control, and adversarial feedback update.
  • Weighted joint objective: decision accuracy loss (weight 0.3), adversarial robustness loss (0.3), safety refusal loss (0.2), knowledge consistency loss (0.2).
  • Outperforms LLM-Agent, Retrieval-Agent, Filter-Agent, and Adv-Train-Agent.
  • Tested under semantic perturbation, prompt injection, drug-name confusion, and false-evidence attacks.
  • Published on arXiv with ID 2605.08257.
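The six stages above are described as a full-link (end-to-end) sequence; a hypothetical sketch of such a sequential pipeline, where each stage function is a named placeholder rather than the paper's actual implementation:

```python
# Hypothetical sequential pipeline over the six ARSM-Agent stages listed
# above. Stage names come from the article; the per-stage logic is a
# placeholder that only records which stages processed the query.

STAGES = [
    "input_risk_perception",
    "medical_evidence_constraint",
    "knowledge_consistency_verification",
    "decision_confidence_reweighting",
    "security_output_control",
    "adversarial_feedback_update",
]

def run_pipeline(query: str) -> dict:
    """Pass a query through each stage in order, recording the trace."""
    state = {"query": query, "trace": []}
    for stage in STAGES:
        # A real stage would transform the state here (e.g. flag risky
        # inputs, attach medical evidence, reweight decision confidence).
        state["trace"].append(stage)
    return state

result = run_pipeline("Is it safe to combine drug A with drug B?")
```

The ordering matters in a full-link design: risk perception gates what reaches the evidence and verification stages, and the final feedback stage closes the loop for adversarial updates.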

Entities

Institutions

  • arXiv

Sources