AI peer review faces security and reliability concerns
A new paper on arXiv (2604.23593) analyzes the risks of integrating large language models into scientific peer review. As submission volumes outpace human referee capacity, LLMs are increasingly used for summarization, fact-checking, and triage. However, early deployments reveal severe failure modes: hidden prompt injections in manuscripts can manipulate LLM reviews toward unjustified positive judgments, while adversarial phrasing, authority and length biases, and hallucinated claims undermine reliability. The study provides a security- and reliability-centered analysis, questioning whether AI referees can be trusted in scholarly communication.
Key facts
- arXiv paper 2604.23593 analyzes AI in peer review
- Submission volumes exceed human referee capacity
- LLMs used for summarization, fact-checking, and triage
- Hidden prompt injections can manipulate LLM reviews
- Adversarial phrasing causes brittleness
- Authority and length biases observed
- Hallucinated claims are a failure mode
- Paper questions trust in AI referees
Entities
Institutions
- arXiv