Runtime Verifier Detects LLM Context Manipulation in Conversations
A novel runtime verifier for large language model (LLM) dialogues uses an explicit dependency graph to detect context-manipulation attacks. The system classifies each conversational turn into one of eight update operations drawn from four formalisms: dynamic epistemic logic, abductive reasoning, awareness logic, and argumentation. A symbolic engine records the dependencies between claims and evidence, so checking whether a continuation is supported reduces to a graph walk. Retractions propagate through the same graph, flagging conclusions that have lost their support, with linear per-turn cost and a formal conflict-freeness guarantee. On the LongMemEval-KU oracle (n=78), the verifier reaches 89.7% accuracy, surpassing both an LLM-only baseline (88.5%) and a transcript-RAG baseline (87.2%). The paper is available on arXiv under ID 2605.14175.
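To make the claim/evidence bookkeeping concrete, here is a minimal Python sketch of such a dependency graph. The class name DependencyGraph, its methods, and the conjunctive reading of premises (a claim stands only if every premise it rests on stands) are illustrative assumptions, not the paper's implementation.

```python
class DependencyGraph:
    """Illustrative sketch: claims and evidence as graph nodes.

    Each node records the premises it rests on, so checking whether a
    claim is still supported is a plain walk back toward the evidence.
    Assumes the graph is acyclic, which holds if every turn can only
    reference nodes introduced at earlier turns.
    """

    def __init__(self):
        self.supported_by = {}   # node id -> set of premise node ids
        self.retracted = set()   # node ids that have been withdrawn

    def add(self, node, supported_by=()):
        """Register a new claim or evidence node and its premises."""
        self.supported_by[node] = set(supported_by)

    def is_supported(self, node):
        """A node is supported iff it is not retracted and every
        premise it rests on is itself supported. Evidence nodes have
        no premises, so the empty conjunction makes them supported."""
        if node in self.retracted:
            return False
        return all(self.is_supported(p) for p in self.supported_by[node])


g = DependencyGraph()
g.add("e1")                        # turn 1: user asserts a fact
g.add("c1", supported_by=["e1"])   # turn 2: assistant claim resting on e1
print(g.is_supported("c1"))        # True: the walk reaches live evidence
g.retracted.add("e1")              # later turn: the fact is retracted
print(g.is_supported("c1"))        # False: support has been withdrawn
```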
Key facts
- The verifier maintains an explicit dependency graph for LLM conversations.
- Each turn is classified into one of eight update operations from four formalisms.
- The system uses dynamic epistemic logic, abductive reasoning, awareness logic, and argumentation.
- A symbolic engine records dependencies between claims and evidence.
- Support checking reduces to a graph walk.
- Retraction propagates through the graph to flag unsupported conclusions (see the sketch after this list).
- The verifier has linear per-turn cost and a formal conflict-free guarantee.
- On LongMemEval-KU oracle (n=78), accuracy is 89.7% vs. 88.5% for LLM-only baseline.
- The transcript-RAG baseline achieved 87.2% accuracy.
- The paper is published on arXiv with ID 2605.14175.
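As a companion to the facts above, the following is a hedged sketch of how retraction propagation might work as a breadth-first walk over the same graph. The function propagate_retraction and the adjacency encoding are hypothetical names for illustration; the cost shown is linear in the affected subgraph, which is consistent with, though not necessarily identical to, the paper's stated linear per-turn cost.

```python
from collections import deque

def propagate_retraction(supports, retracted_node):
    """Flag every downstream conclusion once a premise is retracted.

    `supports` maps each node id to the set of node ids it directly
    supports. The breadth-first walk visits each affected edge once,
    so the cost is linear in the size of the affected subgraph.
    """
    flagged = {retracted_node}
    queue = deque([retracted_node])
    while queue:
        node = queue.popleft()
        for dependent in supports.get(node, ()):
            if dependent not in flagged:
                flagged.add(dependent)
                queue.append(dependent)
    return flagged


# Hypothetical three-turn chain: evidence e1 supports claim c1,
# which in turn supports conclusion c2.
supports = {"e1": {"c1"}, "c1": {"c2"}, "c2": set()}
print(propagate_retraction(supports, "e1"))  # {'e1', 'c1', 'c2'}
```

Propagating forward from the retracted node, rather than re-verifying every claim from scratch, is what keeps the per-turn work bounded: only conclusions that actually depended on the withdrawn premise are revisited.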
Entities
Institutions
- arXiv