Runtime Verifier Detects LLM Context Manipulation in Conversations
A novel runtime verifier for large language model (LLM) dialogues uses an explicit dependency graph to detect context-manipulation attacks. The system classifies each conversational turn into one of eight update operations drawn from four formalisms: dynamic epistemic logic, abductive reasoning, awareness logic, and argumentation. A symbolic engine records the dependencies between claims and evidence, so checking whether a continuation is supported reduces to a graph walk. Retractions propagate through the same graph, flagging conclusions that have lost their support, with linear per-turn cost and a formal conflict-freeness guarantee. On the LongMemEval-KU oracle (n=78), the verifier reaches 89.7% accuracy, surpassing both an LLM-only baseline (88.5%) and a transcript-RAG baseline (87.2%). The paper is available on arXiv under ID 2605.14175.
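To make the claim/evidence bookkeeping concrete, here is a minimal Python sketch of such a dependency graph. The class name DependencyGraph, its methods, and the conjunctive reading of premises (a claim stands only if every premise it rests on stands) are illustrative assumptions, not the paper's implementation.

```python
class DependencyGraph:
    """Illustrative sketch: claims and evidence as graph nodes.

    Each node records the premises it rests on, so checking whether a
    claim is still supported is a plain walk back toward the evidence.
    Assumes the graph is acyclic, which holds if every turn can only
    reference nodes introduced at earlier turns.
    """

    def __init__(self):
        self.supported_by = {}   # node id -> set of premise node ids
        self.retracted = set()   # node ids that have been withdrawn

    def add(self, node, supported_by=()):
        """Register a new claim or evidence node and its premises."""
        self.supported_by[node] = set(supported_by)

    def is_supported(self, node):
        """A node is supported iff it is not retracted and every
        premise it rests on is itself supported. Evidence nodes have
        no premises, so the empty conjunction makes them supported."""
        if node in self.retracted:
            return False
        return all(self.is_supported(p) for p in self.supported_by[node])


g = DependencyGraph()
g.add("e1")                        # turn 1: user asserts a fact
g.add("c1", supported_by=["e1"])   # turn 2: assistant claim resting on e1
print(g.is_supported("c1"))        # True: the walk reaches live evidence
g.retracted.add("e1")              # later turn: the fact is retracted
print(g.is_supported("c1"))        # False: support has been withdrawn
```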
Key facts
- The verifier maintains an explicit dependency graph for LLM conversations.
- Each turn is classified into one of eight update operations from four formalisms.
- The system uses dynamic epistemic logic, abductive reasoning, awareness logic, and argumentation.
- A symbolic engine records dependencies between claims and evidence.
- Support checking reduces to a graph walk.
- Retraction propagates through the graph to flag unsupported conclusions (see the sketch after this list).
- The verifier has linear per-turn cost and a formal conflict-free guarantee.
- On LongMemEval-KU oracle (n=78), accuracy is 89.7% vs. 88.5% for LLM-only baseline.
- The transcript-RAG baseline achieved 87.2% accuracy.
- The paper is published on arXiv with ID 2605.14175.
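As a companion to the facts above, the following is a hedged sketch of how retraction propagation might work as a breadth-first walk over the same graph. The function propagate_retraction and the adjacency encoding are hypothetical names for illustration; the cost shown is linear in the affected subgraph, which is consistent with, though not necessarily identical to, the paper's stated linear per-turn cost.

```python
from collections import deque

def propagate_retraction(supports, retracted_node):
    """Flag every downstream conclusion once a premise is retracted.

    `supports` maps each node id to the set of node ids it directly
    supports. The breadth-first walk visits each affected edge once,
    so the cost is linear in the size of the affected subgraph.
    """
    flagged = {retracted_node}
    queue = deque([retracted_node])
    while queue:
        node = queue.popleft()
        for dependent in supports.get(node, ()):
            if dependent not in flagged:
                flagged.add(dependent)
                queue.append(dependent)
    return flagged


# Hypothetical three-turn chain: evidence e1 supports claim c1,
# which in turn supports conclusion c2.
supports = {"e1": {"c1"}, "c1": {"c2"}, "c2": set()}
print(propagate_retraction(supports, "e1"))  # {'e1', 'c1', 'c2'}
```

Propagating forward from the retracted node, rather than re-verifying every claim from scratch, is what keeps the per-turn work bounded: only conclusions that actually depended on the withdrawn premise are revisited.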
Entities
Institutions
- arXiv