QAOD: Single-Pass Hallucination Detection for LLMs via Orthogonal Decomposition
Researchers have introduced QAOD (Question-Answer Orthogonal Decomposition), a framework that detects hallucinations in large language models (LLMs) in a single pass. The method extracts a question-orthogonal component by projecting the question-aligned direction out of the answer representation, which is intended to minimize domain-conditioned variation. Layers are selected with diversity-penalized Fisher scoring, and discriminative neurons are identified by their Fisher importance. The framework includes two complementary probing strategies; one of them combines the orthogonal component with the question context to form a joint probe.
The approach aims to improve accuracy, efficiency, and resilience to distribution shifts, supporting both in-domain detection and cross-domain generalization. The paper is available on arXiv (2605.14449).
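The projection step can be illustrated with a small sketch. This is not the authors' code: it assumes the question-aligned direction is simply the question's hidden-state vector (called `h_q` here), whereas QAOD may estimate that direction differently, and the function names are hypothetical. The answer representation keeps only the part that is orthogonal to that direction.

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def question_orthogonal(answer: Vector, question: Vector) -> Vector:
    """Hypothetical helper: remove the question-aligned component of `answer`.

    Returns answer - (<answer, q> / <q, q>) * q. ASSUMPTION: the
    question-aligned direction is the raw question vector q itself.
    """
    qq = dot(question, question)
    if qq == 0.0:
        # A zero question vector defines no direction, so nothing is removed.
        return list(answer)
    coeff = dot(answer, question) / qq
    return [a - coeff * q for a, q in zip(answer, question)]


if __name__ == "__main__":
    h_q = [1.0, 0.0, 0.0]  # toy question representation
    h_a = [2.0, 3.0, 4.0]  # toy answer representation
    h_perp = question_orthogonal(h_a, h_q)
    print(h_perp)            # [0.0, 3.0, 4.0]
    print(dot(h_perp, h_q))  # 0.0
```

By construction the returned vector has zero inner product with `h_q`, so whatever the answer shares with the question along that direction is removed before any probe sees it.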
Key facts
- QAOD stands for Question-Answer Orthogonal Decomposition.
- It is a single-pass framework for hallucination detection in LLMs.
- The method projects away the question-aligned direction from the answer representation.
- It uses diversity-penalized Fisher scoring for layer selection.
- It uses Fisher importance for neuron selection.
- Two complementary probing strategies are designed.
- The approach addresses in-domain detection and cross-domain generalization.
- The paper is available on arXiv with ID 2605.14449.
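The source names Fisher importance for neuron selection and diversity-penalized Fisher scoring for layer selection, but does not give their formulas. The sketch below assumes the classic Fisher ratio (mu1 - mu0)^2 / (var1 + var0) between hallucinated and faithful samples, and a hypothetical diversity penalty: a layer's normalized mean Fisher score minus `lam` times its highest cosine similarity to an already chosen layer's per-neuron Fisher profile. All names and the penalty form are illustrative assumptions, not QAOD's definitions.

```python
import math
from statistics import fmean, pvariance
from typing import Dict, List, Sequence


def fisher_score(values: Sequence[float], labels: Sequence[int], eps: float = 1e-8) -> float:
    """Fisher ratio (mu1 - mu0)^2 / (var1 + var0) of one scalar feature.

    labels: 1 = hallucinated, 0 = faithful; both classes must be present.
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    return (fmean(pos) - fmean(neg)) ** 2 / (pvariance(pos) + pvariance(neg) + eps)


def neuron_scores(acts: List[List[float]], labels: Sequence[int]) -> List[float]:
    """Fisher score of every neuron (column) in a samples x neurons matrix."""
    n_neurons = len(acts[0])
    return [fisher_score([row[j] for row in acts], labels) for j in range(n_neurons)]


def top_k_neurons(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k neurons with the highest Fisher scores."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def select_layers(profiles: Dict[int, List[float]], n_layers: int, lam: float = 0.5) -> List[int]:
    """Greedy layer selection with a HYPOTHETICAL diversity penalty.

    profiles maps layer index -> per-neuron Fisher scores (equal widths assumed).
    Score = mean Fisher score / best mean - lam * max cosine to chosen layers.
    """
    quality = {layer: fmean(p) for layer, p in profiles.items()}
    top = max(quality.values()) or 1.0  # avoid dividing by zero
    chosen: List[int] = []
    while len(chosen) < min(n_layers, len(profiles)):
        best, best_score = None, -math.inf
        for layer, prof in profiles.items():
            if layer in chosen:
                continue
            redundancy = max((cosine(prof, profiles[c]) for c in chosen), default=0.0)
            score = quality[layer] / top - lam * redundancy
            if score > best_score:
                best, best_score = layer, score
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    labels = [1, 1, 0, 0]
    acts = [[1.0, 0.5], [1.1, 0.2], [0.0, 0.4], [0.1, 0.3]]
    print(top_k_neurons(neuron_scores(acts, labels), 1))  # [0]
    profiles = {0: [10.0, 0.0], 1: [9.0, 0.0], 2: [0.0, 8.0]}
    print(select_layers(profiles, n_layers=2, lam=0.5))  # [0, 2]
```

In the toy example, layer 1 scores higher than layer 2 on its own, but its Fisher profile duplicates layer 0, so the penalty favors the complementary layer 2. With `lam=0.0` the selection falls back to pure ranking and returns `[0, 1]`.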