Temporal Drift in LLMs Found Geometrically Orthogonal to Correctness
A new study on arXiv (2605.09195) argues that outdated responses from large language models stem from a structural property of the models rather than from engineering mistakes. The researchers report that temporal drift, i.e. whether a stored fact has changed since the model was trained, is encoded as a direction in the residual stream that is geometrically separate from both correctness and uncertainty. Detectors that target only correctness or uncertainty therefore miss drift entirely. The finding held across six instruction-tuned models: a linear probe trained on drift labels reached an AUROC of 0.83–0.95, while existing methods such as token entropy and semantic entropy scored near chance (0.49–0.57). Five independent tests supported the geometric separation.
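The probe itself is conceptually simple: a linear classifier trained on residual-stream activations with drift labels. A minimal synthetic sketch follows; the dimensions, data, and planted "drift direction" are illustrative assumptions, not the paper's actual setup or models.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
d, n = 64, 400  # hypothetical residual-stream width and sample count

# Synthetic stand-in for activations: a hidden "drift direction" plus noise.
drift_dir = rng.normal(size=d)
drift_dir /= np.linalg.norm(drift_dir)

labels = rng.integers(0, 2, size=n)  # 1 = stored fact changed since training
acts = rng.normal(size=(n, d)) + np.outer(2.0 * labels - 1.0, drift_dir) * 1.5

# Linear probe trained on drift labels, evaluated on held-out samples.
probe = LogisticRegression(max_iter=1000).fit(acts[:300], labels[:300])
scores = probe.predict_proba(acts[300:])[:, 1]
auroc = roc_auc_score(labels[300:], scores)
```

On this synthetic data the probe recovers the planted direction and scores well above chance; the 0.83–0.95 range quoted above is what the paper reports on real model activations.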
Key facts
- Temporal drift is encoded as a direction orthogonal to correctness and uncertainty in LLM residual streams.
- A linear probe trained on drift labels achieves AUROC 0.83–0.95.
- Existing methods like token entropy, semantic entropy, CCS, and SAPLMA perform near chance (0.49–0.57).
- Five tests confirm geometric orthogonality: weight cosines ≤0.14, score correlations ≤0.20, null-space projections ≤0.008.
- The finding holds across six instruction-tuned models.
- None of the existing methods evaluated reliably detects outdated answers from LLMs.
- The paper is published on arXiv with ID 2605.09195.
- Temporal drift is defined as whether a stored fact has changed since training.
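The three quantitative orthogonality checks listed above (weight cosines, score correlations, null-space projections) can be sketched on synthetic probe weights. The vectors below are random stand-ins, not the paper's trained probes; the thresholds in the comments are the paper's reported values.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 64  # hypothetical residual-stream width

# Random stand-ins for two trained probe weight vectors.
w_drift = rng.normal(size=d)
w_correct = rng.normal(size=d)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Check 1: cosine between probe weight vectors (paper reports <= 0.14).
weight_cos = abs(cosine(w_drift, w_correct))

# Check 2: correlation of the two probes' scores on shared inputs
# (paper reports <= 0.20).
X = rng.normal(size=(500, d))
score_corr = abs(np.corrcoef(X @ w_drift, X @ w_correct)[0, 1])

# Check 3: project inputs onto the null space of the correctness direction;
# if the drift direction is orthogonal, drift scores barely change.
u = w_correct / np.linalg.norm(w_correct)
X_null = X - np.outer(X @ u, u)
retained = np.corrcoef(X @ w_drift, X_null @ w_drift)[0, 1]
```

Because random high-dimensional vectors are nearly orthogonal, this toy setup passes all three checks; the paper's contribution is showing that probes trained on real model activations do too.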
Entities
Institutions
- arXiv