LLMs Trust User Claims Over Sensor Data in Authority Inversion Phenomenon
A recent arXiv preprint describes Authority Inversion, a failure mode that appears when large language models (LLMs) integrate heterogeneous inputs in ubiquitous systems. The researchers found that when a natural-language assertion conflicts with numerical sensor data, LLMs tend to side with the assertion, because authority is allocated implicitly according to input format. The bias arises because numerical information fails to integrate into the answer-relevant directions of the model's learned representations, so user claims override sensor readings. To address this, the authors propose a geometric framework for context integration, two audit metrics, Context Integration Ratio (CIR) and Authority Alignment Index (AAI), and an inference-time intervention called Geometric Authority Calibration (GAC). The findings raise reliability concerns for applications such as autonomous systems and IoT, where physical sensing should take precedence.
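The idea of an input "integrating into answer-relevant directions" can be made concrete with a toy calculation. The sketch below is only an illustration under stated assumptions: the vectors, the single answer direction, and the helper names `projection_magnitude` and `integration_share` are all hypothetical, and the share it computes is not the paper's definition of CIR or AAI, which this summary does not give.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def projection_magnitude(vec, direction):
    """Absolute length of vec's component along direction (hypothetical helper)."""
    norm = math.sqrt(dot(direction, direction))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    return abs(dot(vec, direction)) / norm


def integration_share(source_vecs, direction):
    """Fraction of the total answer-direction signal contributed by each source.

    Illustrative quantity only; NOT the paper's CIR or AAI.
    """
    mags = {name: projection_magnitude(v, direction)
            for name, v in source_vecs.items()}
    total = sum(mags.values())
    if total == 0.0:
        # No source has any component along the answer direction.
        return {name: 0.0 for name in mags}
    return {name: m / total for name, m in mags.items()}


# Made-up 3-d "representations": the user claim lies mostly along the
# assumed answer direction, while the sensor reading is mostly orthogonal.
answer_direction = [1.0, 0.0, 0.0]
sources = {
    "user_claim": [0.9, 0.1, 0.0],
    "sensor_reading": [0.1, 0.8, 0.5],
}

shares = integration_share(sources, answer_direction)
for name, share in shares.items():
    print(f"{name}: {share:.2f}")
```

In this toy setup the sensor vector has the larger overall norm, yet almost none of it lies along the answer direction, so the user claim accounts for most of the signal that could influence the answer. That is the qualitative pattern the summary attributes to Authority Inversion.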
Key facts
- LLMs exhibit Authority Inversion when sensor data and user claims conflict.
- Numerical sensor data fails to integrate into answer-relevant model directions.
- Natural-language claims dominate final decisions over sensor inputs.
- Two audit metrics introduced: Context Integration Ratio (CIR) and Authority Alignment Index (AAI).
- Geometric Authority Calibration (GAC) proposed as inference-time mitigation.
- Study published on arXiv with ID 2605.23938.
- Research focuses on LLM-mediated ubiquitous systems.
- Authority allocation is format-dependent and implicit in learned representations.
Entities
Institutions
- arXiv