Innovation as an Almost Characterization of LLM Hallucination
A recent study posted on arXiv (2605.26808) introduces "innovation," a property of large language models that measures their tendency to generate outputs outside their training data. The researchers show that innovation is implied by the hallucination condition established by Kalai and Vempala (STOC 2024), and they present it as an almost characterization of hallucination. The paper addresses two questions: what makes hallucinations inevitable in calibrated LLMs, and whether abandoning calibration can prevent them. The work builds on the probabilistic framework of Kalai and Vempala, which defined calibration and hallucination and showed that calibrated LLMs hallucinate at a rate corresponding to the "missing mass."
Key facts
- Paper on arXiv: 2605.26808
- Introduces property called 'innovation'
- Innovation measures tendency to produce outputs outside training data
- Innovation is implied by Kalai and Vempala's hallucination condition
- Innovation is an almost characterization of hallucination
- Addresses two fundamental questions about LLM hallucination
- Builds on Kalai and Vempala (STOC 2024) framework
- Kalai and Vempala showed calibrated LLMs hallucinate at rate of 'missing mass'
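To make the "missing mass" in the last point concrete: in statistics, the missing mass is the total probability of items that never appeared in a sample, and it is commonly estimated with the Good-Turing rule (the fraction of observations whose item was seen exactly once). The sketch below is only an illustration of that standard estimator on toy data. The function name `good_turing_missing_mass` and the `toy_facts` list are hypothetical, and nothing here is taken from the paper or reproduces its definition of innovation.

```python
from collections import Counter


def good_turing_missing_mass(samples):
    """Good-Turing estimate of the probability mass of unseen items.

    Returns (number of distinct items seen exactly once) / (number of samples).
    `samples` is a list of hashable items, e.g. facts in a toy training set.
    """
    if not samples:
        raise ValueError("samples must be non-empty")
    counts = Counter(samples)
    # Items observed exactly once ("singletons") stand in for the unseen mass.
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(samples)


# Hypothetical toy "training set" of facts, labeled by letter.
toy_facts = ["a", "a", "b", "c", "c", "c", "d", "e"]
estimate = good_turing_missing_mass(toy_facts)
print(f"Estimated missing mass: {estimate}")
```

In this toy sample, three of the eight observations ("b", "d", "e") are facts seen exactly once, so the estimate is 3/8. Under the framework summarized above, a larger share of such rarely seen facts corresponds to a higher hallucination rate for a calibrated model.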
Entities
Institutions
- arXiv