Language Models Show Factual Generation-Verification Gap
A new study on arXiv (2605.27564) investigates the generation-verification gap (GV-gap) in language models: models verify facts more reliably than they generate them. The research tracks factual knowledge across three training phases (acquisition, continual learning, and updating) using four open-source model families at two scales each. Key findings: verification is learned before generation, verification is more robust to continual learning than generation, and factual updates can create a 'multi-verse' state in which a model accepts both the old and the new answer as correct.
Key facts
- arXiv paper 2605.27564 examines the generation-verification gap in language models.
- The GV-gap is the tendency of models to verify facts more reliably than they generate them.
- Study covers three training phases: acquisition, continual learning, and updating.
- Four open-source model families were tested at two scales each.
- Verification is consistently learned before generation.
- Verification is more robust to continual learning than generation.
- Factual updates can lead to a 'multi-verse' state in which a model verifies both the old and the new answer as correct.
- The research distinguishes factual GV-gaps from computational and aesthetic counterparts.
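To make the two central ideas above concrete, here is a minimal Python sketch. It is not the paper's method: it assumes the GV-gap can be summarized as verification accuracy minus generation accuracy over the same set of facts, and that a 'multi-verse' state can be flagged when a model accepts both the old and the new answer after an update. The names `FactResult`, `gv_gap`, and `is_multiverse` are hypothetical, and the sample data is invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class FactResult:
    """Hypothetical per-fact evaluation record (an assumption, not from the paper)."""
    generated_correct: bool  # model produced the right answer when asked
    verified_correct: bool   # model accepted the right answer when shown it


def gv_gap(results):
    """Assumed gap measure: verification accuracy minus generation accuracy.

    A positive value means the model verifies more reliably than it generates.
    """
    if not results:
        raise ValueError("need at least one result")
    n = len(results)
    gen_acc = sum(r.generated_correct for r in results) / n
    ver_acc = sum(r.verified_correct for r in results) / n
    return ver_acc - gen_acc


def is_multiverse(accepts_old: bool, accepts_new: bool) -> bool:
    """Flag a fact whose old and new answers are both verified as correct."""
    return accepts_old and accepts_new


# Invented toy data: 1 of 4 facts generated correctly, 3 of 4 verified correctly.
results = [
    FactResult(generated_correct=True, verified_correct=True),
    FactResult(generated_correct=False, verified_correct=True),
    FactResult(generated_correct=False, verified_correct=True),
    FactResult(generated_correct=False, verified_correct=False),
]
print(gv_gap(results))  # 0.75 - 0.25 = 0.5
```

A fuller evaluation would presumably also check that the model rejects incorrect answers, since a model that accepts everything would score perfectly on this simplified verification measure.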
Entities
Institutions
- arXiv