DriftBench: LLMs Violate Constraints Despite Recalling Them
A recent benchmark finds that large language models often violate initial constraints during multi-turn ideation even when they can accurately restate those constraints. DriftBench measures constraint adherence across 2,146 scored runs, covering seven models from five providers, four interaction conditions, and 38 research briefs spanning 24 scientific fields. The knows-but-violates (KBV) rate ranges from 8% to 99% across models, exposing a dissociation between declarative recall and behavioral compliance. Structured checkpointing partially lowers KBV rates but does not close the gap.
Key facts
- DriftBench is a benchmark for evaluating constraint adherence in multi-turn LLM-assisted scientific ideation.
- The study involved 2,146 scored benchmark runs.
- Seven models from five providers were tested, including two open-weight models.
- Four interaction conditions were evaluated.
- 38 research briefs from 24 scientific domains were used.
- Iterative pressure reliably increases structural complexity and often reduces adherence to original constraints.
- The knows-but-violates (KBV) rate ranges from 8% to 99% across models.
- Structured checkpointing partially reduces KBV rates but does not close the dissociation.
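The KBV rate can be read as a conditional rate: among runs where the model can correctly restate the original constraint when probed, how often does its output still violate that constraint. A minimal sketch of that computation (the `Run` fields and scoring logic here are illustrative assumptions, not DriftBench's actual implementation):

```python
from dataclasses import dataclass

@dataclass
class Run:
    recalled: bool   # model restated the constraint correctly when probed (assumed field)
    violated: bool   # final output broke the constraint (assumed field)

def kbv_rate(runs: list[Run]) -> float:
    """Fraction of constraint-recalling runs that nonetheless violate the constraint."""
    recalled = [r for r in runs if r.recalled]
    if not recalled:
        return 0.0
    return sum(r.violated for r in recalled) / len(recalled)

# Toy example: of 3 runs that recall the constraint, 2 violate it -> KBV = 2/3
runs = [Run(True, True), Run(True, False), Run(True, True), Run(False, True)]
print(kbv_rate(runs))
```

Note that runs where the model fails to recall the constraint are excluded from the denominator; that conditioning is what separates KBV from a plain violation rate.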