LLMs Struggle When Instructions Conflict with Pattern Completion
A recent study published on arXiv (2605.20382) indicates that large language models (LLMs) encounter a core dilemma between adhering to instructions and completing patterns. Researchers designed dialogues where a user's request for a specific behavior T (such as generating a certain token, responding in a designated language, or embodying a character) conflicts with N pre-programmed assistant responses that exhibit an alternative pattern P. In tests involving 13 models and 16 distinct instructions over 50 turns, the average rates of following instructions ranged significantly from 1% to 99%, showing little correlation with traditional capability metrics. The shift from following instructions to adhering to patterns is consistent across models but varies greatly. The study underscores a significant weakness in existing LLM alignment strategies.
Key facts
- Study from arXiv:2605.20382
- Tests instruction-induction conflict in LLMs
- Constructs conversations with opposing instruction T and pattern P
- 13 models tested
- 16 different instructions
- Up to 50 turns per test
- Instruction-following rates range from 1% to 99%
- Rates uncorrelated with standard capability benchmarks
- Transition from instruction-following to pattern-following is universal but model-dependent
- Robustness modulated by instruction content and output format
Entities
Institutions
- arXiv