LLMs Struggle When Instructions Conflict with Pattern Completion

ai-technology · 2026-05-22

A recent study published on arXiv (2605.20382) indicates that large language models (LLMs) encounter a core dilemma between adhering to instructions and completing patterns. Researchers designed dialogues where a user's request for a specific behavior T (such as generating a certain token, responding in a designated language, or embodying a character) conflicts with N pre-programmed assistant responses that exhibit an alternative pattern P. In tests involving 13 models and 16 distinct instructions over 50 turns, the average rates of following instructions ranged significantly from 1% to 99%, showing little correlation with traditional capability metrics. The shift from following instructions to adhering to patterns is consistent across models but varies greatly. The study underscores a significant weakness in existing LLM alignment strategies.

Key facts

Study from arXiv:2605.20382
Tests instruction-induction conflict in LLMs
Constructs conversations with opposing instruction T and pattern P
13 models tested
16 different instructions
Up to 50 turns per test
Instruction-following rates range from 1% to 99%
Rates uncorrelated with standard capability benchmarks
Transition from instruction-following to pattern-following is universal but model-dependent
Robustness modulated by instruction content and output format

LLMs Struggle When Instructions Conflict with Pattern Completion

Key facts

Entities

Institutions

Sources