Statistical Preemption in LLMs: Causal Evidence from Verb-Construction Pairings
A computational study posted on arXiv (2605.23039) reports causal evidence that large language models (LLMs) learn which linguistic structures are disallowed through statistical preemption, a mechanism proposed in Construction Grammar. Under statistical preemption, a form comes to be judged unacceptable because a competing form is repeatedly encountered in the contexts where it would otherwise be expected; under the alternative entrenchment hypothesis, the overall frequency of the verb itself does that work. The study dissociates the two accounts using a unified design across four experiments covering 120 English verb-construction pairings (dative, causative, locative). LLM surprisal patterns correlate strongly with human acceptability ratings (r = 0.79), a result validated against three independent behavioral datasets. The frequency of competing forms, rather than the overall frequency of the verb, drives these patterns. The work is presented as the first computational evidence for statistical preemption in LLMs, and it bears on how models, and possibly humans, acquire linguistic constraints without negative evidence.
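For readers unfamiliar with the measure, surprisal is the negative log probability a model assigns to a token, summed over a sequence. The sketch below shows that arithmetic for two competing forms of the same message. It is only an illustration: the function name `surprisal_bits` and every probability value are invented here and are not taken from the study, which would obtain per-token probabilities from an actual LLM.

```python
import math


def surprisal_bits(token_probs):
    """Return the total surprisal, in bits, of a sequence.

    token_probs holds the probability the model assigned to each
    token given its left context; surprisal is the sum of -log2(p).
    """
    if any(p <= 0.0 or p > 1.0 for p in token_probs):
        raise ValueError("each probability must lie in (0, 1]")
    return sum(-math.log2(p) for p in token_probs)


# Hypothetical per-token probabilities, invented for illustration only.
# One list stands for a verb used in a construction it resists, the
# other for the competing construction that expresses the same message.
dispreferred_form = [0.20, 0.05, 0.01, 0.30]
competing_form = [0.20, 0.30, 0.40, 0.50]

# A positive gap means the model finds the dispreferred form more surprising.
gap = surprisal_bits(dispreferred_form) - surprisal_bits(competing_form)
print(f"surprisal gap: {gap:.2f} bits")
```

Comparing the two forms as a difference, rather than using raw surprisal, is a design choice of this sketch: it cancels out the cost of words the two forms share.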
Key facts
- Study published on arXiv under ID 2605.23039
- Investigates statistical preemption in large language models
- Four experiments with 120 English verb-construction pairings
- LLM surprisal correlates with human acceptability judgments at r = 0.79
- Validated against three independent behavioral datasets
- Competing-form frequency drives patterns, not overall verb frequency
- Non-circular partial correlations confirm preemption sensitivity
- First direct computational dissociation of preemption from entrenchment
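The contrast between competing-form frequency and overall verb frequency can be made concrete with a first-order partial correlation, which measures how two variables relate once a third is held fixed. The summary does not say exactly which variables the study partials out or how it avoids circularity, so the sketch below is an assumption about the general technique and not the authors' analysis. The helper names and all data values are hypothetical and synthetic.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def partial_corr(x, y, z):
    """Correlation of x and y with the linear effect of z removed.

    Undefined (division by zero) if z is perfectly correlated with x or y.
    """
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Synthetic values, invented for illustration; NOT data from the study.
surprisal_gap = [6.1, 4.8, 5.5, 1.2, 2.0, 0.7]
log_competitor_freq = [3.9, 3.1, 3.6, 1.0, 1.4, 0.8]
log_verb_freq = [4.2, 2.5, 3.0, 3.8, 2.2, 4.0]

# Preemption-style question: does competing-form frequency still track
# the surprisal gap once overall verb frequency is controlled for?
print(partial_corr(surprisal_gap, log_competitor_freq, log_verb_freq))

# Entrenchment-style question: does verb frequency still track the gap
# once competing-form frequency is controlled for?
print(partial_corr(surprisal_gap, log_verb_freq, log_competitor_freq))
```

On a preemption account one would expect the first value to stay large and the second to shrink toward zero; an entrenchment account predicts the reverse.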
Entities
Institutions
- arXiv