Statistical Preemption in LLMs: Causal Evidence from Verb-Construction Pairings

ai-technology · 2026-05-25

A computational study posted to arXiv (2605.23039) reports causal evidence that large language models (LLMs) learn which linguistic structures are disallowed through statistical preemption, a mechanism proposed in Construction Grammar. Under statistical preemption, a form comes to be treated as unacceptable because a competing form is consistently used in its place; under the alternative entrenchment hypothesis, the overall frequency of the verb does that work. The study separates the two accounts with a unified design across four experiments covering 120 English verb-construction pairings (dative, causative, locative). LLM surprisal patterns correlate strongly with human acceptability ratings (r = 0.79), a result validated against three independent behavioral datasets. The frequency of competing forms, rather than overall verb frequency, drives these patterns. The study is presented as the first computational evidence for statistical preemption in LLMs, and it bears on how models, and possibly humans, acquire linguistic constraints without negative evidence.
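To make the surprisal-versus-ratings comparison concrete, here is a minimal Python sketch. It is not the study's code: the function names are hypothetical, the per-token probabilities are assumed to come from whatever model is being evaluated, and summing surprisal over a sentence is one common convention that the summary does not confirm. Surprisal is the standard quantity -log2 p(token | context), and the correlation is a plain Pearson r.

```python
import math


def surprisal_bits(token_probs):
    """Total surprisal of a token sequence in bits.

    token_probs: the probability a model assigned to each observed token
    given its preceding context (assumed to be supplied by the caller).
    """
    return sum(-math.log2(p) for p in token_probs)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)
```

Because higher surprisal means a less expected sentence, raw surprisal would normally correlate negatively with acceptability. One way to get a positive coefficient is to correlate negated surprisal with the ratings; which sign convention lies behind the reported r = 0.79 is not stated in this summary.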

Key facts

Study published on arXiv under ID 2605.23039
Investigates statistical preemption in large language models
Four experiments with 120 English verb-construction pairings
LLM surprisal correlates with human judgments at r = 0.79
Validated against three independent behavioral datasets
Competing-form frequency drives patterns, not overall verb frequency
Non-circular partial correlations confirm preemption sensitivity
First direct computational dissociation of preemption from entrenchment
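
The logic of separating competing-form frequency from overall verb frequency can be illustrated with a first-order partial correlation. The Python sketch below is an assumption about the general technique, not a description of the study's procedure, which this summary does not give; the function and argument names are hypothetical. It asks how strongly two variables are related once a third has been controlled for.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def partial_corr(xs, ys, zs):
    """Correlation between xs and ys with zs partialled out.

    Uses the textbook first-order formula:
        r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
    """
    r_xy = pearson_r(xs, ys)
    r_xz = pearson_r(xs, zs)
    r_yz = pearson_r(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
```

In this framing, xs might hold a surprisal measure per verb-construction pairing, ys the frequency of the competing form, and zs the overall verb frequency. A partial correlation that stays large after controlling for zs would be consistent with preemption rather than entrenchment.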
