LLMs Tested on Semantic Generalization with Phrasal Constructions

ai-technology · 2026-06-01

A new evaluation dataset uses Construction Grammar (CxG) to test whether large language models (LLMs) can generalize beyond memorization to understand novel phrasal constructions. Built from English phrasal constructions, the dataset assesses whether models grasp the abstract meanings tied to syntactic forms, mirroring the human ability to interpret creative instantiations of those forms. The study addresses the challenge of separating linguistic competence on data well represented in pretraining from out-of-domain generalization. The arXiv preprint (2501.04661) presents the dataset as a diagnostic evaluation of natural language understanding, focused on semantic generalization in LLMs.

Key facts

arXiv:2501.04661v3
Announce Type: replace-cross
Uses Construction Grammar (CxG) framework
Evaluates semantic generalization in LLMs
Dataset consists of English phrasal constructions
Tests understanding of abstract, non-lexical meanings
Focuses on out-of-domain language generalization
Compares model performance to human speaker abilities
