LLMs Tested on Semantic Generalization with Phrasal Constructions
A new evaluation dataset uses Construction Grammar (CxG) to test whether large language models (LLMs) can generalize beyond memorization to understand novel phrasal constructions. Built from English phrasal constructions, the dataset assesses whether models grasp the abstract meanings tied to syntactic forms, mirroring the human ability to interpret creative instantiations of those forms. The study addresses the challenge of disentangling linguistic competence on well-represented pretraining data from out-of-domain generalization. The arXiv preprint (2501.04661) presents the dataset as a diagnostic evaluation of natural language understanding focused on semantic generalization in LLMs.
Key facts
- arXiv:2501.04661v3
- Announced on arXiv as a cross-listed replacement (announce type: replace-cross)
- Uses Construction Grammar (CxG) framework
- Evaluates semantic generalization in LLMs
- Dataset consists of English phrasal constructions
- Tests understanding of abstract, non-lexical meanings
- Focuses on out-of-domain language generalization
- Compares model performance to human speaker abilities
Entities
Institutions
- arXiv