ARTFEED — Contemporary Art Intelligence

Semantic Role Understanding Emerges During LLM Pre-Training

ai-technology · 2026-05-12

A recent study published on arXiv (2605.09187) examines whether semantic role understanding—the ability to discern 'who did what to whom' in a sentence—emerges during language-model pre-training or requires task-specific fine-tuning. The researchers froze decoder-only transformers and trained linear probes to extract semantic roles from their representations, testing whether this information is already encoded by pre-training or only acquired through adaptation. Frozen representations across a range of model scales were found to contain substantial semantic role information, though probe performance did not fully reach the level of fine-tuned models. This suggests that semantic role understanding partially emerges during pre-training but remains incomplete without adaptation.
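The probing methodology the study relies on can be illustrated with a small sketch: a "frozen backbone" produces fixed feature vectors, and only a lightweight linear classifier is trained on top to read out role labels. Everything below is hypothetical (simulated features, two invented role labels, a hand-rolled logistic-regression probe), not the paper's actual models or data.

```python
# Sketch of linear probing over frozen representations.
# Hypothetical setup: features are simulated, not real transformer states.
import numpy as np

rng = np.random.default_rng(0)

# Pretend "frozen" hidden states: each token gets a d-dim vector. We simulate
# two semantic roles (0 = agent, 1 = patient) whose vectors differ along one
# hidden direction, mimicking role information being linearly encoded.
d, n = 32, 400
role_direction = rng.normal(size=d)
labels = rng.integers(0, 2, size=n)
features = rng.normal(size=(n, d)) + np.outer(2 * labels - 1, role_direction)

# Linear probe: logistic regression trained by gradient descent. The
# "backbone" features stay fixed; only probe weights w and bias b are learned.
w, b = np.zeros(d), 0.0
lr = 0.1
for _ in range(300):
    logits = features @ w + b
    p = 1.0 / (1.0 + np.exp(-logits))        # predicted P(role = 1)
    grad_w = features.T @ (p - labels) / n   # gradient of mean log-loss
    grad_b = float(np.mean(p - labels))
    w -= lr * grad_w
    b -= lr * grad_b

preds = (features @ w + b > 0).astype(int)
accuracy = float(np.mean(preds == labels))
print(f"probe accuracy on frozen features: {accuracy:.2f}")
```

High probe accuracy here indicates the role signal is linearly recoverable from the fixed features, which is the same logic the study uses to argue that pre-trained representations already encode semantic role information.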

Key facts

  • Study examines semantic role understanding in language models
  • Uses frozen decoder-only transformers with linear probes
  • Finds substantial role information in pre-trained representations
  • Frozen-probe performance is strong but does not fully match fine-tuned models
  • Indicates partial emergence from pre-training
  • Published on arXiv with ID 2605.09187
  • Focuses on 'who did what to whom' meaning representation
  • Frozen representations encode role information across model scales

Entities

Institutions

  • arXiv

Sources