s-Trace Method Reveals Two-Phase Computation in LLMs

ai-technology · 2026-05-27

A new method called s-Trace efficiently estimates the minimal subgraph of a transformer-based large language model (LLM) that best approximates the full model's output. Applying s-Trace to several LLMs shows that their computation is organized in two distinct phases. In the early phase, a small subgraph drawn mostly from early layers reconstructs the head of the output distribution, that is, the highest-probability tokens. In the later phase, additional nodes, primarily attention heads in later layers, provide incremental refinements. The amount of computation needed for a given input correlates with the model's uncertainty on that input, and sparser subgraphs encode shallow statistics such as unigram frequency. The findings suggest that LLMs do not use their full capacity on every input.
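The article does not say how s-Trace searches for its subgraph. As a rough illustration of the objective only, the sketch below assumes each node (for example an attention head or MLP block) contributes an additive vector of logits. This ignores LayerNorm and interactions between layers. It then uses greedy search with KL divergence as the measure of how well a size-s subgraph approximates the full output. The additive model, the greedy search, the KL metric, and the names `greedy_subgraph` and `contributions` are all hypothetical choices for this sketch, not the paper's method.

```python
import numpy as np


def softmax(logits):
    # Numerically stable softmax over a 1-D logit vector.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()


def kl_divergence(p, q, eps=1e-12):
    # KL(p || q): information lost when q (subgraph output) stands in for p (full output).
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))


def greedy_subgraph(contributions, s):
    """Hypothetical greedy search for s nodes whose output best matches the full model.

    contributions: array of shape (num_nodes, vocab_size). Row i is ASSUMED to be the
    additive logit contribution of node i; real transformers are not exactly additive.
    Returns the chosen node indices and the KL divergence of the resulting subgraph.
    """
    num_nodes, vocab_size = contributions.shape
    full = softmax(contributions.sum(axis=0))
    chosen, remaining = [], list(range(num_nodes))
    partial = np.zeros(vocab_size)
    for _ in range(min(s, num_nodes)):
        # Add whichever remaining node brings the subgraph output closest to the full output.
        best = min(remaining, key=lambda i: kl_divergence(full, softmax(partial + contributions[i])))
        chosen.append(best)
        remaining.remove(best)
        partial += contributions[best]
    return chosen, kl_divergence(full, softmax(partial))


# Toy usage: 12 random "nodes" over a 50-token vocabulary.
rng = np.random.default_rng(0)
toy = rng.normal(size=(12, 50))
for s in (1, 3, 6, 12):
    nodes, kl = greedy_subgraph(toy, s)
    print(f"s={s:2d}  nodes={nodes}  KL={kl:.4f}")
```

Greedy search is used here only because it is short. It is not guaranteed to find the best size-s subgraph, and exhaustive search over every subset of nodes is impractical at LLM scale.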

Key facts

s-Trace estimates the minimal subgraph of size s that best approximates the full model output.
Computation in LLMs is organized in two distinct phases.
The early-phase subgraph consists mostly of early-layer nodes and reconstructs the head of the output distribution.
The later phase adds nodes, mostly in later layers and increasingly attention heads, that provide incremental refinements.
The amount of computation needed per input correlates with model uncertainty.
Sparser subgraphs encode shallow statistics such as unigram frequency.
The study was published on arXiv with identifier 2605.27033.
The findings suggest LLMs do not exploit their full capacity for all inputs.
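The reported link between per-input computation and model uncertainty comes with no stated procedure. The sketch below shows one way such a relationship could be measured on toy data, under assumptions that are all hypothetical and not taken from the paper:

- Entropy of the full output serves as the uncertainty measure.
- A KL divergence at or below a tolerance `tau` counts as a good enough approximation.
- Nodes are added in order of the norm of their assumed-additive logit contribution, instead of using s-Trace's actual search.
- Pearson correlation is the summary statistic.

The function names are illustrative only.

```python
import numpy as np


def softmax(logits):
    # Numerically stable softmax over a 1-D logit vector.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()


def entropy(p, eps=1e-12):
    # Shannon entropy in nats, used here as the ASSUMED measure of model uncertainty.
    return float(-np.sum(p * np.log(p + eps)))


def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) between the full-model distribution p and a subgraph distribution q.
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))


def minimal_prefix_size(contributions, tau):
    """Smallest k such that the k largest-norm nodes match the full output with KL <= tau.

    Ranking nodes by the L2 norm of their assumed-additive logit contribution is a crude,
    hypothetical heuristic chosen for brevity.
    """
    full = softmax(contributions.sum(axis=0))
    order = np.argsort(-np.linalg.norm(contributions, axis=1))
    partial = np.zeros(contributions.shape[1])
    for k, node in enumerate(order, start=1):
        partial += contributions[node]
        if kl_divergence(full, softmax(partial)) <= tau:
            return k
    # Fallback: with every node included the output matches up to rounding error.
    return len(order)


def uncertainty_vs_size(batch, tau=0.05):
    """Per-input entropy, per-input minimal size, and their Pearson correlation."""
    entropies = np.array([entropy(softmax(c.sum(axis=0))) for c in batch])
    sizes = np.array([minimal_prefix_size(c, tau) for c in batch])
    # np.corrcoef returns NaN (with a RuntimeWarning) if either array is constant.
    r = float(np.corrcoef(entropies, sizes)[0, 1])
    return entropies, sizes, r


# Toy usage: 32 random "inputs", each with 16 nodes over a 100-token vocabulary.
rng = np.random.default_rng(1)
batch = [rng.normal(size=(16, 100)) for _ in range(32)]
entropies, sizes, r = uncertainty_vs_size(batch)
print(f"Pearson r(entropy, minimal size) = {r:.2f}")
```

On random toy data the printed correlation says nothing about real LLMs. Applying the idea to an actual model would require extracting per-component contributions from its activations, which this sketch does not attempt.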
