Hierarchical Language Models Show Predictable Scaling and Reasoning Benefits
A new arXiv paper (2605.13687) introduces synthetic languages with hierarchical structure, generated by a broadcast process on trees, enabling precise analysis of context length and reasoning in autoregressive generation. The authors propose an exact k-gram ansatz as a substitute for transformers with context length k, and validate it empirically. For the Ising broadcast process, they prove that the variance of generated sums scales log-linearly with context length and that the kurtosis converges to its Gaussian value, so bounded-context models deviate from the true language for sublinear context. For the coloring broadcast process in the freezing regime, bounded-context models likewise show predictable deviations.
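The paper's exact construction is not reproduced here, but a broadcast process on a tree can be sketched minimally. The following is a hypothetical Ising-style version, assuming a binary tree and a per-edge flip probability `flip_p` (both names and parameters are illustrative, not taken from the paper): the root spin is ±1 uniformly, each child copies its parent's spin and flips it independently with probability `flip_p`, and the leaves read left to right form one "sentence" of the synthetic language.

```python
import random

def ising_broadcast(depth, flip_p, rng):
    """Sample leaf spins of a binary tree under an Ising-style broadcast:
    the root is +/-1 uniformly at random, and each child copies its
    parent's spin, flipping it independently with probability flip_p."""
    spins = [rng.choice((-1, 1))]  # level 0: the root
    for _ in range(depth):
        # Each spin spawns two children; each child flips independently.
        spins = [s if rng.random() > flip_p else -s
                 for s in spins for _ in range(2)]
    return spins  # 2**depth leaves, left to right

rng = random.Random(0)
leaves = ising_broadcast(depth=10, flip_p=0.2, rng=rng)
total = sum(leaves)  # the "generated sum" whose variance the paper studies
```

Averaging `total**2` over many sampled trees would give an empirical handle on the variance statistic that the paper analyzes for bounded-context models.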
Key facts
- Paper introduces synthetic languages with hierarchical structure via broadcast process on trees
- Exact k-gram ansatz substitutes for transformers with context length k
- Ising broadcast process: variance of generated sums scales log-linearly with context length, kurtosis converges to its Gaussian value
- Coloring broadcast process analyzed in freezing regime
- Predictable scaling laws for distributional statistics
- Empirical validation of the ansatz
- Provable benefits of reasoning in autoregressive generation
- arXiv preprint 2605.13687
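The k-gram ansatz above can be illustrated with an empirical stand-in: rather than the paper's exact conditionals, the sketch below tabulates P(next token | previous k tokens) from sampled broadcast sequences and then generates autoregressively, mimicking a transformer restricted to context length k. All function names and parameter values here are hypothetical, assuming the Ising-style broadcast sampler from earlier.

```python
import random
from collections import Counter, defaultdict

def broadcast_leaves(depth, flip_p, rng):
    # Ising-style broadcast on a binary tree: root +/-1, children
    # flip the parent spin independently with probability flip_p.
    spins = [rng.choice((-1, 1))]
    for _ in range(depth):
        spins = [s if rng.random() > flip_p else -s
                 for s in spins for _ in range(2)]
    return spins

def fit_kgram(sequences, k):
    """Tabulate empirical P(next token | previous k tokens)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i, tok in enumerate(seq):
            counts[tuple(seq[max(0, i - k):i])][tok] += 1
    return counts

def generate(counts, k, length, rng):
    """Sample autoregressively from the tabulated k-gram conditionals."""
    seq = []
    for _ in range(length):
        tokens, weights = zip(*counts[tuple(seq[-k:])].items())
        seq.append(rng.choices(tokens, weights=weights)[0])
    return seq

rng = random.Random(1)
train = [broadcast_leaves(depth=6, flip_p=0.2, rng=rng) for _ in range(2000)]
model = fit_kgram(train, k=3)
sample = generate(model, k=3, length=64, rng=rng)
```

Comparing distributional statistics (variance, kurtosis of leaf sums) between such a bounded-context sampler and the true broadcast language is the kind of experiment the paper's scaling results describe.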
Entities
Institutions
- arXiv