Hierarchical Language Models Show Predictable Scaling and Reasoning Benefits
A new arXiv paper (2605.13687) introduces synthetic languages with hierarchical structure, generated by a broadcast process on trees, enabling precise analysis of context length and reasoning in autoregressive generation. The authors propose an exact k-gram ansatz as a substitute for transformers with context length k, and validate it empirically. For the Ising broadcast process, they prove that the variance of generated sums scales log-linearly with context length and that the kurtosis converges to its Gaussian value, so bounded-context models deviate from the true language for sublinear context. For the coloring broadcast process in the freezing regime, bounded-context models likewise show predictable deviations.
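The paper's exact construction is not reproduced here, but a broadcast process on a tree can be sketched minimally. The following is a hypothetical Ising-style version, assuming a binary tree and a per-edge flip probability `flip_p` (both names and parameters are illustrative, not taken from the paper): the root spin is ±1 uniformly, each child copies its parent's spin and flips it independently with probability `flip_p`, and the leaves read left to right form one "sentence" of the synthetic language.

```python
import random

def ising_broadcast(depth, flip_p, rng):
    """Sample leaf spins of a binary tree under an Ising-style broadcast:
    the root is +/-1 uniformly at random, and each child copies its
    parent's spin, flipping it independently with probability flip_p."""
    spins = [rng.choice((-1, 1))]  # level 0: the root
    for _ in range(depth):
        # Each spin spawns two children; each child flips independently.
        spins = [s if rng.random() > flip_p else -s
                 for s in spins for _ in range(2)]
    return spins  # 2**depth leaves, left to right

rng = random.Random(0)
leaves = ising_broadcast(depth=10, flip_p=0.2, rng=rng)
total = sum(leaves)  # the "generated sum" whose variance the paper studies
```

Averaging `total**2` over many sampled trees would give an empirical handle on the variance statistic that the paper analyzes for bounded-context models.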
Key facts
- Paper introduces synthetic languages with hierarchical structure via broadcast process on trees
- Exact k-gram ansatz substitutes for transformers with context length k
- Ising broadcast process: variance of generated sums scales log-linearly with context length, kurtosis converges to its Gaussian value
- Coloring broadcast process analyzed in freezing regime
- Predictable scaling laws for distributional statistics
- Empirical validation of the ansatz
- Provable benefits of reasoning in autoregressive generation
- arXiv preprint 2605.13687
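The k-gram ansatz above can be illustrated with an empirical stand-in: rather than the paper's exact conditionals, the sketch below tabulates P(next token | previous k tokens) from sampled broadcast sequences and then generates autoregressively, mimicking a transformer restricted to context length k. All function names and parameter values here are hypothetical, assuming the Ising-style broadcast sampler from earlier.

```python
import random
from collections import Counter, defaultdict

def broadcast_leaves(depth, flip_p, rng):
    # Ising-style broadcast on a binary tree: root +/-1, children
    # flip the parent spin independently with probability flip_p.
    spins = [rng.choice((-1, 1))]
    for _ in range(depth):
        spins = [s if rng.random() > flip_p else -s
                 for s in spins for _ in range(2)]
    return spins

def fit_kgram(sequences, k):
    """Tabulate empirical P(next token | previous k tokens)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i, tok in enumerate(seq):
            counts[tuple(seq[max(0, i - k):i])][tok] += 1
    return counts

def generate(counts, k, length, rng):
    """Sample autoregressively from the tabulated k-gram conditionals."""
    seq = []
    for _ in range(length):
        tokens, weights = zip(*counts[tuple(seq[-k:])].items())
        seq.append(rng.choices(tokens, weights=weights)[0])
    return seq

rng = random.Random(1)
train = [broadcast_leaves(depth=6, flip_p=0.2, rng=rng) for _ in range(2000)]
model = fit_kgram(train, k=3)
sample = generate(model, k=3, length=64, rng=rng)
```

Comparing distributional statistics (variance, kurtosis of leaf sums) between such a bounded-context sampler and the true broadcast language is the kind of experiment the paper's scaling results describe.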
Entities
Institutions
- arXiv