Transformers Scale Implicit Deductive Reasoning with Depth
A new study on arXiv investigates how Transformer models scale implicit deductive reasoning over Horn clauses, i.e., deriving conclusions in a single forward pass rather than writing out intermediate steps. The researchers find that sufficiently deep models with a bidirectional prefix mask can approach the performance of explicit chain-of-thought (CoT) reasoning across various graph topologies and problem widths, though CoT remains necessary for extrapolating to greater proof depths than those seen in training. To isolate genuine deduction, the work systematically decorrelates provability from spurious features and enforces algorithmic alignment.
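To make "deductive reasoning over Horn clauses" concrete, here is a minimal, illustrative sketch of forward chaining over propositional Horn clauses, where each iteration corresponds to one step of proof depth. This is a standalone example for the reader, not the authors' code or setup.

```python
# Illustrative only: forward chaining over propositional Horn clauses.
# Each rule is (body, head): every atom in `body` must be derived
# before `head` can be derived. Each outer iteration adds one
# "depth" of deduction.

def forward_chain(facts, rules, max_depth):
    """Return all atoms provable from `facts` within `max_depth` steps."""
    derived = set(facts)
    for _ in range(max_depth):
        new = {head for body, head in rules
               if head not in derived and all(b in derived for b in body)}
        if not new:          # fixed point reached: nothing more provable
            break
        derived |= new
    return derived

# Example: a & b -> c, c -> d (a two-step proof of d)
rules = [(("a", "b"), "c"), (("c",), "d")]
print(sorted(forward_chain({"a", "b"}, rules, max_depth=3)))
# -> ['a', 'b', 'c', 'd']
```

The paper's question, in these terms, is whether a Transformer of bounded layer depth can perform such multi-step derivations implicitly, without emitting the intermediate atoms as text.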
Key facts
- Study investigates scaling properties of implicit deductive reasoning in Transformers
- Focuses on reasoning over Horn clauses in depth-bounded Transformers
- Systematically decorrelates provability from spurious features
- Enforces algorithmic alignment
- Sufficiently deep models with a bidirectional prefix mask approach explicit chain-of-thought (CoT) performance
- CoT remains necessary for extrapolating to proof depths beyond those seen in training
- Results hold across graph topologies and problem widths
- Published on arXiv under Computer Science > Artificial Intelligence
Entities
Institutions
- arXiv