ScaleLogic: RL Training Compute Scales as a Power Law with Reasoning Depth in LLMs
A new synthetic framework called ScaleLogic enables controlled study of how reinforcement learning (RL) training compute scales with reasoning difficulty in large language models (LLMs). The framework independently controls two difficulty axes: proof planning depth (horizon) and logical expressiveness, ranging from implication-only logic up to first-order logic with conjunction, disjunction, negation, and universal quantification. Experiments show that RL training compute T follows a power law in reasoning depth D (T ∝ D^γ, with fits of R² > 0.99), and that the scaling exponent γ increases monotonically with logical expressiveness. The work, available on arXiv as 2605.06638, offers a systematic way to study how RL-based reasoning gains in LLMs scale with task difficulty.
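To make the reported fit concrete, here is a minimal sketch of how one can estimate the exponent γ from (depth, compute) measurements: a power law T ∝ D^γ becomes a straight line in log-log space, so γ is the slope of a linear fit. The data points below are illustrative placeholders, not values from the paper.

```python
import numpy as np

# Illustrative (depth, compute) pairs; NOT data from the paper.
D = np.array([2.0, 4.0, 8.0, 16.0, 32.0])    # reasoning depth
T = np.array([1.1, 4.3, 17.8, 70.2, 281.0])  # training compute (arbitrary units)

# T = c * D^gamma  =>  log T = gamma * log D + log c
logD, logT = np.log(D), np.log(T)
gamma, log_c = np.polyfit(logD, logT, deg=1)

# R^2 of the log-log linear fit
pred = gamma * logD + log_c
ss_res = float(np.sum((logT - pred) ** 2))
ss_tot = float(np.sum((logT - logT.mean()) ** 2))
r2 = 1.0 - ss_res / ss_tot

print(f"gamma = {gamma:.3f}, c = {np.exp(log_c):.3f}, R^2 = {r2:.4f}")
```

A fit of this kind is also how one would check the claim that γ grows with logical expressiveness: repeat the regression separately for each logic fragment and compare the resulting slopes.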
Key facts
- ScaleLogic is a synthetic logical reasoning framework for LLMs.
- It controls two difficulty axes: proof planning depth and logical expressiveness.
- Supported logical fragments range from implication-only up to first-order logic with conjunction, disjunction, negation, and universal quantification (a toy depth-controlled generator is sketched after this list).
- RL training compute T scales as T ∝ D^γ with R² > 0.99.
- Scaling exponent γ increases with logical expressiveness.
- The paper is available on arXiv as 2605.06638.
- The framework addresses the lack of controlled environments for studying RL training scaling.
- The study systematically analyzes how training scales with task difficulty.
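The paper's exact task format is not given in this summary, so the following is a hypothetical sketch of a depth-controlled generator for the simplest fragment (implication-only): the premises encode a chain p0 → p1 → … → p_D, and proving the goal p_D takes exactly D applications of modus ponens, so depth directly sets the proof planning horizon. All names and the output layout are illustrative, not ScaleLogic's actual interface.

```python
import random

def make_implication_chain(depth: int, seed: int = 0) -> dict:
    """Build an implication-only task whose proof needs `depth` chained steps.

    Premises encode the fact p0 and the rules p0 -> p1, ..., p_{depth-1} -> p_depth;
    the goal p_depth is reachable only via `depth` modus ponens steps.
    """
    rng = random.Random(seed)
    atoms = [f"p{i}" for i in range(depth + 1)]
    rules = [(atoms[i], atoms[i + 1]) for i in range(depth)]
    rng.shuffle(rules)  # present rules out of order so the chain isn't read off literally
    return {
        "facts": [atoms[0]],
        "rules": [f"{a} -> {b}" for a, b in rules],
        "goal": atoms[-1],
        "depth": depth,  # ground-truth planning horizon
    }

print(make_implication_chain(depth=4, seed=42))
```

Richer fragments would extend the rule grammar with conjunction, disjunction, negation, and universal quantification, which is the second difficulty axis the framework controls.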