ScaleLogic: RL Training Compute Scales as a Power Law with Reasoning Depth in LLMs
A new synthetic framework called ScaleLogic enables controlled study of how reinforcement learning (RL) training compute scales with reasoning difficulty in large language models (LLMs). The framework independently controls two difficulty axes: proof planning depth (horizon) and logical expressiveness, ranging from implication-only logic up to first-order logic with conjunction, disjunction, negation, and universal quantification. Experiments show that RL training compute T follows a power law in reasoning depth D (T ∝ D^γ, with fits of R² > 0.99), and that the scaling exponent γ increases monotonically with logical expressiveness. The work, available on arXiv as 2605.06638, offers a systematic way to study how RL-based reasoning gains in LLMs scale with task difficulty.
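To make the reported fit concrete, here is a minimal sketch of how one can estimate the exponent γ from (depth, compute) measurements: a power law T ∝ D^γ becomes a straight line in log-log space, so γ is the slope of a linear fit. The data points below are illustrative placeholders, not values from the paper.

```python
import numpy as np

# Illustrative (depth, compute) pairs; NOT data from the paper.
D = np.array([2.0, 4.0, 8.0, 16.0, 32.0])    # reasoning depth
T = np.array([1.1, 4.3, 17.8, 70.2, 281.0])  # training compute (arbitrary units)

# T = c * D^gamma  =>  log T = gamma * log D + log c
logD, logT = np.log(D), np.log(T)
gamma, log_c = np.polyfit(logD, logT, deg=1)

# R^2 of the log-log linear fit
pred = gamma * logD + log_c
ss_res = float(np.sum((logT - pred) ** 2))
ss_tot = float(np.sum((logT - logT.mean()) ** 2))
r2 = 1.0 - ss_res / ss_tot

print(f"gamma = {gamma:.3f}, c = {np.exp(log_c):.3f}, R^2 = {r2:.4f}")
```

A fit of this kind is also how one would check the claim that γ grows with logical expressiveness: repeat the regression separately for each logic fragment and compare the resulting slopes.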
Key facts
- ScaleLogic is a synthetic logical reasoning framework for LLMs.
- It controls two difficulty axes: proof planning depth and logical expressiveness.
- Supported logical fragments range from implication-only up to first-order logic with conjunction, disjunction, negation, and universal quantification (a toy depth-controlled generator is sketched after this list).
- RL training compute T scales as T ∝ D^γ with R² > 0.99.
- Scaling exponent γ increases with logical expressiveness.
- The paper is available on arXiv as 2605.06638.
- The framework addresses the lack of controlled environments for studying RL training scaling.
- The study systematically analyzes how training scales with task difficulty.
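The paper's exact task format is not given in this summary, so the following is a hypothetical sketch of a depth-controlled generator for the simplest fragment (implication-only): the premises encode a chain p0 → p1 → … → p_D, and proving the goal p_D takes exactly D applications of modus ponens, so depth directly sets the proof planning horizon. All names and the output layout are illustrative, not ScaleLogic's actual interface.

```python
import random

def make_implication_chain(depth: int, seed: int = 0) -> dict:
    """Build an implication-only task whose proof needs `depth` chained steps.

    Premises encode the fact p0 and the rules p0 -> p1, ..., p_{depth-1} -> p_depth;
    the goal p_depth is reachable only via `depth` modus ponens steps.
    """
    rng = random.Random(seed)
    atoms = [f"p{i}" for i in range(depth + 1)]
    rules = [(atoms[i], atoms[i + 1]) for i in range(depth)]
    rng.shuffle(rules)  # present rules out of order so the chain isn't read off literally
    return {
        "facts": [atoms[0]],
        "rules": [f"{a} -> {b}" for a, b in rules],
        "goal": atoms[-1],
        "depth": depth,  # ground-truth planning horizon
    }

print(make_implication_chain(depth=4, seed=42))
```

Richer fragments would extend the rule grammar with conjunction, disjunction, negation, and universal quantification, which is the second difficulty axis the framework controls.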