ARTFEED — Contemporary Art Intelligence

ScaleLogic: RL Training Compute Scales as Power Law with Reasoning Depth in LLMs

ai-technology · 2026-05-09

A new synthetic framework called ScaleLogic enables controlled study of how reinforcement learning (RL) training compute scales with reasoning difficulty in large language models (LLMs). The framework independently controls two axes of difficulty: proof planning depth (horizon) and logical expressiveness, ranging from simple implication to first-order logic with conjunction, disjunction, negation, and universal quantification. Experiments reveal that RL training compute T follows a power law in reasoning depth D (T ∝ D^γ, R² > 0.99), with the scaling exponent γ increasing monotonically with logical expressiveness. The work, published on arXiv (2605.06638), provides a systematic approach to understanding RL-based reasoning improvements in LLMs.
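A power law T ∝ D^γ becomes a straight line in log-log space, so the exponent γ and the fit quality (R²) can be recovered by linear regression on log-transformed data. The sketch below illustrates this on simulated numbers; the depths, constants, and exponent are hypothetical placeholders, not the paper's actual measurements.

```python
import numpy as np

def fit_power_law(depths, compute):
    """Fit T = c * D^gamma via linear regression in log-log space.

    Returns (gamma, c, r_squared)."""
    x, y = np.log(depths), np.log(compute)
    gamma, log_c = np.polyfit(x, y, 1)  # slope is the scaling exponent
    y_pred = gamma * x + log_c
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return gamma, np.exp(log_c), r2

# Hypothetical data: compute following T = c * D^gamma with small noise.
rng = np.random.default_rng(0)
depths = np.arange(2, 12)          # illustrative proof planning depths
true_gamma = 1.8                   # illustrative exponent, not from the paper
compute = 50.0 * depths ** true_gamma * rng.lognormal(0.0, 0.02, depths.size)

gamma, c, r2 = fit_power_law(depths, compute)
print(f"gamma = {gamma:.2f}, R^2 = {r2:.4f}")
```

On clean power-law data the recovered γ matches the generating exponent and R² is close to 1, which is the kind of check behind a reported fit of R² > 0.99.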

Key facts

  • ScaleLogic is a synthetic logical reasoning framework for LLMs.
  • It controls two difficulty axes: proof planning depth and logical expressiveness.
  • Supported logics include implication-only, conjunction, disjunction, negation, and universal quantification.
  • RL training compute T scales as T ∝ D^γ with R² > 0.99.
  • Scaling exponent γ increases with logical expressiveness.
  • The paper is available on arXiv as 2605.06638.
  • The framework addresses the lack of controlled environments for studying RL training scaling.
  • The study systematically analyzes how training scales with task difficulty.

Entities

Institutions

  • arXiv
