ARTFEED — Contemporary Art Intelligence

ProFIL: Probe-Filtered RL Reduces Reasoning Theater in LLMs

ai-technology · 2026-05-13

Researchers have introduced ProFIL (Probe-Filtered Reinforcement Learning), a method for reducing 'reasoning theater' in large language models. Reasoning theater refers to post-hoc rationalization: chain-of-thought steps that look deliberative but contribute nothing to the correctness of the answer, wasting tokens and obscuring interpretability. ProFIL extends Group Relative Policy Optimization (GRPO) with a multi-head attention probe, trained once on a frozen base model, that detects post-commitment steps (steps produced after the model has already settled on its answer) from internal activations. During GRPO training, any rollout whose probe score exceeds a threshold has its advantage set to zero, so it adds nothing to the advantage-weighted policy update; this suppresses theater while maintaining faithfulness. The probe is trained on verifier-derived labels, so no human annotation is required. The result is a single, drop-in extension intended to shorten chains of thought and increase faithfulness.
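The advantage-zeroing step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not ProFIL's implementation: it assumes each rollout already carries a single probe score in [0, 1] (for instance, a hypothetical aggregate such as the fraction of its steps the probe flags as post-commitment), that GRPO advantages are standardized over the whole group before filtering, and that the 0.5 threshold is arbitrary. The function names `grpo_advantages` and `probe_filtered_advantages` are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def grpo_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Group-relative advantages: each reward standardized against its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some GRPO code uses the sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


def probe_filtered_advantages(
    rewards: Sequence[float],
    probe_scores: Sequence[float],
    threshold: float = 0.5,  # hypothetical value, not taken from the paper
    eps: float = 1e-6,
) -> List[float]:
    """Zero the advantage of any rollout whose probe score exceeds the threshold.

    Assumption: group statistics use every rollout; filtering happens afterwards.
    """
    if len(rewards) != len(probe_scores):
        raise ValueError("need exactly one probe score per rollout")
    advantages = grpo_advantages(rewards, eps)
    return [0.0 if s > threshold else a for a, s in zip(advantages, probe_scores)]


if __name__ == "__main__":
    # Four rollouts for one prompt; reward 1.0 means the verifier accepted the answer.
    rewards = [1.0, 0.0, 1.0, 0.0]
    # Rollout 0 is correct, but the (hypothetical) probe flags it as theater.
    probe_scores = [0.9, 0.1, 0.2, 0.1]
    print(probe_filtered_advantages(rewards, probe_scores))
```

In this toy group, rollout 0 would normally receive a positive advantage of about 1.0; after filtering it receives 0.0, so a correct but theatrical rollout is neither reinforced nor penalized. Whether ProFIL also drops flagged rollouts from the group mean and standard deviation is not stated in the summary above, so that choice is left as an assumption here.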

Key facts

  • ProFIL stands for Probe-Filtered Reinforcement Learning.
  • It targets 'reasoning theater' in chain-of-thought reasoning.
  • A multi-head attention probe is trained once on a frozen base model.
  • The probe detects post-commitment steps from internal activations.
  • Rollouts whose probe score exceeds a threshold have their advantage set to zero during GRPO training.
  • Verifier-derived labels are used without human annotation.
  • The method shortens chain-of-thought length and increases faithfulness.
  • It is a drop-in extension to Group Relative Policy Optimization (GRPO).
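To make the probe concrete, here is one common shape of a multi-head attention probe: each head has a learned query vector that attends over the frozen model's token activations for a reasoning step, each head's pooled vector is read out linearly, and a sigmoid turns the summed logit into the probability that the step is post-commitment. This is an illustrative plain-Python sketch with hand-set weights; the architecture details, the function name `attention_probe`, and all parameter names are assumptions rather than ProFIL's published design, and training on verifier-derived labels is omitted.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention_probe(
    activations: Sequence[Vector],   # T token activations (d-dim) from the frozen model
    queries: Sequence[Vector],       # one learned query vector per head (d-dim)
    head_weights: Sequence[Vector],  # one readout vector per head (d-dim)
    bias: float = 0.0,
) -> float:
    """Return P(step is post-commitment) under a hypothetical attention-probe design."""
    d = len(activations[0])
    logit = bias
    for q, w in zip(queries, head_weights):
        # Scaled dot-product scores of this head's query against every token.
        scores = [dot(q, h) / math.sqrt(d) for h in activations]
        attn = softmax(scores)
        # Attention-weighted average of the token activations.
        pooled = [sum(a * h[j] for a, h in zip(attn, activations)) for j in range(d)]
        # Each head adds a linear readout of its pooled vector to the shared logit.
        logit += dot(w, pooled)
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    acts = [[1.0, 0.0], [3.0, 0.0]]  # two tokens, d = 2
    # A zero query gives uniform attention, so this head mean-pools to [2.0, 0.0].
    p = attention_probe(acts, queries=[[0.0, 0.0]], head_weights=[[1.0, 0.0]])
    print(round(p, 4))  # sigmoid(2.0), prints 0.8808
```

Because the base model stays frozen, its activations could be computed once and reused, which fits the source's point that the probe is trained a single time rather than alongside the policy.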

Entities

Institutions

  • arXiv

Sources