ProFIL: Probe-Filtered RL Reduces Reasoning Theater in LLMs
Researchers have introduced ProFIL (Probe-Filtered Reinforcement Learning), a method for reducing "reasoning theater" in large language models: post-hoc rationalizations that look deliberative but contribute nothing to correctness, wasting tokens and obscuring interpretability. ProFIL extends Group Relative Policy Optimization (GRPO) with a multi-head attention probe, trained once on a frozen base model, that detects post-commitment reasoning steps from internal activations. During GRPO training, any rollout whose probe score exceeds a threshold has its advantage zeroed, suppressing theater while preserving faithfulness. The probe is supervised with verifier-derived labels, requiring no human annotation. The result is a single drop-in extension that shortens chains of thought and increases their faithfulness.
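The filtering step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the threshold value, and the example probe scores are all hypothetical, and GRPO details such as per-token credit assignment and KL regularization are omitted.

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantage: reward minus the group mean, over the group std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards) or 1e-6  # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]

def probe_filter(advantages, probe_scores, threshold=0.5):
    """Zero the advantage of any rollout whose probe 'theater' score exceeds
    the threshold, so flagged rollouts neither gain nor lose reward signal."""
    return [a if s <= threshold else 0.0
            for a, s in zip(advantages, probe_scores)]

# Four rollouts for one prompt: binary rewards from a verifier, and
# hypothetical probe scores estimating post-commitment content.
rewards = [1.0, 0.0, 1.0, 0.0]
probe_scores = [0.2, 0.9, 0.6, 0.1]
adv = probe_filter(grpo_advantages(rewards), probe_scores)
# adv == [1.0, 0.0, 0.0, -1.0]: rollouts 2 and 3 are filtered out.
```

Note that filtering is one-sided in effect only through the advantage: a flagged correct rollout simply stops contributing gradient, rather than being penalized outright.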
Key facts
- ProFIL stands for Probe-Filtered Reinforcement Learning.
- It targets "reasoning theater" in chain-of-thought reasoning.
- A multi-head attention probe is trained once on a frozen base model.
- The probe detects post-commitment steps from internal activations.
- Rollouts exceeding a probe threshold have their advantage zeroed during GRPO.
- Verifier-derived labels are used without human annotation.
- The method reduces chain-of-thought length and increases faithfulness.
- It is a drop-in extension to Group Relative Policy Optimization (GRPO).
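The probe itself, per the facts above, is a multi-head attention probe reading frozen activations. A plausible forward pass is sketched below; every shape, weight name, and the identity value projection are assumptions for illustration. Training (binary cross-entropy against verifier-derived post-commitment labels) is not shown.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_probe(h, W_q, W_k, W_o, n_heads=4):
    """Score a rollout from frozen activations h of shape (T, d),
    one row per reasoning step. Each head has a learned query (W_q)
    that attends over projected steps (W_k); pooled head outputs feed
    a linear scorer (W_o) producing a 'post-commitment' probability."""
    T, d = h.shape
    dh = d // n_heads
    keys = (h @ W_k).reshape(T, n_heads, dh)            # (T, H, dh)
    attn = softmax(np.einsum('hd,thd->ht', W_q, keys))  # (H, T) weights
    pooled = np.einsum('ht,thd->hd', attn, h.reshape(T, n_heads, dh))
    return 1 / (1 + np.exp(-(pooled.reshape(-1) @ W_o)))  # sigmoid score

# Random weights and activations just to exercise the shapes.
rng = np.random.default_rng(0)
d, T, H = 16, 6, 4
h = rng.normal(size=(T, d))
score = attention_probe(h, rng.normal(size=(H, d // H)),
                        rng.normal(size=(d, d)), rng.normal(size=(d,)))
# score is a probability in (0, 1), compared against the filter threshold.
```

Because the base model is frozen and the probe is trained once, activations can be cached, keeping the added cost of scoring rollouts during RL small.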
Entities
Institutions
- arXiv