ARTFEED — Contemporary Art Intelligence

Dynamic Outlier Truncation Reduces Verbosity in Reasoning Models

ai-technology · 2026-05-16

Dynamic Outlier Truncation (DOT) is a training-time method for curbing excessive verbosity in large reasoning models. The researchers identify a phenomenon they call 'length shift': during reinforcement learning, models drift toward producing superfluous reasoning even for straightforward queries. DOT removes redundant tokens only from the extreme tail of the response-length distribution, and only within rollout groups where every response is correct, so long-horizon reasoning on harder problems is preserved. Because it avoids an explicit length penalty, it also sidesteps the optimization conflicts such penalties introduce. The paper is available on arXiv under ID 2601.03969.
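The summary describes the mechanism only at a high level. A minimal sketch of the core idea, group-level tail truncation applied only to all-correct rollout groups, might look like the following; the function name, the quantile cutoff, and the data layout are all hypothetical, not taken from the paper:

```python
from typing import List, Tuple

def dot_filter(
    group: List[Tuple[int, bool]],  # (response_length, is_correct) per rollout
    tail_quantile: float = 0.9,     # hypothetical cutoff; the paper's value is not stated here
) -> List[Tuple[int, bool]]:
    """Drop extreme-tail-length rollouts, but only in fully correct groups."""
    if not all(ok for _, ok in group):
        # Mixed or failing groups are left untouched: long chains of
        # reasoning may genuinely be needed on harder problems.
        return group
    lengths = sorted(length for length, _ in group)
    # Length at the chosen quantile of this group's distribution.
    k = int(tail_quantile * (len(lengths) - 1))
    cutoff = lengths[k]
    # Keep only rollouts at or below the cutoff length.
    return [(length, ok) for length, ok in group if length <= cutoff]
```

In this sketch the filter operates on whole rollouts rather than individual tokens, which is a simplification: the article says DOT suppresses redundant tokens, so the actual method likely acts at a finer granularity within the training objective.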

Key facts

  • DOT is a training-time intervention for reducing verbosity in reasoning models.
  • Length shift causes models to overthink trivial inputs.
  • DOT targets only the extreme tail of response lengths in correct rollout groups.
  • The method preserves long-horizon reasoning capabilities for complex problems.
  • Explicit length penalties introduce optimization conflicts.
  • The paper is on arXiv with ID 2601.03969.
  • Reinforcement learning with verifiable rewards drives performance gains.
  • DOT selectively suppresses redundant tokens.

Entities

Institutions

  • arXiv

Sources