DenoiseRL: AI Reasoning Model Recovers from Noisy Prefixes

ai-technology · 2026-05-28

A new reinforcement learning framework called DenoiseRL enables large language models to improve their reasoning by learning directly from their own incorrect traces, reducing the need for stronger teacher models or curated datasets of difficult problems. The method treats failures as training opportunities: by optimizing the model to recover from errors of the kind weak models produce, it improves exploration efficiency and lowers dependence on external supervision and expensive data curation. This makes scalable capability improvement more accessible.

Key facts

DenoiseRL is a reinforcement learning framework for large language models.
It replaces external supervision with recovery-oriented optimization over failures.
The method learns from incorrect reasoning traces produced by weak models.
It improves reasoning performance and training efficiency.
It reduces the need for expensive data curation or stronger teacher models.
The approach yields richer and more diverse learning signals.
It improves exploration efficiency by making use of imperfect model behavior.
The paper is available on arXiv under ID 2605.28421.
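
The facts above describe recovery from noisy prefixes only at a high level. As a rough illustration of the idea, and not the paper's actual algorithm, the Python sketch below keeps the opening steps of a failed trace as a prefix and scores a continuation by whether it still reaches the correct answer. Every name here (FailedTrace, build_recovery_prompt, recovery_reward, keep_fraction), the "Answer: ..." output format, and the binary reward are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class FailedTrace:
    """An incorrect reasoning trace from a weak model (hypothetical structure)."""
    question: str
    steps: list[str]       # reasoning steps, at least one of which is wrong
    reference_answer: str  # known correct final answer for the question


def build_recovery_prompt(trace: FailedTrace, keep_fraction: float) -> str:
    """Keep the first part of a failed trace as a noisy prefix to continue from."""
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in [0, 1]")
    n_keep = int(len(trace.steps) * keep_fraction)
    prompt = f"Question: {trace.question}\n"
    for step in trace.steps[:n_keep]:
        prompt += step + "\n"
    return prompt


def recovery_reward(continuation: str, trace: FailedTrace) -> float:
    """Assumed binary reward: 1.0 if the last line states the reference answer."""
    lines = continuation.strip().splitlines()
    final_line = lines[-1].strip() if lines else ""
    return 1.0 if final_line == f"Answer: {trace.reference_answer}" else 0.0


# Toy example: the third step contains an arithmetic error (12 * 3 is 36).
trace = FailedTrace(
    question="What is 12 * 13?",
    steps=[
        "12 * 13 = 12 * 10 + 12 * 3",
        "12 * 10 = 120",
        "12 * 3 = 38",
        "120 + 38 = 158",
    ],
    reference_answer="156",
)

# Keep three of the four steps, so the prefix includes the faulty one.
prompt = build_recovery_prompt(trace, keep_fraction=0.75)

# A continuation that notices and repairs the error earns the reward;
# one that carries the error through to the end does not.
recovered = "Check: 12 * 3 = 36, not 38.\n120 + 36 = 156\nAnswer: 156"
unrecovered = "120 + 38 = 158\nAnswer: 158"
reward_recovered = recovery_reward(recovered, trace)
reward_unrecovered = recovery_reward(unrecovered, trace)
```

In a real training loop the continuation would be sampled from the policy being trained and the reward passed to a policy-gradient update. Both steps are omitted here because the summary does not describe them.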
