RLAAR: Curriculum RL Reduces Lost-in-Conversation in LLMs
A new framework called RLAAR (Reinforcement Learning with Verifiable Accuracy and Abstention Rewards) addresses the Lost-in-Conversation (LiC) problem in large language models, where performance degrades when task information is revealed incrementally across multiple turns rather than all at once. The approach combines a competence-gated curriculum, which gradually increases dialogue difficulty, with a mixed-reward system that rewards both correct answers and informed abstention when a question is not yet solvable from the information revealed so far. RLAAR uses multi-turn on-policy rollouts to train models to balance problem-solving against abstention, reducing premature answering. The work is motivated by progress in Reinforcement Learning with Verifiable Rewards (RLVR) and aims to improve reliability in multi-turn conversations.
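The summary describes a mixed reward over two behaviors: answering (scored by verifiable accuracy) and abstaining (rewarded only when the question is genuinely unsolvable at that point in the dialogue). The paper's actual reward magnitudes are not given here, so the following sketch uses illustrative constants; the function name and signature are assumptions, not the authors' API.

```python
def mixed_reward(action: str, correct: bool = False, solvable: bool = True) -> float:
    """Illustrative mixed reward for one turn.

    action   : "answer" or "abstain"
    correct  : whether an answer matched the verifiable ground truth
    solvable : whether the question was answerable from the turns seen so far
    All reward values below are assumed for illustration.
    """
    if action == "abstain":
        # Informed abstention on an unsolvable question is rewarded;
        # abstaining when the answer was derivable is mildly penalized.
        return 0.5 if not solvable else -0.2
    # Verifiable-accuracy term: premature or wrong answers are penalized,
    # which discourages guessing before enough information has been revealed.
    return 1.0 if correct else -1.0
```

Under this shaping, a policy maximizes return by answering only once the accumulated turns make the answer verifiable, and abstaining otherwise.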
Key facts
- RLAAR stands for Reinforcement Learning with Verifiable Accuracy and Abstention Rewards.
- It addresses Lost-in-Conversation (LiC) in large language models.
- The framework uses a competence-gated curriculum that incrementally increases dialogue difficulty.
- It employs a mixed-reward system for correct answers and abstention.
- Multi-turn on-policy rollouts are used for training.
- The goal is to reduce premature answering behavior.
- The work is motivated by progress in RLVR.
- The paper is available on arXiv with ID 2510.18731.
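The competence-gated curriculum listed above can be sketched as a controller that lengthens training dialogues only after the model demonstrates competence at the current difficulty. The gating rule, window size, and threshold below are assumptions for illustration; the source does not specify the paper's actual schedule.

```python
from collections import deque


class CompetenceGatedCurriculum:
    """Hypothetical sketch: promote to longer multi-turn dialogues only
    when recent rollout success at the current difficulty clears a threshold."""

    def __init__(self, max_turns: int = 8, threshold: float = 0.7, window: int = 100):
        self.turns = 2                        # start with short dialogues (assumed)
        self.max_turns = max_turns
        self.threshold = threshold
        self.recent = deque(maxlen=window)    # rolling record of rollout outcomes

    def record(self, success: bool) -> None:
        self.recent.append(success)
        # Gate: increase dialogue length only once the window is full and
        # the success rate shows competence at the current difficulty.
        if (len(self.recent) == self.recent.maxlen
                and sum(self.recent) / len(self.recent) >= self.threshold
                and self.turns < self.max_turns):
            self.turns += 1
            self.recent.clear()               # re-measure at the new difficulty
```

Each on-policy rollout would then be generated with `curriculum.turns` information-revealing turns, so difficulty rises only as fast as the model earns it.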