ARTFEED — Contemporary Art Intelligence

RLAAR: Curriculum RL Reduces Lost-in-Conversation in LLMs

other · 2026-04-25

A new framework called RLAAR (Reinforcement Learning with Verifiable Accuracy and Abstention Rewards) addresses the Lost-in-Conversation (LiC) problem in large language models, in which performance degrades as task information is revealed incrementally across multiple turns. The approach combines a competence-gated curriculum that gradually increases dialogue difficulty with a mixed-reward system that credits both correct answers and informed abstention when a question is not yet solvable. RLAAR trains on multi-turn on-policy rollouts so that models learn to balance problem-solving with abstention, reducing premature answering. The work builds on progress in Reinforcement Learning with Verifiable Rewards (RLVR) and aims to improve reliability in multi-turn conversations.
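The mixed-reward idea can be sketched as a turn-level scoring rule. This is an illustrative reconstruction, not the paper's actual reward: the function name and the reward values (`r_correct`, `r_abstain`, `r_wrong`) are assumptions chosen to show the structure, in which answering is scored by verifiable accuracy and abstaining is rewarded only when the question is not yet solvable.

```python
# Hypothetical sketch of a mixed accuracy/abstention reward in the
# spirit of RLAAR; names and numeric values are illustrative only.
def mixed_reward(answered: bool, correct: bool, solvable: bool,
                 r_correct: float = 1.0,
                 r_abstain: float = 0.5,
                 r_wrong: float = -1.0) -> float:
    """Score one turn-level decision: answer now, or abstain."""
    if answered:
        # Verifiable-accuracy term: a checkable answer is right or wrong.
        return r_correct if correct else r_wrong
    # Abstention term: abstaining is "informed" only when the question
    # cannot yet be solved from the information revealed so far.
    return r_abstain if not solvable else r_wrong
```

Under this shaping, prematurely guessing on an unsolvable question risks `r_wrong`, while waiting earns `r_abstain`, which is the incentive against premature answering that the article describes.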

Key facts

  • RLAAR stands for Reinforcement Learning with Verifiable Accuracy and Abstention Rewards.
  • It addresses Lost-in-Conversation (LiC) in large language models.
  • The framework uses a competence-gated curriculum that incrementally increases dialogue difficulty.
  • It employs a mixed-reward system for correct answers and abstention.
  • Multi-turn on-policy rollouts are used for training.
  • The goal is to reduce premature answering behavior.
  • The work is motivated by progress in RLVR.
  • The paper is available on arXiv with ID 2510.18731.
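A competence-gated curriculum of the kind listed above can be sketched as a gate that promotes the model to harder dialogues once its recent success rate clears a threshold. The class name, threshold, and window size below are assumptions for illustration, not details from the paper.

```python
from collections import deque


class CompetenceGate:
    """Illustrative competence-gated curriculum: advance dialogue
    difficulty only after recent rollout success clears a threshold.
    (Threshold, window, and level count are assumed values.)"""

    def __init__(self, max_level: int = 5,
                 threshold: float = 0.8, window: int = 100):
        self.level = 1              # current dialogue difficulty level
        self.max_level = max_level
        self.threshold = threshold
        self.results = deque(maxlen=window)  # rolling success history

    def record(self, success: bool) -> None:
        """Log one rollout outcome; promote once the window is full
        and the success rate meets the competence threshold."""
        self.results.append(success)
        if (len(self.results) == self.results.maxlen
                and sum(self.results) / len(self.results) >= self.threshold
                and self.level < self.max_level):
            self.level += 1
            self.results.clear()    # restart measurement at the new level
```

Gating promotion on demonstrated competence, rather than a fixed schedule, is what keeps the model from facing long multi-turn dialogues before it can handle shorter ones.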
