SeDT: Improving LLM Multi-Turn Reliability via Reinforcement Learning Conditioning

ai-technology · 2026-05-27

A recent study reports that large language models (LLMs) can lose up to 39% of their performance when a task is disclosed gradually over multiple turns, a phenomenon referred to as 'Lost in Conversation.' The drop is mainly a reliability problem: best-case aptitude falls by only 16%, while unreliability rises by 112%, more than doubling. The researchers argue that the underlying issue is structural: a flat conversation history gives every previous turn equal weight, which makes it hard for the model to tell essential constraints from incidental dialogue. To remedy this, they introduce SeDT (Sentence-transformer Decision-Transformer), a training-free method that borrows return-to-go conditioning from offline reinforcement learning. SeDT annotates each segment (shard) of the conversation with a cumulative relevance score built from three elements: a sentence transformer for semantic relevance, a decision transformer for sequential choices, and a return-to-go mechanism to emphasize valuable turns. Because it operates only at inference time, it can be applied to any existing LLM without further training. The paper is available on arXiv under ID 2605.26788.
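For readers unfamiliar with the term, return-to-go in the Decision Transformer literature is the sum of rewards from the current step to the end of a trajectory. The reading below treats r_t as a per-turn relevance score and T as the number of turns; that mapping is an assumption made for illustration, since this summary does not give the paper's exact formula.

```latex
\hat{R}_t = \sum_{t'=t}^{T} r_{t'}, \qquad \hat{R}_{t+1} = \hat{R}_t - r_t
```

Under this reading, the score attached to a turn reflects how much relevant information is still to come from that point on, and it shrinks by r_t after each turn.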

Key facts

LLMs lose up to 39% performance when tasks are disclosed gradually over multiple turns.
Best-case aptitude drops only 16%.
Unreliability more than doubles (+112%).
Root cause is flat conversation history with equal turn weighting.
SeDT uses return-to-go conditioning from offline reinforcement learning.
SeDT is training-free and inference-time only.
Method annotates conversation shards with cumulative relevance scores.
Paper available on arXiv: 2605.26788.
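
The annotation step in the list above might look roughly like the following. This is a minimal sketch, not the paper's implementation: the `Shard` class, the `returns_to_go` and `annotate` helpers, the tag format, and the hard-coded relevance values are all hypothetical. In particular, the relevance values stand in for whatever a sentence transformer would produce, and the cumulative score is computed as a suffix sum in the Decision Transformer style, which is an assumption about how SeDT accumulates relevance.

```python
from dataclasses import dataclass


@dataclass
class Shard:
    """One piece of the conversation (hypothetical structure)."""
    turn: int
    text: str
    relevance: float  # assumed per-shard relevance score, e.g. in [0, 1]


def returns_to_go(rewards):
    """Suffix sums: rtg[t] = rewards[t] + rewards[t+1] + ... + rewards[-1]."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each position accumulates everything after it.
    for t in range(len(rewards) - 1, -1, -1):
        running += rewards[t]
        rtg[t] = running
    return rtg


def annotate(shards):
    """Prefix each shard with its relevance and cumulative score (assumed tag format)."""
    rtg = returns_to_go([s.relevance for s in shards])
    lines = []
    for shard, score in zip(shards, rtg):
        tag = f"[turn {shard.turn} | relevance {shard.relevance:.2f} | rtg {score:.2f}]"
        lines.append(f"{tag} {shard.text}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hand-picked relevance values; a real system would compute them.
    conversation = [
        Shard(1, "Write a function that parses log files.", 1.0),
        Shard(2, "Thanks, that looks like a good start.", 0.25),
        Shard(3, "It must also handle gzip-compressed input.", 0.5),
    ]
    print(annotate(conversation))
```

The annotated text would then be passed to the LLM in place of the flat history, so the model sees an explicit signal about which turns carry constraints. No model weights are touched, which is consistent with the training-free, inference-time framing.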
