ARTFEED — Contemporary Art Intelligence

SeDT: Improving LLM Multi-Turn Reliability via Reinforcement Learning Conditioning

ai-technology · 2026-05-27

A recent study reports that large language models (LLMs) can lose up to 39% of their performance when a task is disclosed gradually over multiple turns, a phenomenon referred to as 'Lost in Conversation.' The drop is driven mainly by reliability rather than capability: best-case aptitude decreases by only 16%, while unreliability rises by over 112%. The researchers argue that the underlying issue is structural. A flat conversation history gives every previous turn equal weight, which makes it hard for the model to tell essential constraints apart from trivial dialogue.

To remedy this, they introduce SeDT (Sentence-transformer Decision-Transformer), a training-free method that borrows return-to-go conditioning from offline reinforcement learning. SeDT assigns a cumulative relevance score to each segment of the conversation using three elements: a sentence transformer for semantic relevance, a decision transformer for sequential choices, and a return-to-go mechanism that emphasizes valuable turns. Because it operates only at inference time, it can be applied to any existing LLM. The research is accessible on arXiv with ID 2605.26788.
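To make the return-to-go idea concrete, here is a minimal sketch and not the paper's implementation. In Decision Transformer-style conditioning, the return-to-go at step t is the sum of rewards from step t to the end of the sequence. The sketch treats each conversation shard's relevance to the task as its reward. Several parts are assumptions for illustration: the function names (`bow_vector`, `returns_to_go`, `annotate_shards`), the annotation format, and the toy bag-of-words cosine similarity, which stands in for a real sentence-transformer embedding so the example runs with the standard library only.

```python
import math
import re
from collections import Counter


def bow_vector(text):
    """Toy bag-of-words vector (assumption: stand-in for a sentence-transformer embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def returns_to_go(rewards):
    """Suffix sums: rtg[t] = rewards[t] + rewards[t + 1] + ... + rewards[-1]."""
    rtg, running = [], 0.0
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    return rtg[::-1]


def annotate_shards(task, shards):
    """Prefix each shard with its relevance and return-to-go (hypothetical annotation format)."""
    task_vec = bow_vector(task)
    relevance = [cosine(task_vec, bow_vector(s)) for s in shards]
    rtg = returns_to_go(relevance)
    return [f"[relevance={r:.2f} rtg={g:.2f}] {s}" for s, r, g in zip(shards, relevance, rtg)]


if __name__ == "__main__":
    task = "write a python function that sorts a list of integers in descending order"
    shards = [
        "I need a python function that sorts a list",
        "thanks, by the way the weather is nice today",
        "the list holds integers and the order must be descending",
    ]
    for line in annotate_shards(task, shards):
        print(line)
```

Because relevance scores are non-negative here, the return-to-go values never increase from one shard to the next, so an annotated history shows how much relevant information is still to come at each turn. How SeDT actually computes and uses these scores is not specified in this summary.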

Key facts

  • LLMs lose up to 39% of their performance when tasks are revealed over multiple turns.
  • Best-case aptitude drops only 16%.
  • Unreliability more than doubles (+112%).
  • Root cause is flat conversation history with equal turn weighting.
  • SeDT uses return-to-go conditioning from offline reinforcement learning.
  • SeDT is training-free and inference-time only.
  • Method annotates conversation shards with cumulative relevance scores.
  • Paper available on arXiv: 2605.26788.

Entities

Institutions

  • arXiv
