LLM Framework Generates Long-Term Medical Dialogue Dataset

ai-technology · 2026-05-20

A new framework uses large language models to synthesize high-quality, long-term medical dialogues, addressing the lack of datasets for evaluating healthcare agents' memory. The approach constructs synthetic patient profiles with diverse disease trajectories, generates multi-turn dialogues for each encounter, and integrates them into a coherent longitudinal dataset called MediLongChat. Three benchmark tasks (In-dialogue Reasoning, Cross-dialogue Reasoning, and Synthesis Reasoning) are established to assess memory capabilities. The work is presented in a paper on arXiv (2605.19766v1).
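The three stages described above can be pictured with a minimal Python sketch. It is an illustration only, not the paper's implementation: every name here (`PatientProfile`, `Encounter`, `build_profile`, `generate_dialogues`, `integrate`, `echo_llm`) is hypothetical, and the LLM is replaced by a stub function so the example runs without any model or network access.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for an LLM call: takes a prompt, returns generated text.
LLM = Callable[[str], str]


@dataclass
class Encounter:
    date: str                       # ISO date of the visit
    summary: str                    # what this visit is about
    dialogue: List[str] = field(default_factory=list)


@dataclass
class PatientProfile:
    patient_id: str
    condition: str
    encounters: List[Encounter]


def build_profile(patient_id: str, condition: str,
                  visits: List[Tuple[str, str]]) -> PatientProfile:
    """Stage 1 (assumed shape): a synthetic profile with a dated trajectory."""
    # Sorting (date, summary) pairs by ISO date yields chronological order.
    encounters = [Encounter(date=d, summary=s) for d, s in sorted(visits)]
    return PatientProfile(patient_id, condition, encounters)


def generate_dialogues(profile: PatientProfile, llm: LLM,
                       turns: int = 2) -> PatientProfile:
    """Stage 2 (assumed shape): one multi-turn dialogue per encounter."""
    for enc in profile.encounters:
        for t in range(turns):
            # Alternate speakers, starting with the patient.
            speaker = "patient" if t % 2 == 0 else "clinician"
            prompt = f"{speaker} | {profile.condition} | {enc.date} | {enc.summary}"
            enc.dialogue.append(f"{speaker}: {llm(prompt)}")
    return profile


def integrate(profile: PatientProfile) -> List[Dict]:
    """Stage 3 (assumed shape): merge sessions into one longitudinal record."""
    return [
        {"patient_id": profile.patient_id, "session": i,
         "date": enc.date, "turns": list(enc.dialogue)}
        for i, enc in enumerate(profile.encounters)
    ]


def echo_llm(prompt: str) -> str:
    """Stub model used only so the sketch is runnable."""
    return f"[generated from: {prompt}]"


profile = build_profile(
    "p001", "type 2 diabetes",
    [("2025-03-01", "follow-up"), ("2025-01-10", "initial diagnosis")],
)
timeline = integrate(generate_dialogues(profile, echo_llm, turns=2))
```

In this sketch, a question answerable from a single entry of `timeline` would correspond loosely to in-dialogue reasoning, while one that needs several sessions would correspond to cross-dialogue reasoning; how the paper actually constructs its tasks is not specified here.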

Key facts

Framework synthesizes long-term medical dialogues using LLMs
Addresses absence of datasets with realistic longitudinal timelines
Three-stage approach: patient profiles, multi-turn dialogues, integration
MediLongChat dataset created
Three benchmark tasks for memory evaluation
Paper on arXiv: 2605.19766v1
Real clinical text constrained by privacy and ethics
Existing benchmarks fail to capture cross-session reasoning
