LLM Framework Generates Long-Term Medical Dialogue Dataset
A new framework uses large language models (LLMs) to synthesize high-quality, long-term medical dialogues, addressing the lack of datasets for evaluating how well healthcare agents remember information across sessions. The approach works in three stages: it constructs synthetic patient profiles with diverse disease trajectories, generates multi-turn dialogues for each encounter, and integrates them into a coherent longitudinal dataset called MediLongChat. Three benchmark tasks (In-dialogue Reasoning, Cross-dialogue Reasoning, and Synthesis Reasoning) are established to assess memory capabilities. The work is presented in a paper on arXiv (2605.19766v1).
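The three-stage flow described above (profile, per-encounter dialogue, longitudinal integration) can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration: the class names, fields, and the `stub_generator` function are hypothetical and are not taken from the paper, and the stub stands in for the LLM call that the actual framework would make.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, List


@dataclass
class Encounter:
    """One visit on a synthetic patient's timeline (hypothetical schema)."""
    visit_date: date
    summary: str  # disease state at this point in the trajectory


@dataclass
class PatientProfile:
    """Stage 1: a synthetic patient with a disease trajectory."""
    patient_id: str
    encounters: List[Encounter]


@dataclass
class Turn:
    speaker: str  # "patient" or "agent"
    text: str


@dataclass
class Dialogue:
    """Stage 2 output: one multi-turn dialogue tied to one encounter."""
    visit_date: date
    turns: List[Turn]


@dataclass
class LongitudinalRecord:
    """Stage 3 output: all dialogues for one patient in date order."""
    patient_id: str
    dialogues: List[Dialogue]


def stub_generator(profile: PatientProfile, encounter: Encounter) -> List[Turn]:
    """Placeholder for an LLM call; returns a fixed two-turn exchange."""
    return [
        Turn("patient", f"I am here about: {encounter.summary}"),
        Turn("agent", "Thank you. Let me review your history."),
    ]


def build_longitudinal_record(
    profile: PatientProfile,
    generate: Callable[[PatientProfile, Encounter], List[Turn]] = stub_generator,
) -> LongitudinalRecord:
    """Generate one dialogue per encounter and merge them chronologically."""
    ordered = sorted(profile.encounters, key=lambda e: e.visit_date)
    dialogues = [Dialogue(e.visit_date, generate(profile, e)) for e in ordered]
    return LongitudinalRecord(profile.patient_id, dialogues)


if __name__ == "__main__":
    demo = PatientProfile(
        "p-001",
        [
            Encounter(date(2024, 3, 1), "follow-up visit"),
            Encounter(date(2024, 1, 15), "initial visit"),
        ],
    )
    record = build_longitudinal_record(demo)
    for dialogue in record.dialogues:
        print(dialogue.visit_date, len(dialogue.turns))
```

Sorting encounters by date before generation keeps the merged record chronological, which is the property a cross-dialogue question would rely on. A real generator would presumably also receive earlier dialogues as context, which this sketch omits.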
Key facts
- Framework synthesizes long-term medical dialogues using LLMs
- Addresses absence of datasets with realistic longitudinal timelines
- Three-stage approach: patient profiles, multi-turn dialogues, integration
- MediLongChat dataset created
- Three benchmark tasks for memory evaluation
- Paper on arXiv: 2605.19766v1
- Use of real clinical text is constrained by privacy and ethics
- Existing benchmarks fail to capture cross-session reasoning
Entities
Institutions
- arXiv