MedMemoryBench: Benchmarking AI Memory for Personalized Healthcare Agents

ai-technology · 2026-05-13

MedMemoryBench is a benchmark for evaluating agent memory in personalized healthcare. It addresses a gap in existing memory benchmarks, which focus mainly on everyday open-domain conversations rather than critical medical scenarios. Motivated by the production requirements of a top-tier health management agent serving tens of millions of users, MedMemoryBench uses a human-AI collaborative pipeline to synthesize realistic, long-horizon medical trajectories from clinically grounded synthetic patient archetypes. The dataset comprises approximately 2,000 sessions and 16,000 interaction turns. It also introduces an 'evaluate-while-constructing' streaming assessment protocol, which tests memory while the trajectory is still being built instead of relying on a conventional static evaluation performed after the fact.
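The streaming idea described above can be illustrated with a small sketch. This is not the benchmark's actual harness: the names `Session`, `Probe`, `KeywordMemory`, and `stream_evaluate` are hypothetical, the keyword-overlap memory is a deliberately naive stand-in for a real agent memory, and the patient data is invented for illustration. The only point it aims to show is that each probe is asked immediately after the session it depends on has been ingested, rather than once the whole trajectory is complete.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    session_id: int
    turns: list[str]


@dataclass
class Probe:
    after_session: int  # ask as soon as this session has been ingested
    question: str
    expected: str       # substring a correct answer must contain


@dataclass
class KeywordMemory:
    """Toy memory (assumption): stores every turn, retrieves by word overlap."""
    turns: list[str] = field(default_factory=list)

    def ingest(self, session: Session) -> None:
        self.turns.extend(session.turns)

    def answer(self, question: str) -> str:
        words = set(question.lower().split())
        best, best_score = "", 0
        for turn in self.turns:
            score = len(words & set(turn.lower().split()))
            # ">=" lets the most recent turn win ties, so updates override old values.
            if score > 0 and score >= best_score:
                best, best_score = turn, score
        return best


def stream_evaluate(sessions, probes, memory):
    """Interleave ingestion and probing instead of probing only at the end."""
    results = []
    for session in sessions:
        memory.ingest(session)
        for probe in probes:
            if probe.after_session == session.session_id:
                reply = memory.answer(probe.question)
                correct = probe.expected.lower() in reply.lower()
                results.append((probe.question, correct))
    return results


# Invented two-session trajectory, for illustration only.
sessions = [
    Session(1, ["patient reports allergy to penicillin", "blood pressure 150/95"]),
    Session(2, ["patient started lisinopril 10 mg daily", "blood pressure 130/85"]),
]
probes = [
    Probe(1, "what allergy does the patient have", "penicillin"),
    Probe(2, "what is the latest blood pressure", "130/85"),
]

results = stream_evaluate(sessions, probes, KeywordMemory())
for question, correct in results:
    print(f"{question}: {'pass' if correct else 'fail'}")
```

In this sketch the second probe only passes because the memory prefers the most recent matching turn, which hints at why probing mid-trajectory matters: a static evaluation at the end would not reveal whether the memory held the right value at the time each question was relevant.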

Key facts

MedMemoryBench benchmarks agent memory in personalized healthcare
Existing benchmarks focus on daily open-domain conversations
Motivated by production requirements of a health management agent with tens of millions of users
Uses human-AI collaborative pipeline to synthesize medical trajectories
Based on clinically grounded synthetic patient archetypes
Dataset includes approximately 2,000 sessions and 16,000 interaction turns
Introduces 'evaluate-while-constructing' streaming assessment protocol
Published on arXiv with ID 2605.11814
