MedMemoryBench: Benchmarking AI Memory for Personalized Healthcare Agents
MedMemoryBench is a benchmark for assessing the memory capabilities of AI agents in personalized healthcare. It addresses a gap in existing benchmarks, which focus on everyday open-domain conversations rather than critical medical scenarios. Motivated by the production requirements of a health management agent serving tens of millions of users, MedMemoryBench uses a human-AI collaborative pipeline to synthesize realistic, long-term medical trajectories from clinically grounded synthetic patient archetypes. The dataset comprises approximately 2,000 sessions and 16,000 interaction turns. It also introduces an 'evaluate-while-constructing' streaming assessment protocol that evaluates memory as the trajectory is being built, rather than relying on conventional static evaluation.
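The streaming idea can be illustrated with a minimal sketch. This is not the benchmark's actual code or data format: the `Probe` and `KeyValueMemory` classes, the `streaming_eval` function, and the toy patient facts are all hypothetical, chosen only to show how probes can be interleaved with memory writes instead of being scored once after the full trajectory.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Probe:
    """A question asked right after a given session (hypothetical format)."""
    after_session: int
    question_key: str
    expected: str


@dataclass
class KeyValueMemory:
    """Toy memory: keeps the latest value seen for each fact key."""
    facts: dict = field(default_factory=dict)

    def write(self, session: dict) -> None:
        self.facts.update(session)

    def read(self, key: str) -> Optional[str]:
        return self.facts.get(key)


def streaming_eval(sessions, probes, memory):
    """Interleave memory writes with probes as the trajectory grows."""
    results = []
    for idx, session in enumerate(sessions):
        memory.write(session)  # the trajectory is extended by one session
        for probe in probes:
            if probe.after_session == idx:  # probe only what has been seen so far
                answer = memory.read(probe.question_key)
                results.append((idx, probe.question_key, answer == probe.expected))
    return results


# Made-up toy trajectory: each session contributes or updates one fact.
sessions = [
    {"medication": "metformin 500 mg"},
    {"allergy": "penicillin"},
    {"medication": "metformin 1000 mg"},  # dose updated later in the trajectory
]
probes = [
    Probe(0, "medication", "metformin 500 mg"),
    Probe(1, "allergy", "penicillin"),
    Probe(2, "medication", "metformin 1000 mg"),
    Probe(2, "allergy", "penicillin"),
]

results = streaming_eval(sessions, probes, KeyValueMemory())
accuracy = sum(ok for _, _, ok in results) / len(results)
print(f"probes answered correctly: {accuracy:.0%}")
```

Because the same fact ("medication") is probed both before and after it changes, a loop of this shape can check whether a memory system tracks updates over time, which a single end-of-trajectory evaluation would not reveal.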
Key facts
- MedMemoryBench benchmarks agent memory in personalized healthcare
- Existing benchmarks focus on daily open-domain conversations
- Motivated by production requirements of a health management agent with tens of millions of users
- Uses a human-AI collaborative pipeline to synthesize medical trajectories
- Based on clinically grounded synthetic patient archetypes
- Dataset includes approximately 2,000 sessions and 16,000 interaction turns
- Introduces an 'evaluate-while-constructing' streaming assessment protocol
- Published on arXiv with ID 2605.11814
Entities
Institutions
- arXiv