HealthCraft: RL Safety Environment for Emergency Medicine

ai-technology · 2026-05-23

HealthCraft is a new reinforcement-learning (RL) environment for evaluating frontier language models in emergency medicine. Adapted from Corecraft, it is the first public RL environment to reward trajectory-level safety in realistic scenarios. It runs on a FHIR R4 world state with 14 entity types and 3,987 seed entities, exposes 24 MCP tools, and uses a dual-layer rubric that zeroes the reward when safety-critical criteria are violated. The release includes 195 tasks across six categories, graded against 2,255 binary criteria (515 of them safety-critical). A post-hoc 10-task negative-class slate extends the suite to 205 tasks and 2,337 criteria. V8 results cover two frontier models, including Claude Opus 4.6, though the performance figures are not specified.
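
The dual-layer rubric can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not HealthCraft's actual implementation: the `Criterion` class, the `trajectory_reward` function, and the example criterion names are hypothetical, and scoring the non-gated layer as the fraction of binary criteria satisfied is an assumption. The only behavior taken from the description is that violating a safety-critical criterion zeroes the reward.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Criterion:
    """One binary rubric criterion (hypothetical structure)."""
    name: str
    satisfied: bool
    safety_critical: bool = False


def trajectory_reward(criteria: List[Criterion]) -> float:
    """Score one trajectory with a two-layer rubric.

    Layer 1 (safety gate): any unsatisfied safety-critical criterion
    zeroes the reward, regardless of the other criteria.
    Layer 2 (assumed): otherwise the reward is the fraction of
    criteria that were satisfied.
    """
    if not criteria:
        raise ValueError("a task needs at least one criterion")

    # Layer 1: the safety gate overrides everything else.
    if any(c.safety_critical and not c.satisfied for c in criteria):
        return 0.0

    # Layer 2: assumed aggregation, the share of satisfied criteria.
    return sum(c.satisfied for c in criteria) / len(criteria)


# Hypothetical criteria for a single task.
safe_run = [
    Criterion("orders ECG", True),
    Criterion("documents allergy check", False),
    Criterion("avoids contraindicated drug", True, safety_critical=True),
]
unsafe_run = [
    Criterion("orders ECG", True),
    Criterion("documents allergy check", True),
    Criterion("avoids contraindicated drug", False, safety_critical=True),
]

print(trajectory_reward(safe_run))    # partial credit, gate passed
print(trajectory_reward(unsafe_run))  # 0.0, gate tripped
```

In this sketch the unsafe run satisfies more criteria than the safe run yet earns nothing, which is the point of gating on safety rather than averaging it in with the other criteria.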

Key facts

HealthCraft is the first public RL environment for trajectory-level safety in emergency medicine
Adapted from Corecraft
Built on FHIR R4 world state with 14 entity types and 3,987 seed entities
Exposes 24 MCP tools
Dual-layer rubric zeroes the reward when safety-critical criteria are violated
195 tasks across six categories, graded against 2,255 binary criteria (515 safety-critical)
Post-hoc 10-task negative-class slate extends to 205 tasks and 2,337 criteria
V8 results reported for two frontier models, including Claude Opus 4.6
