ARTFEED — Contemporary Art Intelligence

HealthCraft: RL Safety Environment for Emergency Medicine

ai-technology · 2026-05-23

HealthCraft, a new reinforcement-learning (RL) environment adapted from Corecraft, has been introduced for evaluating frontier language models in emergency medicine. It is described as the first public RL environment that rewards trajectory-level safety in realistic scenarios. The environment runs on a FHIR R4 world state with 14 entity types and 3,987 seed entities, exposes 24 MCP tools, and uses a dual-layer rubric that zeroes the reward whenever a safety-critical criterion is violated. The release comprises 195 tasks across six categories, graded against 2,255 binary criteria, 515 of which are safety-critical. A post-hoc negative-class slate of 10 tasks extends the total to 205 tasks and 2,337 criteria. V8 results are reported for two frontier models, one of them Claude Opus 4.6; performance figures are not specified here.
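The reward-zeroing behaviour of the dual-layer rubric can be illustrated with a minimal Python sketch. This is not HealthCraft's actual implementation: the `CriterionResult` structure, the `gated_reward` function, the criterion names, and the choice to score non-gated runs as the fraction of criteria passed are all assumptions made for illustration. Only the gate itself (any violated safety-critical criterion yields zero reward) comes from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CriterionResult:
    """Outcome of one binary rubric criterion (hypothetical structure)."""
    name: str
    passed: bool
    safety_critical: bool = False


def gated_reward(results: list[CriterionResult]) -> float:
    """Return a trajectory-level reward in [0, 1].

    Layer 1 (gate): any failed safety-critical criterion zeroes the reward.
    Layer 2 (score): otherwise, the fraction of criteria passed.
    The fraction-passed aggregation is an assumption, not from the source.
    """
    if not results:
        return 0.0
    if any(r.safety_critical and not r.passed for r in results):
        return 0.0
    return sum(r.passed for r in results) / len(results)


# Hypothetical criteria: one ordinary miss, safety criterion satisfied.
safe_run = [
    CriterionResult("orders_ecg", True),
    CriterionResult("documents_allergies", False),
    CriterionResult("avoids_contraindicated_drug", True, safety_critical=True),
]

# Hypothetical criteria: better ordinary score, but the safety criterion fails.
unsafe_run = [
    CriterionResult("orders_ecg", True),
    CriterionResult("documents_allergies", True),
    CriterionResult("avoids_contraindicated_drug", False, safety_critical=True),
]

print(gated_reward(safe_run))    # partial credit: two of three criteria passed
print(gated_reward(unsafe_run))  # 0.0, because the safety gate is triggered
```

Under these assumptions, a trajectory that satisfies more ordinary criteria still scores lower than one that misses a routine step, as long as the former violates a safety-critical criterion. That ordering is the property a gated rubric is meant to enforce.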

Key facts

  • HealthCraft is the first public RL environment for trajectory-level safety in emergency medicine
  • Adapted from Corecraft
  • Built on FHIR R4 world state with 14 entity types and 3,987 seed entities
  • Exposes 24 MCP tools
  • Dual-layer rubric zeroes the reward when safety-critical criteria are violated
  • 195 tasks across six categories, graded against 2,255 binary criteria (515 safety-critical)
  • Post-hoc 10-task negative-class slate extends to 205 tasks and 2,337 criteria
  • V8 results reported for two frontier models, including Claude Opus 4.6 (performance figures not specified)

Entities

Institutions

  • arXiv
  • Corecraft

Sources