Reinforcement Learning Enhances FHIR Tool-Calling Agents

ai-technology · 2026-05-16

A new arXiv preprint (2605.14126) addresses the challenge of using LLM agents to reason over Fast Healthcare Interoperability Resources (FHIR), the dominant standard for healthcare data exchange. Because FHIR structures electronic health records as directed graphs of resources, answering clinical questions requires agents to perform multi-step reasoning, filtering, and aggregation across resource types. Prior tool-augmented LLM agents often select the wrong resources or violate traversal constraints. The researchers frame FHIR reasoning as a sequential decision-making problem over a queryable structured graph and evaluate on FHIR-AgentBench, a benchmark built on real-world hospital data. They implement a multi-turn CodeAct agent, post-trained with reinforcement learning using a custom harness and tools, in which an LLM Judge provides execution-grounded rewards. The authors report that this approach outperforms prompt-based methods.
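To make the "directed graph of resources" idea concrete, here is a minimal sketch, assuming hand-written FHIR R4-style JSON resources (illustrative data, not taken from FHIR-AgentBench). Each `Observation.subject.reference` is treated as an edge to a `Patient`, and answering a question such as "what is this patient's mean HbA1c?" takes several steps: locate the patient node, follow reference edges in reverse to its Observations, filter by LOINC code, then aggregate. The helper names `build_graph`, `has_code`, and `mean_observation_value` are hypothetical and are not the paper's tools.

```python
from collections import defaultdict
from statistics import mean


def obs(oid, patient, code, value, unit):
    """Build a toy FHIR R4-style Observation that references a Patient."""
    return {
        "resourceType": "Observation",
        "id": oid,
        "subject": {"reference": f"Patient/{patient}"},
        "code": {"coding": [{"system": "http://loinc.org", "code": code}]},
        "valueQuantity": {"value": value, "unit": unit},
    }


# Illustrative records only; 4548-4 is the LOINC code for HbA1c.
RESOURCES = [
    {"resourceType": "Patient", "id": "p1"},
    {"resourceType": "Patient", "id": "p2"},
    obs("o1", "p1", "4548-4", 7.2, "%"),
    obs("o2", "p1", "4548-4", 6.8, "%"),
    obs("o3", "p1", "2345-7", 110, "mg/dL"),  # a different LOINC code
    obs("o4", "p2", "4548-4", 5.5, "%"),
]


def build_graph(resources):
    """Index nodes by 'Type/id' and build a reverse index of reference edges.

    An Observation points at its Patient via subject.reference; `incoming`
    maps each referenced key back to the resources that point at it.
    """
    nodes = {f"{r['resourceType']}/{r['id']}": r for r in resources}
    incoming = defaultdict(list)
    for r in resources:
        target = r.get("subject", {}).get("reference")
        if target:
            incoming[target].append(r)
    return nodes, incoming


def has_code(resource, code):
    """True if any coding on the resource carries the given code."""
    return any(c.get("code") == code for c in resource.get("code", {}).get("coding", []))


def mean_observation_value(nodes, incoming, patient_key, code):
    """Multi-step query: check the node, traverse edges, filter, aggregate."""
    if patient_key not in nodes:  # step 1: the starting node must exist
        raise KeyError(patient_key)
    values = [
        o["valueQuantity"]["value"]
        for o in incoming[patient_key]  # step 2: follow reverse reference edges
        if o["resourceType"] == "Observation" and has_code(o, code)  # step 3: filter
    ]
    return mean(values) if values else None  # step 4: aggregate


nodes, incoming = build_graph(RESOURCES)
print(mean_observation_value(nodes, incoming, "Patient/p1", "4548-4"))  # → 7.0
```

Picking up the glucose Observation (`o3`) or another patient's records here would be exactly the kind of "wrong resource" error the study describes; the code/subject filter is what prevents it.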

Key facts

FHIR is the dominant standard for interoperable healthcare data exchange.
FHIR represents electronic health records as a directed graph of resources.
Answering clinical questions requires multi-step reasoning across resource types.
Prior tool-augmented LLM agents often select the wrong resources or violate traversal constraints.
The study uses FHIR-AgentBench, a benchmark with real-world hospital data.
The approach frames FHIR reasoning as a sequential decision-making problem.
A multi-turn CodeAct agent is post-trained with reinforcement learning.
An LLM Judge provides execution-grounded rewards.
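The sequential decision-making framing can be sketched as a single rollout loop: the state is the interaction history, each action is a code snippet (the CodeAct style), the observation is that code's execution output, and the episode ends with a reward computed from the executed answer. This is a minimal sketch under loud assumptions: `run_code`, `judge_reward`, `rollout`, the tool `get_values`, and the scripted policy are all hypothetical; the paper's actual harness, tools, and LLM Judge prompt are not reproduced here. The LLM Judge is replaced by a simple numeric/string comparison, `exec()` is not a real sandbox, and the reinforcement learning update that would consume the reward is not shown.

```python
import contextlib
import io


def run_code(code, namespace):
    """Execute one CodeAct-style action and return its stdout as the observation.

    NOTE: exec() is NOT a sandbox; a real harness would isolate execution.
    """
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            exec(code, namespace)
    except Exception as exc:  # errors are returned to the agent as observations
        return f"Error: {exc!r}"
    return buf.getvalue()


def judge_reward(predicted, reference, tol=1e-6):
    """Hypothetical stand-in for the LLM Judge: score the executed answer.

    Returns 1.0 on a numeric match within `tol` (or exact string match), else 0.0.
    """
    try:
        return 1.0 if abs(float(predicted) - float(reference)) <= tol else 0.0
    except (TypeError, ValueError):
        return 1.0 if str(predicted).strip() == str(reference).strip() else 0.0


def rollout(policy, tools, reference, max_turns=5):
    """Run one episode; the reward is grounded in what the code actually computed.

    The episode ends when the agent's code binds `final_answer`, or with
    reward 0.0 once the turn budget is exhausted.
    """
    namespace = dict(tools)  # tools are exposed to the agent as callables
    history = []  # list of (action_code, observation) pairs
    for _ in range(max_turns):
        code = policy(history)
        observation = run_code(code, namespace)
        history.append((code, observation))
        if "final_answer" in namespace:
            return history, judge_reward(namespace["final_answer"], reference)
    return history, 0.0


def scripted_policy(history):
    """Stand-in for the LLM policy: query on turn 1, answer on turn 2."""
    if not history:
        return "vals = get_values('Patient/p1', '4548-4')\nprint(vals)"
    return "final_answer = sum(vals) / len(vals)"


# Hypothetical tool returning fixed values, in place of a real FHIR query tool.
TOOLS = {"get_values": lambda patient, code: [7.2, 6.8]}

history, reward = rollout(scripted_policy, TOOLS, reference=7.0)
print(len(history), reward)  # → 2 1.0
```

In an RL setup along the lines the study describes, many such rollouts from the model being trained would be scored this way and the rewards used to update the policy; grounding the reward in executed output, rather than in the text of the final reply alone, is what "execution-grounded" refers to.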
