ARTFEED — Contemporary Art Intelligence

New FMECA Framework Evaluates Patient Safety Risks in LLM-Generated Clinical Summaries

ai-technology · 2026-05-07

A recent study posted to arXiv (2605.04085) presents a Failure Mode, Effects, and Criticality Analysis (FMECA) framework for evaluating patient safety risks in clinical texts produced by large language models (LLMs). An interdisciplinary panel of eight experts built a taxonomy of failure modes through a combination of literature review and brainstorming sessions, adapting the traditional FMECA dimensions (occurrence, severity, and detectability) to 5-point ordinal scales. The framework was tested on 36 discharge summaries generated for four patients by the open-weight LLM GPT-OSS 120B using real clinical data from Geneva University Hospitals. The study aims to address the lack of structured risk-assessment methods for LLM-generated clinical content, so that potential risks can be identified and mitigated before deployment in healthcare settings.
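
To make the scoring scheme concrete, below is a minimal Python sketch of FMECA-style criticality scoring with the three dimensions on 5-point ordinal scales. The failure-mode names and the product-based risk priority number are illustrative assumptions drawn from conventional FMECA practice; the paper's actual taxonomy and aggregation rule are not described in this summary.

```python
from dataclasses import dataclass

# Hypothetical illustration: scoring a failure mode on the three FMECA
# dimensions mentioned in the study (occurrence, severity, detectability),
# each on a 5-point ordinal scale. The product-based criticality shown here
# is the classic FMECA risk priority number and is an assumption, not the
# paper's own aggregation rule.

@dataclass
class FailureMode:
    name: str
    occurrence: int     # 1 (rare) .. 5 (frequent)
    severity: int       # 1 (negligible) .. 5 (catastrophic)
    detectability: int  # 1 (easily detected) .. 5 (likely to go unnoticed)

    def __post_init__(self):
        for dim, value in (("occurrence", self.occurrence),
                           ("severity", self.severity),
                           ("detectability", self.detectability)):
            if not 1 <= value <= 5:
                raise ValueError(f"{dim} must be on the 1-5 ordinal scale, got {value}")

    @property
    def criticality(self) -> int:
        # Classic FMECA risk priority number: O x S x D (range 1..125).
        return self.occurrence * self.severity * self.detectability


# Example failure modes one might define for an LLM-generated discharge summary
# (hypothetical names and scores).
modes = [
    FailureMode("Hallucinated medication dose", occurrence=2, severity=5, detectability=4),
    FailureMode("Omitted follow-up appointment", occurrence=3, severity=3, detectability=3),
    FailureMode("Wrong laterality (left/right)", occurrence=2, severity=4, detectability=4),
]

# Rank failure modes by criticality to prioritise mitigation before deployment.
for mode in sorted(modes, key=lambda m: m.criticality, reverse=True):
    print(f"{mode.criticality:>3}  {mode.name}")
```

In a workflow like the one the study describes, each expert panel member would assign the ordinal scores, and the ranked criticality values would then flag which failure modes to address before clinical use.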

Key facts

  • Study published on arXiv with ID 2605.04085
  • Developed a FMECA framework for LLM-generated clinical summaries
  • Interdisciplinary expert panel of 8 members
  • Failure modes identified via literature review and brainstorming
  • FMECA dimensions adapted to 5-point ordinal scales
  • Applied to 36 discharge summaries from 4 patients
  • LLM used: GPT-OSS 120B
  • Clinical data sourced from Geneva University Hospitals

Entities

Institutions

  • arXiv
  • Geneva University Hospitals

Locations

  • Geneva
  • Switzerland

Sources