ARTFEED — Contemporary Art Intelligence

CLEAR Framework Reveals LLM Reliability Issues in Medical Contexts

other · 2026-05-06

A new framework called CLEAR (CLinical Evaluation of Ambiguity and Reliability) exposes how noise and ambiguity degrade large language model (LLM) performance on medical benchmarks. Described in a paper published on arXiv, CLEAR systematically perturbs three properties of benchmark questions: the number of answer options, whether the ground-truth answer is present, and the semantic framing. The perturbations are applied across three benchmarks and 17 LLMs. Results show that increasing the number of plausible answers reduces both accuracy and the ability to abstain, especially when the abstention framing shifts from assertive rejection to uncertain phrasing. The study highlights a limitation of current evaluation methods: they fail to reflect real-world medical ambiguity.
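The three perturbation axes described above can be illustrated with a minimal Python sketch. This is not the CLEAR authors' code: the `perturb` function, the `Item` type, the two abstention phrasings, and the sample question are all hypothetical, chosen only to show how option count, ground-truth presence, and abstention framing might be varied independently.

```python
import random
from dataclasses import dataclass

# Hypothetical abstention phrasings (assumed for illustration only).
ASSERTIVE = "None of the above"
UNCERTAIN = "I am not sure"


@dataclass
class Item:
    stem: str
    options: list[str]
    target: str  # the option a reliable model should select


def perturb(stem, correct, distractors, n_distractors, keep_truth,
            abstain_text, seed=0):
    """Build one perturbed multiple-choice item.

    n_distractors controls the option count, keep_truth controls whether
    the ground-truth answer is present, and abstain_text controls how the
    abstention option is framed.
    """
    if n_distractors > len(distractors):
        raise ValueError("not enough distractors available")
    rng = random.Random(seed)
    options = rng.sample(distractors, n_distractors)
    if keep_truth:
        options.append(correct)
    rng.shuffle(options)
    options.append(abstain_text)  # abstention option is always listed last
    # When the truth is removed, abstaining is the only correct choice.
    target = correct if keep_truth else abstain_text
    return Item(stem, options, target)


# Illustrative question, not taken from any of the benchmarks.
stem = "Deficiency of which vitamin causes scurvy?"
pool = ["Vitamin A", "Vitamin D", "Vitamin K", "Vitamin B12"]

with_truth = perturb(stem, "Vitamin C", pool, 3, True, ASSERTIVE)
without_truth = perturb(stem, "Vitamin C", pool, 3, False, UNCERTAIN)
print(with_truth.options, "->", with_truth.target)
print(without_truth.options, "->", without_truth.target)
```

Scoring a model against `target` across a grid of these settings would give one accuracy figure per condition, which is the kind of comparison the summary describes.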

Key facts

  • CLEAR framework introduced to assess LLM reliability under ambiguity
  • Evaluated on three benchmarks across 17 LLMs
  • Increasing the number of plausible answers degrades correct-answer identification
  • Abstention framing (assertive rejection vs. uncertain phrasing) affects how readily models abstain
  • Published on arXiv with ID 2605.01011

Entities

Institutions

  • arXiv
