ARTFEED — Contemporary Art Intelligence

MedFabric and EtHER: A Data-Centric Framework for Word-Level Fabrication Generation and Detection in Medical LLMs

ai-technology · 2026-05-07

A new data-centric pipeline generates realistic, word-level fabrications for medical LLMs, preserving syntactic and stylistic fidelity while introducing subtle factual deviations. The resulting dataset, MedFabric, addresses limitations in existing medical hallucination datasets, which inadequately capture fabrication phenomena due to limited coverage, stylistic disparities, and distributional drift. Building on MedFabric, the modular detector EtHER integrates Text2Table Decomposition, Word Masking and Filling, and Hybrid Sentence Pair Evaluation to enhance factual accuracy. The framework targets the high-risk problem of LLMs generating factually incorrect yet fluent statements in medical contexts.
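To make "word-level fabrication" concrete, the sketch below swaps a single term in a sentence for a same-category alternative and leaves every other token, including spacing and punctuation, untouched. It is a toy illustration only, not the MedFabric pipeline: the `SWAPS` table, the `fabricate` function, and the example sentence are all hypothetical, and a real pipeline would presumably choose substitutions with a model rather than a hand-written lookup.

```python
import re

# Hypothetical substitution table: each term maps to plausible
# same-category alternatives (an assumption for illustration only).
SWAPS = {
    "metformin": ["insulin", "warfarin"],
    "daily": ["weekly"],
}

def fabricate(sentence, swaps=SWAPS, choice=0):
    """Replace the first swappable word and keep everything else verbatim.

    Returns (fabricated_sentence, original_word, new_word),
    or None when no word in the sentence has an entry in `swaps`.
    """
    # Split into alternating word / non-word runs so that joining
    # the tokens reproduces the original spacing and punctuation.
    tokens = re.findall(r"\w+|\W+", sentence)
    for i, tok in enumerate(tokens):
        alternatives = swaps.get(tok.lower())
        if alternatives:
            new = alternatives[choice % len(alternatives)]
            if tok[0].isupper():
                new = new.capitalize()  # preserve sentence-initial casing
            return "".join(tokens[:i] + [new] + tokens[i + 1:]), tok, new
    return None

result = fabricate("Metformin is taken daily for type 2 diabetes.")
print(result)
```

Only one word changes, so the output stays fluent and stylistically identical to the input, which is what makes this kind of error hard to spot.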

Key facts

  • arXiv:2605.04180v1
  • MedFabric is a dataset for word-level fabrication generation.
  • EtHER is a modular word-level fabrication detector.
  • Pipeline preserves syntactic and stylistic fidelity.
  • Existing medical hallucination datasets have limited fabrication coverage.
  • Fabrications are factually incorrect yet fluent statements.
  • Fabrications in medical contexts are a high-risk problem.
  • EtHER uses Text2Table Decomposition, Word Masking and Filling, and Hybrid Sentence Pair Evaluation.
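Of the three EtHER components named above, Word Masking and Filling is the easiest to illustrate in a few lines. The sketch below masks each word in turn, asks a filler what word it expects in that slot, and flags any mismatch. It is a minimal sketch under stated assumptions, not EtHER's implementation: `toy_filler` is a hypothetical stand-in for a real fill-in model, its single known context is invented for the example, and Text2Table Decomposition and Hybrid Sentence Pair Evaluation are not shown.

```python
import re

def toy_filler(masked_sentence):
    """Hypothetical stand-in for a fill-in model.

    Returns the word expected at [MASK], or None when it has no opinion.
    """
    known = {
        "[MASK] is a first-line drug for type 2 diabetes.": "metformin",
    }
    return known.get(masked_sentence)

def detect(sentence, filler=toy_filler):
    """Return (word, expected_word) pairs where the filler disagrees."""
    tokens = re.findall(r"\w+|\W+", sentence)
    flagged = []
    for i, tok in enumerate(tokens):
        if not re.match(r"\w", tok):
            continue  # skip whitespace and punctuation runs
        masked = "".join(tokens[:i] + ["[MASK]"] + tokens[i + 1:])
        expected = filler(masked)
        if expected is not None and expected.lower() != tok.lower():
            flagged.append((tok, expected))
    return flagged

print(detect("Insulin is a first-line drug for type 2 diabetes."))
print(detect("Metformin is a first-line drug for type 2 diabetes."))
```

Passing the filler in as a parameter keeps the masking loop independent of whichever model supplies the expected word.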

Entities

Institutions

  • arXiv
