ARTFEED — Contemporary Art Intelligence

LLM Intent Fidelity Evaluation Framework Reveals Structural-Content Split

ai-technology · 2026-05-16

A new evaluation framework for large language models (LLMs) distinguishes between reproducing structural form and preserving specific intent. The study analyzed 2,880 outputs across three languages, three task domains, and six LLMs, using structured prompt ablation to measure structural recovery and intent fidelity separately along multiple semantic dimensions. The results show a consistent split between structure and intent: 25.7% of Chinese outputs that received a perfect holistic alignment score (GA=5) still showed intent deficits, and the figure rose to 58.6% for English outputs. Human evaluation confirmed that these split-zone outputs reflect genuine quality problems and that the dimensional fidelity scores align with human judgments.
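To make the headline percentages concrete, the sketch below shows one way such a split-zone rate could be computed: among outputs with a perfect holistic score (GA=5), the share with a deficit on at least one intent dimension. This is an illustration only, not the study's method. The record layout, the dimension names ("audience", "tone"), the 0-to-1 score scale, and the deficit threshold are all assumptions, and the data is a toy sample.

```python
from dataclasses import dataclass


@dataclass
class Output:
    """One evaluated model output (hypothetical record layout)."""
    language: str     # e.g. "en" or "zh"
    ga: int           # holistic alignment score, 1 to 5
    dim_scores: dict  # assumed: dimension name -> fidelity score in [0, 1]


def has_intent_deficit(output, threshold=1.0):
    """Assumed rule: a deficit is any dimension scoring below the threshold."""
    return any(score < threshold for score in output.dim_scores.values())


def split_zone_rate(outputs, language, threshold=1.0):
    """Share of GA=5 outputs in one language that still show an intent deficit.

    Returns None when the language has no GA=5 outputs.
    """
    perfect = [o for o in outputs if o.language == language and o.ga == 5]
    if not perfect:
        return None
    deficits = sum(has_intent_deficit(o, threshold) for o in perfect)
    return deficits / len(perfect)


# Toy data, invented for illustration; not taken from the study.
toy = [
    Output("en", 5, {"audience": 1.0, "tone": 0.5}),  # perfect GA, tone deficit
    Output("en", 5, {"audience": 1.0, "tone": 1.0}),  # perfect GA, no deficit
    Output("en", 4, {"audience": 0.0, "tone": 0.0}),  # excluded: GA below 5
    Output("zh", 5, {"audience": 1.0, "tone": 1.0}),  # perfect GA, no deficit
]

print(split_zone_rate(toy, "en"))  # 0.5
print(split_zone_rate(toy, "zh"))  # 0.0
```

Under these assumptions, the reported 25.7% and 58.6% figures would correspond to this ratio computed over the Chinese and English GA=5 outputs respectively. Filtering on GA=5 first is the point of the design: it isolates outputs that a holistic score alone would rate as flawless.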

Key facts

  • Proposes dimension-level intent fidelity evaluation framework
  • Applied structured prompt ablation across 2,880 outputs
  • Covered three languages, three task domains, six LLMs
  • Measures structural recovery and intent fidelity separately
  • 25.7% of Chinese outputs with GA=5 had intent deficits
  • 58.6% of English outputs with GA=5 had intent deficits
  • Human evaluation confirmed split-zone outputs are genuine deficits
  • Dimensional fidelity scores align with human judgments
