ARTFEED — Contemporary Art Intelligence

LGMT: Logic-Grounded Metamorphic Testing for LLM Reasoning Reliability

ai-technology · 2026-05-26

Researchers have introduced LGMT (Logic-Grounded Metamorphic Testing), a framework for assessing the reasoning reliability of Large Language Models (LLMs). Described in an arXiv paper (2605.23965), the framework uses first-order logic (FOL) to derive metamorphic relations from formal logical equivalences, which yields semantically invariant test cases: rewritten versions of a problem whose correct answer cannot change. Unlike conventional static benchmarks, LGMT identifies reasoning flaws by checking whether a model answers these equivalent cases consistently, so it needs no ground-truth labels. Tests on six leading LLMs uncovered significant hidden defects that reference-based assessments had missed. The findings indicate that the models are especially vulnerable to variations at the symbol and conclusion levels, and that advanced prompting strategies such as Few-shot CoT only partially mitigate the problem.
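
To make the idea of a metamorphic relation derived from a logical equivalence concrete, here is a minimal sketch in Python. It is not the paper's implementation: it works in propositional logic as a simplification of FOL, and the tuple encoding and the names `evaluate`, `equivalent`, `contrapositive`, `double_negation`, and `de_morgan` are illustrative assumptions. Each rewrite produces a formula with the same truth table as the original, which is what makes the resulting test case semantically invariant.

```python
from itertools import product

# Assumed encoding (illustrative only): formulas are nested tuples such as
# ("var", "p"), ("not", f), ("and", f, g), ("or", f, g), ("implies", f, g).

def evaluate(formula, env):
    """Evaluate a formula under a truth assignment given as a dict."""
    op = formula[0]
    if op == "var":
        return env[formula[1]]
    if op == "not":
        return not evaluate(formula[1], env)
    if op == "and":
        return evaluate(formula[1], env) and evaluate(formula[2], env)
    if op == "or":
        return evaluate(formula[1], env) or evaluate(formula[2], env)
    if op == "implies":
        return (not evaluate(formula[1], env)) or evaluate(formula[2], env)
    raise ValueError(f"unknown operator: {op}")

def variables(formula):
    """Collect the variable names that occur in a formula."""
    if formula[0] == "var":
        return {formula[1]}
    return set().union(*(variables(sub) for sub in formula[1:]))

def equivalent(f, g):
    """Brute-force truth-table check that f and g always agree."""
    names = sorted(variables(f) | variables(g))
    for values in product([False, True], repeat=len(names)):
        env = dict(zip(names, values))
        if evaluate(f, env) != evaluate(g, env):
            return False
    return True

def contrapositive(formula):
    """(a -> b) becomes (not b -> not a)."""
    assert formula[0] == "implies"
    _, a, b = formula
    return ("implies", ("not", b), ("not", a))

def double_negation(formula):
    """f becomes not (not f)."""
    return ("not", ("not", formula))

def de_morgan(formula):
    """not (a and b) becomes (not a) or (not b)."""
    assert formula[0] == "not" and formula[1][0] == "and"
    _, (_, a, b) = formula
    return ("or", ("not", a), ("not", b))

p, q = ("var", "p"), ("var", "q")
rule = ("implies", p, q)
variants = [contrapositive(rule), double_negation(rule)]
```

Every entry in `variants` passes `equivalent(rule, variant)`, whereas the converse `("implies", q, p)` does not, so a model that treats the converse as interchangeable with the rule, or treats the contrapositive as different from it, is showing a reasoning flaw rather than a labeling disagreement.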

Key facts

  • LGMT stands for Logic-Grounded Metamorphic Testing.
  • It is an oracle-free framework for evaluating LLM reasoning.
  • LGMT leverages first-order logic (FOL) to derive metamorphic relations.
  • It constructs semantically invariant test cases from logical equivalences.
  • Defects are detected through cross-case consistency checking.
  • Experiments were conducted on six state-of-the-art LLMs.
  • LGMT exposed hidden defects missed by traditional evaluations.
  • Models are sensitive to symbol-level and conclusion-level variations.
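
The oracle-free consistency check and the symbol-level sensitivity noted above can be sketched as follows. This is a hedged illustration, not the LGMT code: `rename_symbols`, `find_inconsistencies`, and `brittle_model` are hypothetical names, the prompt format is invented for the example, and `brittle_model` is a stub standing in for a real LLM call. The point is that no reference answer is consulted; a defect is flagged purely because two equivalent cases receive different answers.

```python
import re

def rename_symbols(problem, mapping):
    """Rename predicate symbols in one pass (whole-word matches only)."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], problem)

def find_inconsistencies(model, cases):
    """Return (case, answer) pairs whose answer differs from the first case.

    No ground-truth label is used: the first case's answer is only a
    baseline for comparison, not an assumed correct answer.
    """
    answers = [model(case) for case in cases]
    baseline = answers[0]
    return [(c, a) for c, a in zip(cases, answers) if a != baseline]

def brittle_model(prompt):
    """Hypothetical stub for an LLM: it leans on familiar predicate names."""
    return "True" if "Human" in prompt else "Unknown"

problem = (
    "Premises: forall x (Human(x) -> Mortal(x)); Human(socrates). "
    "Conclusion: Mortal(socrates)."
)
# Symbol-level variation: same logical structure, abstract predicate names.
renamed = rename_symbols(problem, {"Human": "F", "Mortal": "G"})

defects = find_inconsistencies(brittle_model, [problem, renamed])
```

Here `defects` contains the renamed case, because the stub changes its answer when only the predicate names change. In a real harness the stub would be replaced by a call to the model under test, and the list of cases would include conclusion-level rewrites as well.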

Entities

Institutions

  • arXiv
