ARTFEED — Contemporary Art Intelligence

LLM Formalization Verified via Roundtrip Repair

ai-technology · 2026-04-30

A new technique checks whether LLM autoformalization stays faithful to its natural-language source without relying on ground-truth annotations. The roundtrip verification method formalizes a statement, translates it back into natural language, re-formalizes it, and then checks the two formalizations for logical equivalence with a formal tool. Agreement indicates faithfulness; a discrepancy triggers a diagnostic phase that pinpoints which translation stage failed, followed by a targeted repair operator that corrects it. Testing on 150 traffic rules with Claude Opus 4.6 and GPT-5.2 showed that diagnosis-guided repair improved formal equivalence from 45–61% to 83–85% for both models, surpassing a random-repair baseline. An independent NLI analysis confirmed that formal equivalence correlates with reduced semantic drift. The work was submitted to arXiv (ID 2604.25031) in the Computer Science > Computation and Language category.
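
The pipeline is compact enough to sketch directly. The following is a minimal illustration, not the authors' implementation: it assumes formulas are represented as Z3 Boolean expressions and that equivalence is checked by testing validity of the biconditional with Z3 (the article only says "a formal tool"); the `formalize`/`informalize` callables stand in for LLM prompting, which is not reproduced here.

```python
# Roundtrip faithfulness check: formalize, translate back, re-formalize, then
# test logical equivalence of the two formalizations with a formal tool (Z3 here).
# The LLM calls are passed in as callables; in the evaluated setting they would
# wrap prompts to Claude Opus 4.6 or GPT-5.2, which this sketch does not reproduce.
from typing import Callable, Tuple
from z3 import BoolRef, Bools, Implies, Not, Solver, unsat


def logically_equivalent(f1: BoolRef, f2: BoolRef) -> bool:
    # f1 and f2 are equivalent iff the negation of their biconditional is unsatisfiable.
    solver = Solver()
    solver.add(Not(f1 == f2))
    return solver.check() == unsat


def roundtrip_verify(
    nl_rule: str,
    formalize: Callable[[str], BoolRef],    # NL -> formula (LLM call)
    informalize: Callable[[BoolRef], str],  # formula -> NL (LLM call)
) -> Tuple[bool, BoolRef, BoolRef]:
    f1 = formalize(nl_rule)   # first formalization
    back = informalize(f1)    # back-translation to natural language
    f2 = formalize(back)      # re-formalization of the back-translation
    return logically_equivalent(f1, f2), f1, f2


# Toy stand-ins for the LLM, just to exercise the pipeline end to end.
rain, slow = Bools("rain slow")
faithful, f1, f2 = roundtrip_verify(
    "If it rains, drive slowly.",
    formalize=lambda _: Implies(rain, slow),
    informalize=lambda _: "if it rains, drive slowly",
)
print(faithful)  # True: both formalizations are Implies(rain, slow)
```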

Key facts

  • Roundtrip verification does not require ground-truth annotations.
  • Approach: formalize, translate back, re-formalize, check logical equivalence.
  • Diagnosis step identifies which translation stage failed.
  • Targeted repair operator corrects the failed stage (see the repair-loop sketch after this list).
  • Evaluated on 150 traffic rules.
  • Models used: Claude Opus 4.6 and GPT-5.2.
  • Formal equivalence raised from 45–61% to 83–85%.
  • Diagnosis-guided repair outperforms a random-repair baseline.
  • Independent NLI analysis confirms that formal equivalence correlates with reduced semantic drift.
  • Submitted to arXiv under Computer Science > Computation and Language.
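
For the diagnosis-and-repair step, the following control-loop sketch fills in the shape the article describes. The `diagnose_stage` function and the per-stage `repair_operators` are assumptions for illustration; the paper's concrete diagnosis prompts and operators are not given in this summary.

```python
# Diagnosis-guided repair loop: when the roundtrip check fails, a diagnosis step
# names the translation stage that drifted and a stage-specific repair operator
# is applied before re-checking. Stage names and operator signatures here are
# hypothetical placeholders, not the paper's actual components.
from typing import Callable, Dict, Optional


def repair_until_equivalent(
    nl_rule: str,
    formalize: Callable,       # NL -> formula (LLM call, as in the sketch above)
    informalize: Callable,     # formula -> NL (LLM call)
    equivalent: Callable,      # formal equivalence check (e.g. the Z3 check above)
    diagnose_stage: Callable,  # hypothetical: names the failed stage, e.g. "back-translation"
    repair_operators: Dict[str, Callable],  # hypothetical: one targeted operator per stage
    max_rounds: int = 3,
) -> Optional[object]:
    f1 = formalize(nl_rule)
    for _ in range(max_rounds):
        back = informalize(f1)   # back-translate the current formalization
        f2 = formalize(back)     # re-formalize the back-translation
        if equivalent(f1, f2):
            return f1            # roundtrip agrees: accept f1 as faithful
        stage = diagnose_stage(nl_rule, f1, back, f2)
        f1 = repair_operators[stage](nl_rule, f1, back, f2)  # targeted correction
    return None                  # still divergent after max_rounds
```

Replacing `diagnose_stage` with a random choice over stages yields the random-repair baseline that the diagnosis-guided version is reported to beat.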

Entities

Institutions

  • arXiv

Sources