CTO: Syntax-Guided and Semantic-Aware Code Translation via Preference Optimization
A recent study published on arXiv (2605.13229) presents CTO, an approach to improving code translation by large language models (LLMs) through syntax-guided and semantic-aware preference optimization. The researchers argue that current preference-based learning often relies on unreliable semantic rewards derived from limited test cases or narrow reference translations. CTO instead uses contrastive learning to train a cross-lingual semantic model that directly assesses functional equivalence between source and translated code. It formulates code translation as a multi-objective optimization problem, unifying these stronger semantic signals with compiler-based syntactic feedback in a direct preference optimization (DPO) framework. Experiments on translation between C++ and other programming languages show improvements in both syntactic correctness and semantic consistency.
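The contrastive training of the cross-lingual semantic model can be illustrated with a minimal sketch. The paper does not specify the encoder or loss here, so the bag-of-tokens embedding and InfoNCE-style loss below are assumptions chosen only to show the general idea: a source snippet and its functionally equivalent translation are pulled together, while functionally different code is pushed apart.

```python
import math

def embed(code, vocab):
    """Toy bag-of-tokens embedding (assumption: stands in for the paper's
    learned cross-lingual encoder, whose architecture is not given here)."""
    vec = [0.0] * len(vocab)
    for tok in code.split():
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(u, v):
    """Cosine similarity of two already-normalized vectors."""
    return sum(a * b for a, b in zip(u, v))

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss: high similarity to the matching
    translation and low similarity to the negatives yields a low loss."""
    sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    exps = [math.exp(s / temperature) for s in sims]
    return -math.log(exps[0] / sum(exps))

# Illustrative usage: a C-like snippet, a close translation, and a
# functionally different snippet as the negative.
tokens = sorted(set("int add a b return plus mul x y times".split()))
vocab = {t: i for i, t in enumerate(tokens)}
src = embed("int add a b return a plus b", vocab)
pos = embed("add a b return a plus b", vocab)
neg = embed("int mul x y return x times y", vocab)
loss_matched = info_nce_loss(src, pos, [neg])
loss_swapped = info_nce_loss(src, neg, [pos])
```

In this sketch the loss is lower when the true translation is treated as the positive than when the mismatched snippet is, which is the signal a trained semantic model would provide as a reward.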
Key facts
- arXiv paper 2605.13229 proposes CTO for code translation
- CTO uses syntax-guided and semantic-aware preference optimization
- Contrastive learning trains a cross-lingual semantic model
- Semantic model directly assesses functional equivalence
- Code translation is formulated as multi-objective optimization
- Compiler-based syntactic feedback is unified with semantic signals
- Experiments conducted on C++ and other languages
- Aims to improve syntactic correctness and semantic consistency
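The multi-objective combination of semantic and compiler-based signals within a DPO framework can be sketched as follows. The weighting scheme and the binary compiler signal are assumptions for illustration, not the paper's exact formulation; only the DPO loss itself is the standard published objective.

```python
import math

def preference_score(semantic_score, compiles, w_sem=0.7, w_syn=0.3):
    """Blend the semantic-equivalence score with binary compiler feedback
    to rank candidate translations. Weights are illustrative assumptions."""
    return w_sem * semantic_score + w_syn * (1.0 if compiles else 0.0)

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO objective on a (chosen, rejected) translation pair:
    -log sigmoid(beta * (policy log-ratio gap over the reference model))."""
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Illustrative usage: candidate A is semantically close and compiles;
# candidate B is semantically close but fails to compile, so A is chosen.
score_a = preference_score(semantic_score=0.9, compiles=True)
score_b = preference_score(semantic_score=0.9, compiles=False)
# With the pair ordered by preference, a policy that already favors the
# chosen translation (positive margin) incurs a lower loss.
loss_good_policy = dpo_loss(-1.0, -3.0, -2.0, -2.0)  # favors chosen
loss_bad_policy = dpo_loss(-3.0, -1.0, -2.0, -2.0)   # favors rejected
```

The compiler term lets syntactically invalid outputs be demoted even when their token overlap with the source looks semantically plausible, which is the role the syntax-guided signal plays in CTO's preference pairs.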
Entities
Institutions
- arXiv