CTO: Syntax-Guided and Semantic-Aware Code Translation via Preference Optimization
A recent study published on arXiv (2605.13229) presents CTO, an approach to improving code translation by large language models (LLMs) through syntax-guided and semantic-aware preference optimization. The researchers argue that current preference-based learning often relies on unreliable semantic rewards derived from limited test cases or narrow reference translations. CTO instead uses contrastive learning to train a cross-lingual semantic model that directly assesses functional equivalence between source and translated code. It formulates code translation as a multi-objective optimization problem, unifying these stronger semantic signals with compiler-based syntactic feedback in a direct preference optimization (DPO) framework. Experiments on translation between C++ and other programming languages show improvements in both syntactic correctness and semantic consistency.
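The contrastive training of the cross-lingual semantic model can be illustrated with a minimal sketch. The paper does not specify the encoder or loss here, so the bag-of-tokens embedding and InfoNCE-style loss below are assumptions chosen only to show the general idea: a source snippet and its functionally equivalent translation are pulled together, while functionally different code is pushed apart.

```python
import math

def embed(code, vocab):
    """Toy bag-of-tokens embedding (assumption: stands in for the paper's
    learned cross-lingual encoder, whose architecture is not given here)."""
    vec = [0.0] * len(vocab)
    for tok in code.split():
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(u, v):
    """Cosine similarity of two already-normalized vectors."""
    return sum(a * b for a, b in zip(u, v))

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss: high similarity to the matching
    translation and low similarity to the negatives yields a low loss."""
    sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    exps = [math.exp(s / temperature) for s in sims]
    return -math.log(exps[0] / sum(exps))

# Illustrative usage: a C-like snippet, a close translation, and a
# functionally different snippet as the negative.
tokens = sorted(set("int add a b return plus mul x y times".split()))
vocab = {t: i for i, t in enumerate(tokens)}
src = embed("int add a b return a plus b", vocab)
pos = embed("add a b return a plus b", vocab)
neg = embed("int mul x y return x times y", vocab)
loss_matched = info_nce_loss(src, pos, [neg])
loss_swapped = info_nce_loss(src, neg, [pos])
```

In this sketch the loss is lower when the true translation is treated as the positive than when the mismatched snippet is, which is the signal a trained semantic model would provide as a reward.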
Key facts
- arXiv paper 2605.13229 proposes CTO for code translation
- CTO uses syntax-guided and semantic-aware preference optimization
- Contrastive learning trains a cross-lingual semantic model
- Semantic model directly assesses functional equivalence
- Code translation is formulated as multi-objective optimization
- Compiler-based syntactic feedback is unified with semantic signals
- Experiments conducted on C++ and other languages
- Aims to improve syntactic correctness and semantic consistency
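The multi-objective combination of semantic and compiler-based signals within a DPO framework can be sketched as follows. The weighting scheme and the binary compiler signal are assumptions for illustration, not the paper's exact formulation; only the DPO loss itself is the standard published objective.

```python
import math

def preference_score(semantic_score, compiles, w_sem=0.7, w_syn=0.3):
    """Blend the semantic-equivalence score with binary compiler feedback
    to rank candidate translations. Weights are illustrative assumptions."""
    return w_sem * semantic_score + w_syn * (1.0 if compiles else 0.0)

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO objective on a (chosen, rejected) translation pair:
    -log sigmoid(beta * (policy log-ratio gap over the reference model))."""
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Illustrative usage: candidate A is semantically close and compiles;
# candidate B is semantically close but fails to compile, so A is chosen.
score_a = preference_score(semantic_score=0.9, compiles=True)
score_b = preference_score(semantic_score=0.9, compiles=False)
# With the pair ordered by preference, a policy that already favors the
# chosen translation (positive margin) incurs a lower loss.
loss_good_policy = dpo_loss(-1.0, -3.0, -2.0, -2.0)  # favors chosen
loss_bad_policy = dpo_loss(-3.0, -1.0, -2.0, -2.0)   # favors rejected
```

The compiler term lets syntactically invalid outputs be demoted even when their token overlap with the source looks semantically plausible, which is the role the syntax-guided signal plays in CTO's preference pairs.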
Entities
Institutions
- arXiv