BootTrans Method Uses Test Suites for Multilingual Code Translation
BootTrans, a novel bootstrapping technique, tackles two significant obstacles in translating code between programming languages. The first is the scarcity of parallel data with executable test oracles. The second is the optimization imbalance that arises when one model must handle many language pairs. The method exploits the functional invariance and cross-lingual portability of test suites, repurposing abundant pivot-language unit tests as universal verification oracles for multilingual reinforcement learning. BootTrans uses a dual-pool architecture, made up of a seed pool and an exploration pool, to progressively expand the training data through execution-guided experience collection. It also employs a language-aware weighting mechanism that uses performance on related languages to prioritize harder translation directions, counteracting the optimization imbalance. The paper (arXiv:2601.03512v2) reports extensive experiments indicating that this combination of test-suite-based verification and weighted exploration lets BootTrans handle diverse language pairs despite both the data and the optimization constraints.
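The verification and data-growth ideas above can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's implementation. It assumes that pivot-language unit tests have been reduced to portable input/output pairs and that the reward is the fraction of tests passed. It also assumes that a fully verified translation is recycled into the exploration pool as the source program of a new translation direction. All names (`pass_rate`, `Task`, `DualPool`, `explore_ratio`, `next_target`) are hypothetical, and a Python lambda stands in for actually compiling and running target-language code.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# A portable test case: (positional args, expected return value).
# Assumption: pivot-language unit tests have already been reduced to
# input/output pairs, so the same pairs can check a program in any language.
TestCase = Tuple[tuple, object]


def pass_rate(program: Callable, tests: List[TestCase]) -> float:
    """Execution-based reward: fraction of portable tests the candidate passes."""
    if not tests:
        return 0.0
    passed = 0
    for args, expected in tests:
        try:
            if program(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crash or runtime error counts as a failed test
    return passed / len(tests)


@dataclass
class Task:
    name: str
    source_lang: str
    source_code: str
    target_lang: str
    tests: List[TestCase]


@dataclass
class DualPool:
    seed: List[Task]  # curated starting tasks
    exploration: List[Task] = field(default_factory=list)  # grown from verified outputs

    def sample(self, k: int, explore_ratio: float, rng: random.Random) -> List[Task]:
        """Mix up to k * explore_ratio exploration tasks (no repeats) with seed tasks (with repeats)."""
        n_explore = min(int(k * explore_ratio), len(self.exploration))
        batch = rng.sample(self.exploration, n_explore)
        batch += rng.choices(self.seed, k=k - n_explore)
        return batch

    def collect(self, task: Task, translation: str, next_target: str,
                reward: float, threshold: float = 1.0) -> bool:
        """Keep a fully verified translation as the source of a new direction.

        The tests carry over unchanged because they check behaviour, not syntax.
        """
        if reward < threshold:
            return False
        new_task = Task(task.name, task.target_lang, translation,
                        next_target, task.tests)
        if new_task in self.exploration:
            return False  # skip exact duplicates
        self.exploration.append(new_task)
        return True


# Usage: the lambda stands in for executing the generated Java code.
tests = [((2, 3), 5), ((-1, 1), 0)]
seed_task = Task("add", "python", "def add(a, b): return a + b", "java", tests)
pool = DualPool(seed=[seed_task])
java_code = "static int add(int a, int b) { return a + b; }"
reward = pass_rate(lambda a, b: a + b, tests)
pool.collect(seed_task, java_code, next_target="go", reward=reward)
print(reward, [(t.source_lang, t.target_lang) for t in pool.exploration])
# prints: 1.0 [('java', 'go')]
```

The new Java-to-Go task reuses the same test list. Because the tests check behaviour rather than syntax, they remain valid oracles in every direction, which is what lets a single pivot-language suite supervise many language pairs.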
Key facts
- BootTrans is a bootstrapping method for code translation across multiple programming languages
- It addresses scarcity of parallel data with executable test oracles
- It addresses optimization imbalance across diverse language pairs
- The method leverages functional invariance and cross-lingual portability of test suites
- It adapts pivot-language unit tests as universal verification oracles for multilingual reinforcement learning (RL) training
- Uses a dual-pool architecture, with seed and exploration pools, for progressive data expansion
- Includes a language-aware weighting mechanism that prioritizes harder translation directions
- The research paper is available as arXiv:2601.03512v2 (announce type: replace-cross)
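The language-aware weighting can be sketched as a softmax over per-direction difficulty scores. The paper's exact formula is not given here, so every detail below is an assumption made for illustration. These include the `FAMILY` grouping of related languages and the step that blends a direction's pass rate with the mean pass rate of other directions into related target languages (weighted by `alpha`). The `temperature` parameter is likewise hypothetical.

```python
import math
from typing import Dict, Tuple

Direction = Tuple[str, str]  # (source_lang, target_lang)

# Hypothetical grouping of "related" languages; illustrative only.
FAMILY = {
    "c": "systems", "cpp": "systems", "rust": "systems", "go": "systems",
    "java": "jvm", "kotlin": "jvm", "scala": "jvm",
    "python": "dynamic", "ruby": "dynamic", "javascript": "dynamic",
}


def direction_weights(pass_rates: Dict[Direction, float],
                      alpha: float = 0.5,
                      temperature: float = 0.2) -> Dict[Direction, float]:
    """Turn per-direction pass rates into sampling weights favoring harder directions.

    Each direction's own pass rate is blended with the mean pass rate of other
    directions whose target language is in the same (assumed) family, so sparse
    or noisy directions borrow evidence from related languages.
    """
    if not pass_rates:
        return {}
    scores = {}
    for (src, tgt), rate in pass_rates.items():
        family = FAMILY.get(tgt)
        peers = [r for (s, t), r in pass_rates.items()
                 if (s, t) != (src, tgt) and family is not None
                 and FAMILY.get(t) == family]
        peer_mean = sum(peers) / len(peers) if peers else rate
        blended = alpha * rate + (1 - alpha) * peer_mean
        scores[(src, tgt)] = (1.0 - blended) / temperature  # lower success -> higher score
    # Softmax over difficulty scores; subtract the max for numerical stability.
    top = max(scores.values())
    exps = {d: math.exp(s - top) for d, s in scores.items()}
    total = sum(exps.values())
    return {d: e / total for d, e in exps.items()}


# Usage: pass rates would come from recent execution-based rewards.
rates = {("python", "java"): 0.9, ("python", "kotlin"): 0.6,
         ("python", "rust"): 0.3, ("java", "cpp"): 0.4}
for direction, w in sorted(direction_weights(rates).items(), key=lambda kv: -kv[1]):
    print(direction, round(w, 3))
```

With these rates, the Python-to-Rust and Java-to-C++ directions receive the largest weights. Python-to-Kotlin is pulled toward Java's higher pass rate because both targets share the assumed JVM family, so it ends up weighted like Python-to-Java. A lower `temperature` sharpens the preference for hard directions.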
Entities
Institutions
- arXiv