LLM-Based Translation of Compiler Intermediate Representations
A new research paper introduces IRIS-14B, a 14-billion-parameter transformer model fine-tuned to translate between GIMPLE and LLVM IR, the intermediate representations (IRs) used by GCC and LLVM respectively. The work targets cross-toolchain interaction, which has been limited by semantic and structural differences between the two IRs. Because traditional rule-based translators have proven complex and costly to maintain, the authors propose a data-driven approach in which a large language model (LLM) learns the mapping from examples. The paper is available on arXiv under identifier 2605.08247.
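The summary does not say how the paper's training examples were produced. Purely as an illustration of what "learning mappings from examples" could rest on, the sketch below builds the commands that would dump both IRs for the same C file, which is one plausible way to get a parallel GIMPLE/LLVM IR pair. The helper name `pair_commands`, the directory `ir_pairs`, and the file `example.c` are hypothetical; `-fdump-tree-gimple` and `-emit-llvm` are standard GCC and Clang options. The commands are only constructed here, not executed.

```python
from pathlib import Path


def pair_commands(c_file: str, out_dir: str = "ir_pairs") -> dict[str, list[str]]:
    """Build (but do not run) the commands that dump both IRs for one C file.

    Assumption: this is an illustrative pipeline, not the paper's method.
    """
    src = Path(c_file)
    out = Path(out_dir)
    return {
        # GCC dumps GIMPLE as a side effect of compiling; the dump file's
        # exact name and location depend on the GCC version.
        "gimple": [
            "gcc", "-O0", "-c", str(src), "-fdump-tree-gimple",
            "-o", str(out / (src.stem + ".o")),
        ],
        # Clang writes textual LLVM IR directly to the requested .ll file.
        "llvm_ir": [
            "clang", "-O0", "-S", "-emit-llvm", str(src),
            "-o", str(out / (src.stem + ".ll")),
        ],
    }


if __name__ == "__main__":
    for name, cmd in pair_commands("example.c").items():
        print(name, " ".join(cmd))
```

Each command list could be passed to `subprocess.run` on a machine with both toolchains installed; the resulting GIMPLE dump and `.ll` file would then form one source/target example.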
Key facts
- IRIS-14B is a 14-billion-parameter transformer model.
- It translates between GIMPLE (GCC) and LLVM IR.
- The paper is on arXiv with ID 2605.08247.
- LLMs offer a data-driven alternative to rule-based translators.
- GCC and LLVM underpin much of modern software infrastructure.
- Cross-toolchain interaction is hindered by IR differences.
- Rule-based translators have high complexity and maintenance cost.
- The model is fine-tuned for compiler IR translation.
Entities
Institutions
- arXiv