LLM-Based Translation of Compiler Intermediate Representations

ai-technology · 2026-05-12

A new research paper introduces IRIS-14B, a 14-billion-parameter transformer model fine-tuned to translate between GIMPLE and LLVM IR, the intermediate representations (IRs) used by GCC and LLVM respectively. The work targets cross-toolchain interaction, which has been limited by semantic and structural differences between the two IRs. Rule-based translators have proven complex and costly to maintain, so the authors propose a data-driven approach in which a large language model (LLM) learns the mapping from examples. The paper is available on arXiv under identifier 2605.08247.
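To make the maintenance problem concrete, here is a minimal sketch of what a single hand-written translation rule might look like. It is not taken from the paper: the function names, the rule table, and the simplified GIMPLE syntax are all hypothetical, and it assumes every operand is a 32-bit signed integer, which real GIMPLE does not guarantee.

```python
import re

# Hypothetical rule table: GIMPLE binary operator -> LLVM IR opcode.
# Assumption: all operands are 32-bit signed integers (hence "nsw" and "i32").
BINOPS = {"+": "add nsw", "-": "sub nsw", "*": "mul nsw"}

# Matches a simplified GIMPLE SSA assignment such as "_3 = a_1 + b_2;".
ASSIGN = re.compile(r"^\s*(\w+)\s*=\s*(\w+)\s*([+\-*])\s*(\w+)\s*;\s*$")


def operand(token):
    """Render a GIMPLE operand as LLVM IR: literals stay bare, names get '%'."""
    return token if token.isdigit() else "%" + token


def translate_stmt(stmt):
    """Translate one simplified GIMPLE statement, or return None if no rule applies."""
    match = ASSIGN.match(stmt)
    if match is None:
        return None  # every unsupported statement form needs another hand-written rule
    dest, lhs, op, rhs = match.groups()
    return f"%{dest} = {BINOPS[op]} i32 {operand(lhs)}, {operand(rhs)}"


if __name__ == "__main__":
    for line in ["_3 = a_1 + b_2;", "_4 = _3 * 2;", "if (a_1 > 0) goto <bb 3>;"]:
        print(line, "->", translate_stmt(line))
```

Even this toy rule ignores types, control flow, memory operations, and calls. A rule-based translator needs one such case per construct, which is the cost a learned mapping is meant to avoid.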

Key facts

IRIS-14B is a 14-billion-parameter transformer model.
It translates between GIMPLE (GCC) and LLVM IR.
The paper is on arXiv with ID 2605.08247.
LLMs offer a data-driven alternative to rule-based translators.
GCC and LLVM underpin much of modern software infrastructure.
Cross-toolchain interaction is hindered by IR differences.
Rule-based translators have high complexity and maintenance cost.
The model is fine-tuned for compiler IR translation.
