Xe-Forge: LLM-Powered Kernel Optimization for Intel GPU
Xe-Forge is a multi-stage, LLM-powered pipeline that automates kernel optimization for Intel GPUs. It targets a manual bottleneck: applying low-level optimizations such as quantization, memory access coalescing, tile size tuning, and architecture-specific workarounds to Triton kernels by hand. The system applies up to nine optimization stages, including algorithmic restructuring, operator fusion, block pointer modernization, GPU-specific tuning, and open-ended discovery. Each stage is driven by a Chain-of-Verification-and-Refinement (CoVeR) agent that generates candidate kernels and validates them. The work is published on arXiv (2605.26118) and addresses the broader problem of porting deep learning algorithms to new hardware accelerators.
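The per-stage loop described above can be pictured as a generate, verify, refine cycle. The Python sketch below is a minimal illustration under stated assumptions, not Xe-Forge's implementation. `generate` stands in for the LLM call and `verify` for a harness that checks a candidate against a reference and times it. Both are hypothetical callables, and kernels are plain strings. The rule "accept only if correct and faster" is an assumed acceptance criterion, and the stage names are an illustrative subset of the nine stages.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces (not from Xe-Forge): a kernel is plain source text.
# generate(stage, kernel, feedback) -> candidate kernel source (an LLM call in practice)
# verify(candidate) -> (is_correct, runtime, feedback_for_next_round)
Generate = Callable[[str, str, Optional[str]], str]
Verify = Callable[[str], Tuple[bool, float, str]]


@dataclass
class StageResult:
    stage: str
    accepted: bool
    runtime: float
    attempts: int


def run_stage(stage: str, kernel: str, runtime: float, generate: Generate,
              verify: Verify, max_rounds: int = 3) -> Tuple[str, float, StageResult]:
    """One assumed generate-verify-refine loop for a single optimization stage."""
    feedback: Optional[str] = None
    for attempt in range(1, max_rounds + 1):
        candidate = generate(stage, kernel, feedback)
        ok, cand_runtime, feedback = verify(candidate)
        # Assumed acceptance rule: the candidate must be correct AND faster.
        if ok and cand_runtime < runtime:
            return candidate, cand_runtime, StageResult(stage, True, cand_runtime, attempt)
    # No acceptable candidate after max_rounds: keep the previous kernel.
    return kernel, runtime, StageResult(stage, False, runtime, max_rounds)


def run_pipeline(kernel: str, runtime: float, stages: List[str], generate: Generate,
                 verify: Verify) -> Tuple[str, float, List[StageResult]]:
    """Apply stages in order; each stage starts from the best kernel so far."""
    history: List[StageResult] = []
    for stage in stages:
        kernel, runtime, result = run_stage(stage, kernel, runtime, generate, verify)
        history.append(result)
    return kernel, runtime, history


# Toy stand-ins so the sketch runs without an LLM or a GPU.
def toy_generate(stage: str, kernel: str, feedback: Optional[str]) -> str:
    return f"{kernel}+{stage}"  # ignores feedback; a real agent would use it


def toy_verify(candidate: str) -> Tuple[bool, float, str]:
    applied = candidate.split("+")[1:]
    if applied[-1] == "operator_fusion":  # pretend fusion breaks numerics
        return False, float("inf"), "output mismatch vs. reference"
    return True, 1.0 * 0.9 ** len(applied), "ok"  # pretend each stage saves 10%


stages = ["algorithmic_restructuring", "operator_fusion", "block_pointer_modernization"]
final_kernel, final_runtime, history = run_pipeline("k", 1.0, stages, toy_generate, toy_verify)
for r in history:
    print(r.stage, "accepted" if r.accepted else "rejected", r.attempts)
```

In this sketch a stage's output is kept only if it passes verification. A failed stage (fusion in the toy run) therefore leaves the previous stage's kernel intact as the starting point for the next stage.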
Key facts
- Xe-Forge automates kernel optimization for Intel GPUs
- Applies up to nine optimization stages
- Uses Chain-of-Verification-and-Refinement (CoVeR) agents
- Targets Triton kernels
- Optimizations include quantization, memory access coalescing, and tile size tuning
- Published on arXiv with ID 2605.26118
- Addresses the manual bottleneck in porting deep learning algorithms to new hardware accelerators
- The system performs algorithmic restructuring and operator fusion
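Tile size tuning, one of the optimizations listed above, is commonly done by benchmarking a set of candidate tile shapes and keeping the fastest. This is the idea behind Triton's `triton.autotune` decorator. The plain-Python sketch below illustrates that search and is not Xe-Forge's tuner. The names `candidate_tiles` and `autotune`, the 32 KiB on-chip budget, and the toy `measure` function are all hypothetical. A real tuner would time actual kernel launches on the GPU.

```python
import itertools
from typing import Callable, Dict, Iterable, List, Optional, Tuple

TileConfig = Dict[str, int]


def candidate_tiles(sizes_mn: Iterable[int], sizes_k: Iterable[int],
                    budget_bytes: int, dtype_bytes: int = 2) -> List[TileConfig]:
    """Enumerate (BLOCK_M, BLOCK_N, BLOCK_K) shapes whose A and B tiles fit a budget."""
    sizes_mn, sizes_k = list(sizes_mn), list(sizes_k)
    configs = []
    for m, n, k in itertools.product(sizes_mn, sizes_mn, sizes_k):
        # An A tile is m x k and a B tile is k x n elements (assumed budget model).
        if (m * k + k * n) * dtype_bytes <= budget_bytes:
            configs.append({"BLOCK_M": m, "BLOCK_N": n, "BLOCK_K": k})
    return configs


def autotune(configs: List[TileConfig], measure: Callable[[TileConfig], float],
             repeats: int = 3) -> Tuple[Optional[TileConfig], float]:
    """Return the config with the lowest median measured time."""
    best_cfg, best_time = None, float("inf")
    for cfg in configs:
        times = sorted(measure(cfg) for _ in range(repeats))
        median = times[len(times) // 2]
        if median < best_time:
            best_cfg, best_time = cfg, median
    return best_cfg, best_time


# Toy cost model (pretends 128x128x32 is fastest); replace with real timings.
def toy_measure(cfg: TileConfig) -> float:
    return abs(cfg["BLOCK_M"] - 128) + abs(cfg["BLOCK_N"] - 128) + abs(cfg["BLOCK_K"] - 32)


configs = candidate_tiles([64, 128, 256], [32, 64], budget_bytes=32 * 1024)
best, best_time = autotune(configs, toy_measure)
print(len(configs), best, best_time)
```

Taking the median of several runs damps timing noise. The budget filter also prunes shapes that would not fit on chip before any time is spent benchmarking them.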
Entities
Institutions
- Intel
- arXiv