Xe-Forge: LLM-Powered Kernel Optimization for Intel GPU

ai-technology · 2026-05-27

Xe-Forge is a multi-stage, LLM-powered pipeline that automates kernel optimization for Intel GPUs. It addresses a manual bottleneck: applying low-level optimizations (quantization, memory access coalescing, tile size tuning, and architecture-specific workarounds) to Triton kernels by hand. The system applies up to nine optimization stages, including algorithmic restructuring, operator fusion, block pointer modernization, GPU-specific tuning, and open-ended discovery. Each stage is driven by a Chain-of-Verification-and-Refinement (CoVeR) agent that generates candidates and validates them. The work is published on arXiv (2605.26118) and targets the porting of deep learning algorithms to new hardware accelerators.
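The staged generate-then-validate flow described above can be illustrated with a minimal Python sketch. This is not the Xe-Forge implementation: the names `Stage`, `run_pipeline`, `propose`, and `verify` are hypothetical, the "kernels" are plain strings, and the proposal functions are toy stand-ins for LLM calls. The retry limit and the choice to skip a stage whose candidates never validate are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Stage:
    """One optimization stage (hypothetical structure, not from the paper)."""
    name: str
    # (current kernel source, feedback from last failed check) -> candidate source
    propose: Callable[[str, Optional[str]], str]


def run_pipeline(
    kernel: str,
    stages: List[Stage],
    verify: Callable[[str], Tuple[bool, Optional[str]]],
    max_attempts: int = 3,  # assumed retry budget
) -> Tuple[str, List[str]]:
    """Apply stages in order, keeping a candidate only if verify accepts it."""
    log: List[str] = []
    for stage in stages:
        feedback: Optional[str] = None
        for attempt in range(1, max_attempts + 1):
            candidate = stage.propose(kernel, feedback)
            ok, feedback = verify(candidate)
            if ok:
                kernel = candidate  # validated candidate becomes the new baseline
                log.append(f"{stage.name}: accepted on attempt {attempt}")
                break
        else:
            # No candidate validated: keep the previous kernel unchanged.
            log.append(f"{stage.name}: skipped after {max_attempts} attempts")
    return kernel, log


# Toy stand-ins for LLM-generated rewrites.
def fuse(src: str, feedback: Optional[str]) -> str:
    # The first attempt is deliberately faulty; feedback triggers a corrected one.
    return src + "+fused" if feedback else src + "+fused(BUG)"


def discover(src: str, feedback: Optional[str]) -> str:
    # Always produces a faulty candidate, so this stage is never accepted.
    return src + "+BUG"


def tune(src: str, feedback: Optional[str]) -> str:
    return src + "+tuned"


def verify(src: str) -> Tuple[bool, Optional[str]]:
    # Stand-in for a real correctness and performance check.
    if "BUG" in src:
        return False, "candidate failed the correctness check"
    return True, None


final, log = run_pipeline(
    "kernel",
    [
        Stage("operator fusion", fuse),
        Stage("open-ended discovery", discover),
        Stage("GPU-specific tuning", tune),
    ],
    verify,
)
print(final)
for line in log:
    print(line)
```

In this sketch a failed stage leaves the last validated kernel in place, so a bad candidate cannot make the result worse than the input. A real system would replace `verify` with numerical comparison against a reference kernel and timing on the target GPU.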

Key facts

Xe-Forge automates kernel optimization for Intel GPUs
Applies up to nine optimization stages
Uses Chain-of-Verification-and-Refinement (CoVeR) agents
Targets Triton kernels
Optimizations include quantization, memory access coalescing, and tile size tuning
Published on arXiv with ID 2605.26118
Addresses the manual bottleneck in porting deep learning algorithms to new hardware accelerators
Stages include algorithmic restructuring and operator fusion
