LLMs as Selective GPU Surrogates for Kernel Optimization

ai-technology · 2026-06-01

A new arXiv preprint (2605.31464) proposes using large language models (LLMs) as selective surrogates for GPU kernel runtime evaluation. GPU kernel optimization, essential for deep learning, typically requires costly on-device measurement: each candidate kernel must be compiled and then executed repeatedly. As LLM-driven kernel searches scale, this evaluation step becomes a bottleneck. The study explores whether LLMs can forecast kernel performance and defer to actual hardware when they are uncertain. To be useful, such a surrogate must be accurate, calibrated, and practically effective at recovering fast kernels. The paper evaluates these criteria without naming specific LLMs or datasets.
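To make the "defer when uncertain" idea concrete, here is a minimal Python sketch of a selective surrogate loop. It is not the paper's method: the `Forecast` type, the `evaluate_selectively` function, the confidence threshold of 0.8, and the toy surrogate and measurement functions are all hypothetical, chosen only to illustrate the control flow.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Forecast:
    runtime_ms: float   # surrogate's predicted kernel runtime
    confidence: float   # surrogate's self-reported confidence in [0, 1]


def evaluate_selectively(
    kernels: List[str],
    surrogate: Callable[[str], Forecast],
    measure_on_gpu: Callable[[str], float],
    threshold: float = 0.8,  # assumed cutoff, not taken from the paper
) -> List[Tuple[str, float, str]]:
    """Use the surrogate's forecast when it is confident, else measure on hardware."""
    results = []
    for kernel in kernels:
        forecast = surrogate(kernel)
        if forecast.confidence >= threshold:
            results.append((kernel, forecast.runtime_ms, "surrogate"))
        else:
            # Deferral: pay the compile-and-run cost only for uncertain cases.
            results.append((kernel, measure_on_gpu(kernel), "hardware"))
    return results


# Hypothetical stand-ins for an LLM call and an on-device benchmark.
def toy_surrogate(kernel: str) -> Forecast:
    table = {
        "tiled": Forecast(1.2, 0.90),
        "naive": Forecast(5.0, 0.95),
        "fused": Forecast(0.9, 0.40),
    }
    return table[kernel]


def toy_measure(kernel: str) -> float:
    return {"tiled": 1.3, "naive": 5.1, "fused": 0.7}[kernel]


results = evaluate_selectively(["tiled", "naive", "fused"], toy_surrogate, toy_measure)
for name, runtime, source in results:
    print(f"{name}: {runtime} ms ({source})")
```

In this toy run only the low-confidence "fused" kernel is sent to hardware. The savings depend entirely on how often the surrogate is confident and on whether that confidence is trustworthy, which is why calibration matters.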

Key facts

arXiv preprint 2605.31464 proposes LLMs as selective GPU surrogates for kernel runtime optimization.
GPU kernel optimization typically requires costly on-device measurement via compilation and execution.
LLM-driven kernel searches are scaling, making on-device evaluation a bottleneck.
The surrogate must be accurate, selective (knowing when to defer), and calibrated.
Evaluation criteria include forecast accuracy, calibration, and practical utility for recovering fast kernels.
The study does not specify which LLMs or hardware were used.
The paper appeared on arXiv as a cross-listed announcement (announce type "cross").
The approach aims to reduce the cost of kernel evaluation in deep learning.
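The three evaluation criteria named above (forecast accuracy, calibration, and utility for recovering fast kernels) could be measured along the following lines. This is an illustrative Python sketch under stated assumptions: the metric definitions, function names, and sample numbers are hypothetical and are not taken from the paper, which may define these quantities differently.

```python
from typing import Sequence


def mean_relative_error(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Accuracy: average of |predicted - measured| / measured."""
    return sum(abs(p - m) / m for p, m in zip(predicted, measured)) / len(measured)


def calibration_gap(confidences: Sequence[float], correct: Sequence[bool]) -> float:
    """Calibration: |mean stated confidence - observed hit rate| (0 is ideal)."""
    mean_conf = sum(confidences) / len(confidences)
    hit_rate = sum(correct) / len(correct)
    return abs(mean_conf - hit_rate)


def top_k_recall(predicted: Sequence[float], measured: Sequence[float], k: int) -> float:
    """Utility: share of the k truly fastest kernels also ranked in the forecast's top k."""
    by_pred = sorted(range(len(predicted)), key=lambda i: predicted[i])[:k]
    by_meas = sorted(range(len(measured)), key=lambda i: measured[i])[:k]
    return len(set(by_pred) & set(by_meas)) / k


# Made-up runtimes (ms) for four candidate kernels.
predicted = [1.0, 2.0, 4.0, 8.0]
measured = [1.0, 4.0, 2.0, 8.0]

print(mean_relative_error(predicted, measured))
print(calibration_gap([0.9, 0.7, 0.8, 0.6], [True, False, True, True]))
print(top_k_recall(predicted, measured, k=2))
```

A surrogate can score well on one criterion and poorly on another. For example, it may rank the fastest kernels correctly while still misjudging absolute runtimes, so the three measures are worth reporting separately.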
