AscendKernelGen Framework Uses LLMs to Generate NPU Compute Kernels

ai-technology · 2026-04-20

AscendKernelGen is a new framework for generating compute kernels for Neural Processing Units (NPUs) with Large Language Models (LLMs). NPUs are central to modern AI infrastructure, but writing efficient kernels for them is time-consuming and requires expertise in vendor-specific Domain-Specific Languages (DSLs). LLMs show promise in general code generation, yet they struggle in the NPU domain because of its strict constraints and scarce training data. A preliminary study found that state-of-the-art general-purpose LLMs had a near-zero success rate at producing functional complex kernels for Ascend NPUs. To address this, AscendKernelGen combines generation and evaluation in a single framework and introduces Ascend-CoT, a high-quality dataset built on chain-of-thought reasoning from real-world examples. The goal is to automate kernel creation so that NPU hardware can be used to its full potential as demand for computational efficiency in AI grows. The work is described in arXiv:2601.07160v2, a revised version that replaces an earlier submission. It highlights the difficulty of adapting LLMs to specialized hardware and proposes a structured approach to raising kernel generation success rates.
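One way to picture generation and evaluation living in a single framework is a loop in which each candidate kernel is checked and the result is fed back to the generator. The sketch below is illustrative only: the summary does not describe AscendKernelGen's actual interfaces, so `EvalResult`, `generate_with_feedback`, the retry-with-feedback policy, and both stub functions are hypothetical names and assumptions, and the "kernel" strings are placeholders rather than real Ascend DSL code.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class EvalResult:
    """Outcome of checking one candidate kernel (hypothetical structure)."""
    compiled: bool
    correct: bool
    log: str


def generate_with_feedback(
    task: str,
    generate: Callable[[str, str], str],
    evaluate: Callable[[str], EvalResult],
    max_attempts: int = 3,
) -> Tuple[Optional[str], int]:
    """Ask the generator for a kernel, evaluate it, and retry with the log.

    Returns (source, attempts_used); source is None if every attempt failed.
    """
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        source = generate(task, feedback)
        result = evaluate(source)
        if result.compiled and result.correct:
            return source, attempt
        feedback = result.log  # pass the evaluator's message to the next try
    return None, max_attempts


def stub_generate(task: str, feedback: str) -> str:
    # Stand-in for an LLM call: omits the sync step until told about it.
    if "missing sync" in feedback:
        return f"kernel {task} {{ compute(); sync(); }}"
    return f"kernel {task} {{ compute(); }}"


def stub_evaluate(source: str) -> EvalResult:
    # Stand-in for compiling and running the kernel on hardware.
    compiled = source.startswith("kernel ")
    correct = "sync();" in source
    return EvalResult(compiled, correct, "" if correct else "missing sync")


source, attempts = generate_with_feedback("vector_add", stub_generate, stub_evaluate)
print(attempts, source)
```

With these stubs the first candidate fails evaluation and the second succeeds. In a real setting the evaluator would compile and execute the kernel, and the generator would be a model call.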

Key facts

AscendKernelGen is a framework for generating NPU compute kernels using LLMs
NPUs are critical in modern AI infrastructure for computational efficiency
Kernel development for NPUs requires expertise in vendor-specific DSLs and is labor-intensive
General-purpose LLMs struggle with NPU domain constraints and limited training data
A preliminary study showed state-of-the-art general-purpose LLMs achieve near-zero success at generating functional complex kernels for Ascend NPUs
AscendKernelGen integrates generation and evaluation into a single framework
Ascend-CoT is a high-quality dataset with chain-of-thought reasoning from real-world examples
The research is documented in arXiv:2601.07160v2

Sources

arXiv cs.AI — 2026-04-20