Argus Framework Uses Data-Flow Invariants to Improve LLM-Generated GPU Kernel Performance
Argus is a new framework designed to improve the performance of GPU kernels produced by LLM-based coding agents. While these agents can generate functionally correct kernels for workloads such as matrix multiplication and Mixture-of-Experts (MoE), the resulting code typically falls well short of hand-optimized libraries. Peak GPU performance requires coordinated optimizations, yet current agents rely on sparse pass/fail feedback, which makes global constraint violations hard to diagnose. Argus addresses this by treating data-flow invariants as compile-time specifications, and it provides a tile-based, Pythonic domain-specific language (DSL) that exposes hardware instructions and compiler policies. The work is described in arXiv preprint 2604.18616v1 and aims to close the performance gap in high-performance GPU computing.
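The preprint's actual DSL is not reproduced here, but the idea of a data-flow invariant enforced as a compile-time specification can be sketched in plain Python. All names below are hypothetical illustrations, not the Argus API: the "invariant" is simply that a chosen tile grid must cover the output matrix exactly, checked before any kernel would be built.

```python
# Hypothetical sketch (not Argus code): enforce a simple data-flow
# invariant -- the tile grid must exactly cover an m x n output --
# at "compile time", i.e., before a kernel is ever launched.

def check_tiling_invariant(m: int, n: int, tile_m: int, tile_n: int) -> None:
    """Raise early if the tile grid cannot cover an m x n matrix exactly."""
    if tile_m <= 0 or tile_n <= 0:
        raise ValueError("tile sizes must be positive")
    if m % tile_m != 0 or n % tile_n != 0:
        raise ValueError(
            f"tiles {tile_m}x{tile_n} do not evenly cover a {m}x{n} matrix"
        )

def plan_tiles(m: int, n: int, tile_m: int, tile_n: int) -> list[tuple[int, int]]:
    """Return tile origins; each output element is owned by exactly one tile."""
    check_tiling_invariant(m, n, tile_m, tile_n)
    return [(i, j) for i in range(0, m, tile_m) for j in range(0, n, tile_n)]

print(len(plan_tiles(128, 128, 32, 64)))  # 4 row-tiles * 2 col-tiles = 8
```

The point of surfacing such a check before execution, as opposed to sparse pass/fail feedback after a run, is that the violated constraint itself is reported, giving an agent something actionable to repair.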
Key facts
- Argus is an agentic framework for optimizing GPU kernels generated by LLM-based coding agents.
- LLM-based agents produce functionally correct kernels but underperform compared to hand-optimized libraries.
- Key computations include matrix multiplication, attention, and Mixture-of-Experts (MoE).
- Peak GPU performance requires coordinated optimizations like tiling, shared-memory staging, software pipelining, and instruction scheduling.
- Existing agents use sparse pass/fail feedback, limiting their ability to diagnose global constraint violations.
- Argus employs data-flow invariants as compile-time specifications for data choreography during kernel execution.
- The framework features a tile-based, Pythonic DSL that exposes hardware instructions and compiler policies.
- The work is documented in arXiv preprint 2604.18616v1, announced as a cross-listed submission.
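To make the tiling optimization listed above concrete, here is a generic blocked matrix multiplication in plain NumPy. This is an illustration of the general technique, not code from Argus: the tile-by-tile accumulation mirrors how a GPU kernel stages blocks of the inputs through fast shared memory.

```python
import numpy as np

def tiled_matmul(a: np.ndarray, b: np.ndarray, tile: int = 32) -> np.ndarray:
    """Blocked (tiled) matrix multiply; assumes dimensions divide evenly by `tile`."""
    m, k = a.shape
    k2, n = b.shape
    assert k == k2, "inner dimensions must match"
    c = np.zeros((m, n), dtype=a.dtype)
    # Accumulate one output tile at a time, reusing small blocks of A and B --
    # the same data-reuse pattern that shared-memory staging exploits on a GPU.
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            for p in range(0, k, tile):
                c[i:i+tile, j:j+tile] += (
                    a[i:i+tile, p:p+tile] @ b[p:p+tile, j:j+tile]
                )
    return c

a = np.random.rand(64, 64).astype(np.float32)
b = np.random.rand(64, 64).astype(np.float32)
assert np.allclose(tiled_matmul(a, b), a @ b, atol=1e-3)
```

On real hardware, tiling alone is not enough; as the facts above note, it must be coordinated with shared-memory staging, software pipelining, and instruction scheduling to approach peak throughput.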
Entities
Institutions
- arXiv