ARTFEED — Contemporary Art Intelligence

CASS Dataset and Models Enable Cross-Architecture GPU Code Translation with High Accuracy

ai-technology · 2026-04-22

CASS, a newly released dataset and model suite, targets cross-architecture GPU code transpilation, filling a gap in scalable hardware-portability tooling. It supports source-level translation between CUDA and HIP, and assembly-level translation between SASS and RDNA3. The dataset comprises 60,000 validated host-device code pairs produced by an automated pipeline that scrapes, translates, compiles, and aligns GPU programs across vendor platforms. Using this data, the researchers trained specialized translation models that reach 88.2% accuracy on CUDA-to-HIP translation and 69.1% on SASS-to-RDNA3, outperforming baselines including GPT-5.1, Claude-4.5, and the rule-based Hipify tool. Translated code preserves native runtime and memory behavior in 85% of cases. To support evaluation, the team also released CASS-Bench, a purpose-built benchmark for cross-architecture translation. The work demonstrates substantial progress toward data-driven, low-level hardware portability.
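At the source level, much of CUDA-to-HIP porting is mechanical API renaming, which is what rule-based tools like Hipify exploit. A toy sketch of that idea is below; the mapping table and `hipify` function are illustrative assumptions for this article, not part of the CASS pipeline or the actual Hipify implementation.

```python
# Illustrative subset of CUDA -> HIP API renames. Rule-based translators
# such as hipify-perl apply a much larger table of this kind; this toy
# mapping is an assumption for demonstration only.
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaMemcpyDeviceToHost": "hipMemcpyDeviceToHost",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
    "cudaFree": "hipFree",
}

def hipify(source: str) -> str:
    """Naive textual CUDA -> HIP translation by identifier renaming."""
    # Replace longer names first, so cudaMemcpyHostToDevice is not
    # clobbered by the shorter cudaMemcpy rule.
    for cuda_name in sorted(CUDA_TO_HIP, key=len, reverse=True):
        source = source.replace(cuda_name, CUDA_TO_HIP[cuda_name])
    return source

cuda_snippet = """#include <cuda_runtime.h>
float *d_x;
cudaMalloc(&d_x, n * sizeof(float));
cudaMemcpy(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice);
cudaDeviceSynchronize();
cudaFree(d_x);
"""

print(hipify(cuda_snippet))
```

Cases like this are where rule-based substitution succeeds; the learned models described above target the harder remainder, where translation requires semantic restructuring rather than one-to-one renaming.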

Key facts

  • CASS is a dataset and model suite for cross-architecture GPU code transpilation
  • Enables source-level translation between CUDA and HIP, and assembly-level translation between SASS and RDNA3
  • Contains 60,000 verified host-device code pairs
  • Uses automated pipeline to scrape, translate, compile, and align GPU programs
  • Models achieve 88.2% accuracy on CUDA -> HIP translation
  • Models achieve 69.1% accuracy on SASS -> RDNA3 translation
  • Outperforms commercial baselines including GPT-5.1, Claude-4.5, and Hipify
  • Generated code matches native performance in 85% of cases
