Verus-SpecGym: AI Agent Environment for Spec Autoformalization

publication · 2026-05-27

A recent study presents Verus-SpecGym, an agentic environment for evaluating how well LLM agents autoformalize specifications, that is, translate informal problem statements into formal ones. The work addresses the problem of using formal verification to confirm that AI-generated code matches user intent. The authors built Verus-SpecBench, a collection of 581 specification-writing tasks derived from Codeforces problems and targeting Verus, a verifier for Rust. Within the environment, models can interact with Verus, bash, and the filesystem while writing specifications. Evaluation is a key difficulty: expert-written reference specifications are costly to produce, and LLM judges may overlook subtle errors.

Key facts

Verus-SpecGym is an agentic environment for specification autoformalization
Verus-SpecBench contains 581 spec-writing tasks from Codeforces
Targets Verus, a verifier for Rust
Models interact with Verus, bash, and filesystem
Evaluation challenge: expensive expert specs and fallible LLM judges
Published on arXiv with ID 2605.26457
Focuses on translating informal problems into formal specifications
Aims to ensure AI code satisfies user intent via formal verification
