Verus-SpecGym: AI Agent Environment for Spec Autoformalization
A recent study presents Verus-SpecGym, an agentic environment for evaluating how well LLM agents autoformalize specifications, that is, translate informal problem statements into formal ones. The work addresses the problem of confirming, through formal verification, that AI-generated code matches what the user intended. The authors built Verus-SpecBench, a set of 581 specification-writing tasks derived from Codeforces problems and targeting Verus, a verifier for Rust. Within the environment, models can interact with Verus, bash, and the filesystem while writing specifications. The main difficulty is evaluation: expert-written reference specifications are costly to produce, and LLM judges can miss subtle errors.
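To make the evaluation difficulty concrete, here is a minimal sketch of how a specification can look plausible and still be too weak. It is an illustration only and is not taken from the paper or from Verus-SpecBench: the task ("given a non-empty list of integers, output its maximum") and every function name are hypothetical, and the specifications are written as plain Rust boolean predicates rather than in Verus syntax, so the example compiles without the Verus toolchain. In Verus itself, a specification would be expressed with constructs such as `spec fn` and `ensures` and checked by the verifier.

```rust
// Hypothetical task: "Given a non-empty list of integers, output its maximum."
// Assumption: specs are modeled as plain Rust predicates over (input, output).
// This is NOT Verus syntax; it only mimics what a spec has to pin down.

/// Too-weak spec: only requires the output to be an upper bound of the input.
pub fn weak_spec(input: &[i64], output: i64) -> bool {
    input.iter().all(|&x| x <= output)
}

/// Intended spec: the output is an upper bound AND actually occurs in the input.
pub fn full_spec(input: &[i64], output: i64) -> bool {
    !input.is_empty()
        && input.iter().all(|&x| x <= output)
        && input.contains(&output)
}

/// Deliberately wrong "solution": ignores the input and returns a huge constant.
pub fn buggy_solution(_input: &[i64]) -> i64 {
    i64::MAX
}

/// Correct reference solution; panics on empty input, which the task excludes.
pub fn correct_solution(input: &[i64]) -> i64 {
    *input.iter().max().expect("input must be non-empty")
}
```

On the input `[3, 7, 2]`, `buggy_solution` satisfies `weak_spec` but fails `full_spec`, while `correct_solution` satisfies both. A reviewer who only skims `weak_spec` could easily accept it, which is the kind of subtle gap that makes evaluating specifications hard.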
Key facts
- Verus-SpecGym is an agentic environment for specification autoformalization
- Verus-SpecBench contains 581 spec-writing tasks from Codeforces
- Targets Verus, a verifier for Rust
- Models interact with Verus, bash, and the filesystem
- Evaluation challenge: expensive expert specs and fallible LLM judges
- Published on arXiv with ID 2605.26457
- Focuses on translating informal problems into formal specifications
- Aims to ensure AI code satisfies user intent via formal verification
Entities
Institutions
- arXiv
- Codeforces