SolidCoder: LLM Code Generation via Concrete Execution

ai-technology · 2026-04-24

SolidCoder is a framework that targets the Mental-Reality Gap in LLM code generation, where models often produce incorrect execution traces and validate flawed code. The gap has two parts: the Specification Gap, in which edge cases are overlooked, and the Verification Gap, in which the model asserts that faulty code behaves correctly. SolidCoder replaces imagined traces with actual sandboxed execution checked by property-based oracles, and it requires edge cases to be considered before the algorithm is designed. With GPT-4o, it reports pass@1 of 95.7% on HumanEval (up 0.6%), 77.0% on CodeContests (up 4.3%), and 26.7% on APPS (up 3.4%). The core principle is "don't imagine, execute."
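The execute-and-check idea can be illustrated with a minimal sketch. This is not SolidCoder's implementation: the task (sorting integers read from stdin), the names run_sandboxed, sort_oracle, check_candidate, EDGE_CASES, GOOD and BAD, and the use of a child Python process with a timeout in place of a real sandbox are all assumptions made for illustration.

```python
import random
import subprocess
import sys
from collections import Counter

# Hypothetical edge cases, written down before any algorithm is chosen.
EDGE_CASES = [[], [7], [2, 2, 1], [-3, 0, -3, 5]]


def run_sandboxed(source: str, stdin_text: str, timeout_s: float = 5.0) -> str:
    """Execute candidate code in a child interpreter and return its stdout.

    A subprocess with a timeout is only a stand-in for a real sandbox.
    """
    proc = subprocess.run(
        [sys.executable, "-I", "-c", source],
        input=stdin_text, capture_output=True, text=True, timeout=timeout_s,
    )
    if proc.returncode != 0:
        raise RuntimeError(proc.stderr.strip())
    return proc.stdout


def sort_oracle(given: list[int], produced: list[int]) -> bool:
    """Property check: output is non-decreasing and a permutation of the input."""
    ordered = all(a <= b for a, b in zip(produced, produced[1:]))
    return ordered and Counter(given) == Counter(produced)


def check_candidate(source: str, random_trials: int = 5, seed: int = 0) -> list[list[int]]:
    """Return every input on which the executed candidate violates the oracle."""
    rng = random.Random(seed)
    inputs = list(EDGE_CASES)
    for _ in range(random_trials):
        inputs.append([rng.randint(-5, 5) for _ in range(rng.randint(0, 8))])
    failures = []
    for given in inputs:
        try:
            stdout = run_sandboxed(source, " ".join(map(str, given)))
            produced = [int(tok) for tok in stdout.split()]
        except (RuntimeError, ValueError, subprocess.TimeoutExpired):
            failures.append(given)  # crash, timeout, or unparsable output
            continue
        if not sort_oracle(given, produced):
            failures.append(given)
    return failures


# Two hypothetical candidate programs, as an LLM might emit them.
READ = "import sys\nnums = [int(t) for t in sys.stdin.read().split()]\n"
GOOD = READ + "print(*sorted(nums))\n"
BAD = READ + "print(*sorted(set(nums)))\n"  # silently drops duplicates

if __name__ == "__main__":
    print("good candidate failures:", check_candidate(GOOD))
    print("bad candidate failures:", check_candidate(BAD))
```

The oracle needs no reference solution. It only states properties any correct output must satisfy, so the verdict comes from what the code actually printed rather than from a trace the model imagined. The duplicate-dropping candidate is caught by the edge case [2, 2, 1].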

Key facts

SolidCoder targets the Mental-Reality Gap in LLM code generation.
The gap comprises the Specification Gap (overlooked edge cases) and the Verification Gap (faulty code asserted to be correct).
SolidCoder replaces imagined traces with sandboxed execution checked by property-based oracles.
It achieves 95.7% pass@1 on HumanEval with GPT-4o.
It achieves 77.0% pass@1 on CodeContests with GPT-4o.
It achieves 26.7% pass@1 on APPS with GPT-4o.
The framework forces edge-case awareness before algorithm design.
Principle: don't imagine, execute.
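For readers unfamiliar with the metric, pass@1 is the fraction of benchmark problems solved by a single generated attempt. The sketch below uses the widely used unbiased pass@k estimator introduced alongside HumanEval. Whether SolidCoder's evaluation uses this exact estimator is an assumption, and the function names and sample data are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples generated, c of them correct."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int = 1) -> float:
    """Average pass@k over (n, c) pairs, one pair per benchmark problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical run: one attempt per problem, three of four problems solved.
    print(benchmark_pass_at_k([(1, 1), (1, 0), (1, 1), (1, 1)]))
```

With one sample per problem, pass@1 reduces to the plain share of problems whose single attempt passes all tests.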

Sources

arXiv cs.AI — 2026-04-23