SolidCoder: LLM Code Generation via Concrete Execution
SolidCoder is a framework for closing the Mental-Reality Gap in LLM code generation: models tend to hallucinate execution traces and, on that basis, validate flawed code. The gap has two parts: the Specification Gap, where edge cases are overlooked, and the Verification Gap, where the model asserts that faulty code behaves correctly. SolidCoder addresses the first by forcing edge-case awareness before algorithm design, and the second by replacing imagined traces with real sandboxed execution checked against property-based oracles. With GPT-4o, it reaches pass@1 rates of 95.7% on HumanEval (up 0.6%), 77.0% on CodeContests (up 4.3%), and 26.7% on APPS (up 3.4%). The core principle is "don't imagine — execute."
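To make the idea of checking real execution against a property-based oracle concrete, here is a minimal sketch. It is not SolidCoder's implementation. The sorting task, the two candidate programs, and every name (`run_candidate`, `oracle`, `edge_cases`, `verify`) are hypothetical, chosen only for illustration. A separate interpreter process with a timeout stands in for the sandbox, which isolates crashes and hangs but is not a security boundary.

```python
import json
import random
import subprocess
import sys
from collections import Counter

# Hypothetical candidate programs, standing in for LLM-generated code.
GOOD = "import sys, json\nxs = json.load(sys.stdin)\nprint(json.dumps(sorted(xs)))\n"
# Buggy variant: silently drops duplicates.
BUGGY = "import sys, json\nxs = json.load(sys.stdin)\nprint(json.dumps(sorted(set(xs))))\n"


def run_candidate(source, xs, timeout=5.0):
    """Actually execute the candidate in a separate interpreter (sandbox stand-in)."""
    proc = subprocess.run(
        [sys.executable, "-c", source],
        input=json.dumps(xs),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode != 0:
        raise RuntimeError(proc.stderr.strip())
    return json.loads(proc.stdout)


def oracle(xs, out):
    """Property-based oracle: no reference answer needed, only invariants.

    The output must be non-decreasing and a permutation of the input.
    """
    ordered = all(a <= b for a, b in zip(out, out[1:]))
    same_items = Counter(out) == Counter(xs)
    return ordered and same_items


def edge_cases():
    """Edge cases written down before any algorithm is considered."""
    return [[], [7], [2, 2, 2], [3, -1, 3, 0], [5, 4, 3, 2, 1]]


def verify(source, trials=20, seed=0):
    """Return the first input that breaks the candidate, or None if all pass."""
    rng = random.Random(seed)
    random_inputs = [
        [rng.randint(-50, 50) for _ in range(rng.randint(0, 8))]
        for _ in range(trials)
    ]
    for xs in edge_cases() + random_inputs:
        try:
            out = run_candidate(source, xs)
        except (RuntimeError, subprocess.TimeoutExpired, ValueError):
            return xs  # crash, hang, or unparseable output counts as a failure
        if not oracle(xs, out):
            return xs
    return None
```

In this sketch, `verify(GOOD)` returns `None`, while `verify(BUGGY)` returns `[2, 2, 2]`: the duplicate-heavy edge case exposes the bug through execution, with no reliance on a model's imagined trace. Putting `edge_cases()` ahead of the random inputs mirrors the edge-cases-first ordering described above.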
Key facts
- SolidCoder bridges the Mental-Reality Gap in LLM code generation.
- The gap comprises the Specification Gap (overlooked edge cases) and the Verification Gap (faulty code asserted to be correct).
- SolidCoder uses sandboxed execution with property-based oracles.
- Achieves 95.7% pass@1 on HumanEval with GPT-4o.
- Achieves 77.0% pass@1 on CodeContests with GPT-4o.
- Achieves 26.7% pass@1 on APPS with GPT-4o.
- Framework forces edge-case awareness before algorithm design.
- Principle: don't imagine — execute.
Entities
—