Energy-Based Verifier Boosts LLM Structured Reasoning
A decomposed energy function combines a learned quality scorer with deterministic analytical constraint penalties to verify structured outputs generated by large language models. The quality scorer is a heterogeneous ensemble of low-rank adapters applied to a single frozen encoder, with only 3% of parameters trainable. The ensemble mean ranks candidates, while the ensemble standard deviation quantifies epistemic uncertainty and drives a two-pass inference loop that triggers either targeted regeneration or abstention. Across five benchmarks (GSM8K, MuSR, TravelPlanner, TACO, Knights & Knaves), the 149M-parameter verifier, orchestrating a pool of 7-26B open generators, outperforms single-shot Qwen-72B on every benchmark and matches Claude. The paper is available on arXiv as arXiv:2605.18871.
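As a rough illustration of how a learned score and deterministic penalties can be folded into one energy, here is a minimal Python sketch. The summary does not give the paper's actual formula, so the additive form (penalty minus mean quality, lower is better), the weight `LAMBDA`, the toy travel-plan constraints, and all function names are assumptions made for illustration only.

```python
from statistics import mean, pstdev

LAMBDA = 1.0  # assumed weight on the constraint penalty (hypothetical value)


def constraint_penalty(plan):
    """Deterministic check: count violated constraints (toy rules, not the paper's)."""
    violations = 0
    if plan["cost"] > plan["budget"]:
        violations += 1
    if plan["days"] < 1:
        violations += 1
    return float(violations)


def decomposed_energy(member_scores, penalty, lam=LAMBDA):
    """Return (energy, uncertainty) for one candidate.

    member_scores: one quality score per ensemble member (higher is better).
    Energy is assumed to be lam * penalty - mean(score), so lower is better.
    Uncertainty is the standard deviation across ensemble members.
    """
    quality = mean(member_scores)
    uncertainty = pstdev(member_scores)
    return lam * penalty - quality, uncertainty


def rank_candidates(cands):
    """Sort candidates by ascending energy; ties fall back to uncertainty, then name."""
    scored = []
    for cand in cands:
        energy, uncertainty = decomposed_energy(
            cand["scores"], constraint_penalty(cand["plan"])
        )
        scored.append((energy, uncertainty, cand["name"]))
    return sorted(scored)


candidates = [
    # Slightly lower learned score, but satisfies every constraint.
    {"name": "A", "plan": {"cost": 900, "budget": 1000, "days": 3},
     "scores": [0.70, 0.72, 0.68]},
    # Higher learned score, but over budget.
    {"name": "B", "plan": {"cost": 1200, "budget": 1000, "days": 3},
     "scores": [0.90, 0.88, 0.92]},
]

for energy, uncertainty, name in rank_candidates(candidates):
    print(f"{name}: energy={energy:.2f}, std={uncertainty:.3f}")
```

In this toy setup the over-budget plan B ranks below plan A even though the ensemble scores it higher, which is the kind of correction a deterministic penalty term is meant to provide.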
Key facts
- Proposes a decomposed energy function for verifying structured LLM outputs.
- Combines a learned quality scorer with deterministic analytical constraint penalties.
- Quality scorer is a heterogeneous ensemble of low-rank adapters on a single frozen encoder.
- Only 3% of parameters are trainable.
- Ensemble mean ranks candidates; standard deviation quantifies epistemic uncertainty.
- Two-pass inference loop triggers targeted regeneration or abstention.
- Tested on five benchmarks: GSM8K, MuSR, TravelPlanner, TACO, Knights & Knaves.
- 149M-parameter verifier orchestrates a pool of 7-26B open generators.
- Outperforms single-shot Qwen-72B on every benchmark.
- Matches Claude's performance.
- Paper ID: arXiv:2605.18871.
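The two-pass loop listed above can be sketched in Python as follows. This is a hedged illustration, not the paper's procedure: the threshold `max_std`, the way feedback is passed to the generator, and the names `pick_best`, `two_pass_verify`, `toy_generate`, and `toy_score` are all hypothetical.

```python
from statistics import mean, pstdev


def pick_best(candidates, score_members):
    """Score each candidate with every ensemble member; return (mean, std, candidate)
    for the candidate with the highest mean score."""
    scored = []
    for cand in candidates:
        scores = score_members(cand)
        scored.append((mean(scores), pstdev(scores), cand))
    return max(scored, key=lambda item: item[0])


def two_pass_verify(prompt, generate, score_members, max_std=0.1):
    """Return an accepted candidate, or None to abstain.

    generate(prompt, feedback) -> list of candidates (hypothetical interface).
    max_std is an assumed uncertainty threshold on the ensemble spread.
    """
    # Pass 1: rank by ensemble mean, accept if the members agree.
    _, spread, best = pick_best(generate(prompt, None), score_members)
    if spread <= max_std:
        return best
    # Pass 2: regenerate with the uncertain candidate as feedback.
    _, spread, best = pick_best(generate(prompt, best), score_members)
    # Still uncertain after regeneration: abstain.
    return best if spread <= max_std else None


# Toy stand-ins for the generator pool and the adapter ensemble.
TOY_SCORES = {
    "x = 4": [0.9, 0.2, 0.5],     # members disagree: high uncertainty
    "x = 5": [0.80, 0.82, 0.78],  # members agree: low uncertainty
}


def toy_generate(prompt, feedback):
    return ["x = 4"] if feedback is None else ["x = 5"]


def toy_score(candidate):
    return TOY_SCORES[candidate]


result = two_pass_verify("Solve x + 1 = 6", toy_generate, toy_score)
print(result)
```

Here the first pass produces a candidate on which the toy ensemble disagrees, so the loop regenerates once and accepts the second candidate; if the spread stayed above the threshold, the function would return `None` to signal abstention.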
Entities
Institutions
- arXiv