Energy-Based Verifier Boosts LLM Structured Reasoning

ai-technology · 2026-05-20

A decomposed energy function combines a learned quality scorer with deterministic analytical constraint penalties to verify structured outputs generated by large language models. The quality scorer is a heterogeneous ensemble of low-rank adapters on a single frozen encoder, with only 3% of parameters trainable. The ensemble mean ranks candidates, while the ensemble standard deviation quantifies epistemic uncertainty and drives a two-pass inference loop that triggers either targeted regeneration or abstention. Across five benchmarks (GSM8K, MuSR, TravelPlanner, TACO, Knights & Knaves), the 149M-parameter verifier, orchestrating a pool of 7-26B open generators, outperforms single-shot Qwen-72B on every benchmark and matches Claude. The paper is available on arXiv as 2605.18871.
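
The mechanism described above can be illustrated with a minimal sketch. It is not the paper's implementation: the additive form of the energy (negative ensemble mean plus a weighted violation count), the names `energy`, `verify`, `penalty_weight` and `max_uncertainty`, and the toy scorers are all assumptions made for illustration. Lower energy is treated as better, and the regeneration step here simply asks for a fresh batch of candidates, whereas a real system would presumably pass along which constraints failed.

```python
from statistics import mean, pstdev
from typing import Callable, List, Optional, Sequence, Tuple

Scorer = Callable[[str], float]  # one ensemble member: candidate -> quality score


def energy(candidate: str, scorers: Sequence[Scorer],
           count_violations: Callable[[str], int],
           penalty_weight: float = 1.0) -> Tuple[float, float]:
    """Return (energy, uncertainty) for one candidate; lower energy is better."""
    scores = [s(candidate) for s in scorers]
    learned = -mean(scores)  # ensemble mean ranks candidates
    analytic = penalty_weight * count_violations(candidate)  # deterministic penalty
    return learned + analytic, pstdev(scores)  # std dev as epistemic uncertainty


def verify(generate: Callable[[int], List[str]], scorers: Sequence[Scorer],
           count_violations: Callable[[str], int],
           max_uncertainty: float = 0.1) -> Optional[str]:
    """Two passes: accept a confident winner, else regenerate once, else abstain."""
    for attempt in range(2):
        candidates = generate(attempt)
        scored = [(energy(c, scorers, count_violations), c) for c in candidates]
        (_, uncertainty), best = min(scored, key=lambda item: item[0][0])
        if uncertainty <= max_uncertainty:
            return best
    return None  # abstain: the ensemble still disagrees after regeneration


# Toy stand-ins (hypothetical) for the adapter ensemble and constraint checker.
scorers = [lambda c: 0.9 if "42" in c else 0.2,
           lambda c: 0.8 if "42" in c else 0.3,
           lambda c: 0.85 if "42" in c else 0.1]
count_violations = lambda c: 0 if c.startswith("answer:") else 1
generate = lambda attempt: ["answer: 41", "answer: 42", "42"]

print(verify(generate, scorers, count_violations))  # prints: answer: 42
```

In this toy run the bare "42" scores well with the ensemble but pays a format penalty, so the well-formed candidate wins. Separating the two terms keeps hard constraints out of the learned model, which is one plausible reading of why the energy is described as decomposed.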

Key facts

Proposes a decomposed energy function for verifying structured LLM outputs.
Combines a learned quality scorer with deterministic analytical constraint penalties.
Quality scorer is a heterogeneous ensemble of low-rank adapters on a single frozen encoder.
Only 3% of parameters are trainable.
Ensemble mean ranks candidates; standard deviation quantifies epistemic uncertainty.
Two-pass inference loop triggers targeted regeneration or abstention.
Tested on five benchmarks: GSM8K, MuSR, TravelPlanner, TACO, Knights & Knaves.
149M-parameter verifier orchestrates a pool of 7-26B open generators.
Outperforms single-shot Qwen-72B on every benchmark.
Matches Claude's performance.
Paper ID: arXiv:2605.18871.
