ARTFEED — Contemporary Art Intelligence

Energy-Based Verifier Boosts LLM Structured Reasoning

ai-technology · 2026-05-20

A new decomposed energy function verifies the structured outputs of large language models by combining a learned quality scorer with deterministic analytical constraint penalties. The quality scorer is a heterogeneous ensemble of low-rank adapters attached to a single frozen encoder, so only 3% of the parameters are trainable. The ensemble mean ranks candidate outputs, and the ensemble standard deviation, a measure of epistemic uncertainty, initiates a two-pass inference loop that triggers either targeted regeneration or abstention. On five benchmarks (GSM8K, MuSR, TravelPlanner, TACO, and Knights & Knaves), the 149M-parameter verifier, orchestrating a pool of open generators ranging from 7B to 26B parameters, outperforms single-shot Qwen-72B on every benchmark and matches Claude. The paper is available on arXiv under ID 2605.18871.
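The ranking-plus-abstention logic described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: lower energy is taken to mean better, the learned term and the constraint penalties are assumed to combine additively with a hypothetical weight `lam`, uncertainty is checked only on the winning candidate against a hypothetical threshold `tau`, and the `generate`/`regenerate` callables and toy scorers are hypothetical stand-ins for the LLM generators and the adapter ensemble.

```python
import statistics
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical interfaces (not the paper's API):
#   a Scorer maps a candidate to a learned energy (lower = better);
#   a Constraint maps a candidate to a non-negative penalty (0 = satisfied).
Candidate = dict
Scorer = Callable[[Candidate], float]
Constraint = Callable[[Candidate], float]


def energy(cand: Candidate, ensemble: Sequence[Scorer],
           constraints: Sequence[Constraint],
           lam: float = 1.0) -> Tuple[float, float]:
    """Return (total energy, uncertainty) for one candidate.

    Total energy = ensemble-mean learned energy + lam * summed penalties
    (the additive form and `lam` are assumptions). Uncertainty is the
    population standard deviation of the ensemble members' scores."""
    scores = [score(cand) for score in ensemble]
    penalty = sum(check(cand) for check in constraints)
    return statistics.mean(scores) + lam * penalty, statistics.pstdev(scores)


def rank(cands: Sequence[Candidate], ensemble: Sequence[Scorer],
         constraints: Sequence[Constraint]) -> List[Tuple[float, float, Candidate]]:
    """Score every candidate and sort by total energy, best (lowest) first."""
    scored = [(*energy(c, ensemble, constraints), c) for c in cands]
    return sorted(scored, key=lambda item: item[0])


def two_pass(generate: Callable[[], List[Candidate]],
             regenerate: Callable[[Candidate], List[Candidate]],
             ensemble: Sequence[Scorer], constraints: Sequence[Constraint],
             tau: float) -> Optional[Candidate]:
    """Pass 1 ranks the initial pool and accepts the winner if its
    uncertainty is at most `tau`. Otherwise pass 2 requests targeted
    regeneration around the winner, re-ranks the enlarged pool, and
    abstains (returns None) if the new winner is still too uncertain."""
    pool = generate()
    _, sigma, best = rank(pool, ensemble, constraints)[0]
    if sigma <= tau:
        return best
    pool = pool + regenerate(best)
    _, sigma, best = rank(pool, ensemble, constraints)[0]
    return best if sigma <= tau else None


# Toy usage: the correct answer is 4; three hypothetical "adapters" disagree
# slightly, and one analytical constraint demands a non-negative integer.
ensemble = [lambda c: abs(c["answer"] - 4),
            lambda c: abs(c["answer"] - 4) + 0.1,
            lambda c: 0.5 * abs(c["answer"] - 4)]
constraints = [lambda c: 0.0 if isinstance(c["answer"], int) and c["answer"] >= 0 else 10.0]
result = two_pass(lambda: [{"answer": 5}, {"answer": -1}],
                  lambda best: [{"answer": 4}], ensemble, constraints, tau=0.2)
# result == {"answer": 4}
```

In the toy run, the first-pass winner (`5`) has an ensemble spread of about 0.26, above `tau`, so the loop asks for regeneration; the new candidate `4` wins with a spread of about 0.05 and is accepted. Had regeneration produced nothing better, the sketch would abstain rather than return an uncertain answer.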

Key facts

  • Proposes a decomposed energy function for verifying structured LLM outputs.
  • Combines a learned quality scorer with deterministic analytical constraint penalties.
  • Quality scorer is a heterogeneous ensemble of low-rank adapters on a single frozen encoder.
  • Only 3% of parameters are trainable.
  • Ensemble mean ranks candidates; standard deviation quantifies epistemic uncertainty.
  • Two-pass inference loop triggers targeted regeneration or abstention.
  • Tested on five benchmarks: GSM8K, MuSR, TravelPlanner, TACO, Knights & Knaves.
  • 149M-parameter verifier orchestrates a pool of open generators ranging from 7B to 26B parameters.
  • Outperforms single-shot Qwen-72B on every benchmark.
  • Matches Claude performance.
  • Paper ID: arXiv:2605.18871.
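Why an adapter ensemble on one frozen encoder keeps the trainable share small can be shown with simple parameter accounting. The sketch below assumes standard LoRA-style adapters (update learned as `B @ A`) and models "heterogeneous" as adapters of differing rank; the encoder size, layer shapes, and ranks are hypothetical and are not the paper's configuration, and any scoring heads are ignored.

```python
from typing import Sequence, Tuple

# (d_in, d_out) of each frozen projection matrix that receives an adapter.
Shape = Tuple[int, int]


def lora_param_count(shapes: Sequence[Shape], rank: int) -> int:
    """A low-rank adapter learns the update to a frozen W (d_out x d_in) as
    B @ A, with A of shape rank x d_in and B of shape d_out x rank, so it
    trains rank * (d_in + d_out) parameters per adapted matrix."""
    return sum(rank * (d_in + d_out) for d_in, d_out in shapes)


def trainable_fraction(frozen_total: int, shapes: Sequence[Shape],
                       ranks: Sequence[int]) -> float:
    """Share of all parameters that are trainable when one adapter per entry
    in `ranks` is attached to the SAME frozen encoder: the encoder is counted
    once, and each ensemble member adds only its own A and B matrices."""
    trainable = sum(lora_param_count(shapes, r) for r in ranks)
    return trainable / (frozen_total + trainable)


# Hypothetical encoder: 110M frozen parameters, 12 layers, hidden size 768,
# adapters on the query and value projections (24 square matrices).
shapes = [(768, 768)] * 24
ranks = [4, 8, 16]  # heterogeneity modeled as differing ranks (an assumption)
fraction = trainable_fraction(110_000_000, shapes, ranks)
```

With these made-up numbers the three adapters train 1,032,192 parameters, roughly 0.93% of the total. The share grows with adapter rank, the number of adapted matrices, and ensemble size, while the frozen encoder's cost is paid only once.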

Entities

Institutions

  • arXiv
