DDC Framework Balances Budget and Quality in LLM Inference Scaling
A new research paper introduces Dual-Dimensional Consistency (DDC), a unified framework for adaptive inference-time scaling in Large Language Models (LLMs). Current methods treat sampling width (the number of parallel reasoning chains) and depth (the length of each chain) as separate objectives, which leads to inefficiencies: width-based consensus can reinforce hallucinations, while depth-based pruning may cut off valid reasoning chains prematurely. DDC couples a Confidence-Weighted Bayesian protocol with Trend-Aware Stratified Pruning to concentrate compute on high-quality paths, filtering out hallucinations and accelerating consensus. Evaluations across five benchmarks show reduced token consumption while maintaining reasoning quality. The paper is available on arXiv under ID 2605.15100.
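The summary above does not reproduce the paper's algorithmic details, so the following is a minimal sketch of the width-dimension idea only: parallel samples are aggregated by confidence-weighted voting rather than a plain majority vote, so low-confidence (possibly hallucinated) chains contribute less to the consensus. The `SampledChain` structure, the use of a per-chain confidence score as a proxy, and the additive scoring rule are assumptions for illustration, not the paper's actual Confidence-Weighted Bayesian protocol.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List


@dataclass
class SampledChain:
    """One width-dimension sample: a reasoning chain's final answer and confidence."""
    answer: str
    confidence: float  # assumed proxy, e.g. mean token probability of the chain


def confidence_weighted_vote(chains: List[SampledChain]) -> str:
    """Aggregate parallel samples, weighting each vote by chain confidence.

    Plain majority voting lets many low-confidence chains outvote a few
    well-supported ones; weighting by confidence down-weights them instead.
    """
    scores = defaultdict(float)
    for chain in chains:
        scores[chain.answer] += chain.confidence
    return max(scores, key=scores.get)


if __name__ == "__main__":
    chains = [
        SampledChain(answer="42", confidence=0.91),
        SampledChain(answer="41", confidence=0.35),
        SampledChain(answer="41", confidence=0.30),
        SampledChain(answer="42", confidence=0.88),
    ]
    print(confidence_weighted_vote(chains))  # -> "42"
```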
Key facts
- DDC is a unified framework for adaptive inference-time scaling.
- Current methods treat sampling width and depth as orthogonal objectives.
- Width consensus risks reinforcing hallucinations.
- Depth pruning can prematurely truncate complex but valid reasoning chains.
- DDC combines a Confidence-Weighted Bayesian protocol with Trend-Aware Stratified Pruning (a sketch of the pruning idea follows this list).
- Evaluated across five benchmarks.
- The approach reduces token consumption while maintaining reasoning quality.
- Paper available on arXiv: 2605.15100.
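As with the consensus sketch above, the following is a hedged illustration of the depth-dimension idea: prune a reasoning chain only when its recent confidence is both low and trending downward, so a long but valid chain that dips temporarily is not truncated. The function name `trend_aware_prune`, the thresholds, and the window-based trend estimate are assumptions, not the paper's Trend-Aware Stratified Pruning procedure.

```python
from typing import Dict, List


def trend_aware_prune(
    step_confidences: Dict[str, List[float]],
    low_threshold: float = 0.4,
    window: int = 3,
) -> List[str]:
    """Keep chains that are either confident or still trending upward.

    A chain is pruned only if its recent confidence is low AND its trend over
    the last `window` steps is non-increasing; a temporary dip on a long,
    valid chain is therefore tolerated. Thresholds here are illustrative.
    """
    survivors = []
    for chain_id, confs in step_confidences.items():
        recent = confs[-window:]
        trend = recent[-1] - recent[0] if len(recent) > 1 else 0.0
        if recent[-1] >= low_threshold or trend > 0:
            survivors.append(chain_id)
    return survivors


if __name__ == "__main__":
    history = {
        "chain_a": [0.80, 0.75, 0.70],  # confident: kept
        "chain_b": [0.50, 0.35, 0.20],  # low and falling: pruned
        "chain_c": [0.30, 0.32, 0.38],  # low but rising: kept (recovering chain)
    }
    print(trend_aware_prune(history))  # -> ['chain_a', 'chain_c']
```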