DDC Framework Balances Budget and Quality in LLM Inference Scaling
A new research paper introduces Dual-Dimensional Consistency (DDC), a unified framework for adaptive inference-time scaling in Large Language Models (LLMs). Current methods treat sampling width (the number of parallel reasoning chains) and depth (the length of each chain) as separate objectives, which leads to inefficiencies: width-based consensus can reinforce hallucinations, while depth-based pruning may cut off valid reasoning chains prematurely. DDC couples a Confidence-Weighted Bayesian protocol with Trend-Aware Stratified Pruning to concentrate compute on high-quality paths, filtering out hallucinations and accelerating consensus. Evaluations across five benchmarks show reduced token consumption while maintaining reasoning quality. The paper is available on arXiv under ID 2605.15100.
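The summary above does not reproduce the paper's algorithmic details, so the following is a minimal sketch of the width-dimension idea only: parallel samples are aggregated by confidence-weighted voting rather than a plain majority vote, so low-confidence (possibly hallucinated) chains contribute less to the consensus. The `SampledChain` structure, the use of a per-chain confidence score as a proxy, and the additive scoring rule are assumptions for illustration, not the paper's actual Confidence-Weighted Bayesian protocol.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List


@dataclass
class SampledChain:
    """One width-dimension sample: a reasoning chain's final answer and confidence."""
    answer: str
    confidence: float  # assumed proxy, e.g. mean token probability of the chain


def confidence_weighted_vote(chains: List[SampledChain]) -> str:
    """Aggregate parallel samples, weighting each vote by chain confidence.

    Plain majority voting lets many low-confidence chains outvote a few
    well-supported ones; weighting by confidence down-weights them instead.
    """
    scores = defaultdict(float)
    for chain in chains:
        scores[chain.answer] += chain.confidence
    return max(scores, key=scores.get)


if __name__ == "__main__":
    chains = [
        SampledChain(answer="42", confidence=0.91),
        SampledChain(answer="41", confidence=0.35),
        SampledChain(answer="41", confidence=0.30),
        SampledChain(answer="42", confidence=0.88),
    ]
    print(confidence_weighted_vote(chains))  # -> "42"
```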
Key facts
- DDC is a unified framework for adaptive inference-time scaling.
- Current methods treat sampling width and depth as orthogonal objectives.
- Width consensus risks reinforcing hallucinations.
- Depth pruning can prematurely truncate complex but valid reasoning chains.
- DDC combines a Confidence-Weighted Bayesian protocol with Trend-Aware Stratified Pruning (a sketch of the pruning idea follows this list).
- Evaluated across five benchmarks.
- The approach reduces token consumption while maintaining reasoning quality.
- Paper available on arXiv: 2605.15100.
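As with the consensus sketch above, the following is a hedged illustration of the depth-dimension idea: prune a reasoning chain only when its recent confidence is both low and trending downward, so a long but valid chain that dips temporarily is not truncated. The function name `trend_aware_prune`, the thresholds, and the window-based trend estimate are assumptions, not the paper's Trend-Aware Stratified Pruning procedure.

```python
from typing import Dict, List


def trend_aware_prune(
    step_confidences: Dict[str, List[float]],
    low_threshold: float = 0.4,
    window: int = 3,
) -> List[str]:
    """Keep chains that are either confident or still trending upward.

    A chain is pruned only if its recent confidence is low AND its trend over
    the last `window` steps is non-increasing; a temporary dip on a long,
    valid chain is therefore tolerated. Thresholds here are illustrative.
    """
    survivors = []
    for chain_id, confs in step_confidences.items():
        recent = confs[-window:]
        trend = recent[-1] - recent[0] if len(recent) > 1 else 0.0
        if recent[-1] >= low_threshold or trend > 0:
            survivors.append(chain_id)
    return survivors


if __name__ == "__main__":
    history = {
        "chain_a": [0.80, 0.75, 0.70],  # confident: kept
        "chain_b": [0.50, 0.35, 0.20],  # low and falling: pruned
        "chain_c": [0.30, 0.32, 0.38],  # low but rising: kept (recovering chain)
    }
    print(trend_aware_prune(history))  # -> ['chain_a', 'chain_c']
```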