ARTFEED — Contemporary Art Intelligence

PiCSAR: Probabilistic Confidence Selection for LLM Reasoning

ai-technology · 2026-05-01

Researchers have introduced PiCSAR (Probabilistic Confidence Selection And Ranking), a training-free method for scoring outputs from large language models (LLMs) and large reasoning models (LRMs). It scores each sampled output by the joint log-likelihood of its reasoning chain and its final answer, a score that decomposes into two parts: reasoning confidence and answer confidence. Applied to best-of-n sampling, PiCSAR reports gains of +10.18 on MATH500 and +9.81 on AIME2025 over baselines, and it outperforms those baselines while using at least 2x fewer samples in 16 out of 20 comparisons. The analysis also finds that correct reasoning chains have a significantly higher joint log-likelihood than incorrect ones.
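
The two-part split follows from the chain rule of probability. As a sketch only, write x for the prompt, r for the reasoning chain, and y for the final answer; these symbols are assumed here for illustration and do not come from the source. Which term corresponds to which named confidence is also an assumption based on the summary above.

```latex
\log p(r, y \mid x)
  = \underbrace{\log p(r \mid x)}_{\text{reasoning confidence (assumed)}}
  + \underbrace{\log p(y \mid r, x)}_{\text{answer confidence (assumed)}}
```

The summary does not say whether either term is length-normalized or otherwise rescaled, so the identity above should be read as the plain, unnormalized form.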

Key facts

  • PiCSAR is a training-free method for scoring reasoning chains.
  • Scores each output by the joint log-likelihood of its reasoning chain and final answer.
  • Gains +10.18 over baselines on the MATH500 benchmark.
  • Gains +9.81 over baselines on the AIME2025 benchmark.
  • Outperforms baselines with at least 2x fewer samples in 16/20 comparisons.
  • Decomposes into reasoning confidence and answer confidence.
  • Improves best-of-n sampling for LLMs and LRMs.
  • No ground-truth answers required for scoring.
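
The key facts above can be illustrated with a minimal best-of-n sketch in Python. It assumes per-token log-probabilities for the reasoning and the answer are already available from the model. The `Candidate` class, its field names, and the helper functions are hypothetical, and the plain unnormalized sum is an assumption rather than a confirmed detail of PiCSAR.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Candidate:
    """One sampled completion (hypothetical structure, for illustration)."""
    answer: str
    reasoning_logprobs: Sequence[float]  # per-token log-probs of the reasoning
    answer_logprobs: Sequence[float]     # per-token log-probs of the final answer


def reasoning_confidence(c: Candidate) -> float:
    # Log-likelihood of the reasoning chain: sum of its token log-probs.
    return sum(c.reasoning_logprobs)


def answer_confidence(c: Candidate) -> float:
    # Log-likelihood of the answer given the reasoning: sum of its token log-probs.
    return sum(c.answer_logprobs)


def joint_score(c: Candidate) -> float:
    # Joint log-likelihood (assumed unnormalized): the two parts added together.
    return reasoning_confidence(c) + answer_confidence(c)


def select_best(candidates: Sequence[Candidate]) -> Candidate:
    # Best-of-n selection: keep the candidate with the highest joint score.
    # No ground-truth answer is consulted at any point.
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=joint_score)


if __name__ == "__main__":
    samples = [
        Candidate("42", [-0.5, -0.25], [-0.25]),
        Candidate("41", [-1.0, -0.5], [-0.125]),
    ]
    best = select_best(samples)
    print(best.answer, joint_score(best))
```

Keeping the two confidences as separate functions mirrors the decomposition listed above and makes it easy to inspect each part on its own.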
