ARTFEED — Contemporary Art Intelligence

New Research Introduces Cross-Model Disagreement Method for LLM Uncertainty Quantification

ai-technology · 2026-04-22

A new research paper introduces a method for quantifying uncertainty in large language models by combining self-consistency with cross-model disagreement. The work addresses a critical limitation of sampling-based approaches: a model can produce confidently incorrect responses that remain consistent across repeated samples, causing aleatoric uncertainty proxies to fail. The researchers therefore propose adding an epistemic uncertainty term computed from the semantic disagreement between different models in an ensemble.

The resulting total uncertainty metric, tested on five instruction-tuned models of 7 to 9 billion parameters across ten long-form tasks, improves ranking calibration and selection performance. The approach operates under black-box access conditions, requiring only the generated text outputs of a scale-matched model ensemble.

The paper also analyzes the failure scenario directly: when individual models are overconfident and repeatedly generate identical incorrect answers, cross-model semantic disagreement rises precisely where aleatoric uncertainty measures collapse. By providing better uncertainty quantification without access to model internals, the work supports more reliable use of LLMs.
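The core idea can be sketched in a few lines of Python. The sketch below is illustrative, not the authors' implementation: jaccard_distance is a crude stand-in for whatever semantic similarity model the paper actually uses, samples_per_model is a hypothetical structure holding several sampled answers per ensemble member, and the weighting lam is an assumed knob, not a published value.

    from itertools import combinations
    from statistics import mean

    def jaccard_distance(a, b):
        """Crude stand-in for the semantic distance a real system would
        get from an embedding or NLI model: 1 minus token-set overlap."""
        ta, tb = set(a.lower().split()), set(b.lower().split())
        if not ta and not tb:
            return 0.0
        return 1.0 - len(ta & tb) / len(ta | tb)

    def mean_pairwise_distance(texts):
        """Average distance over all unordered pairs; 0.0 below two texts."""
        pairs = list(combinations(texts, 2))
        return mean(jaccard_distance(a, b) for a, b in pairs) if pairs else 0.0

    def total_uncertainty(samples_per_model, lam=1.0):
        """Aleatoric proxy (within-model inconsistency, averaged over the
        ensemble) plus a weighted epistemic term (between-model disagreement)."""
        # Aleatoric proxy: how much each model disagrees with itself
        # across repeated samples of the same prompt.
        aleatoric = mean(mean_pairwise_distance(s) for s in samples_per_model)
        # Epistemic term: how much the models disagree with one another,
        # here measured on one representative answer per model.
        representatives = [s[0] for s in samples_per_model]
        epistemic = mean_pairwise_distance(representatives)
        return aleatoric + lam * epistemic

Treating the two terms additively mirrors the standard decomposition of total uncertainty into aleatoric and epistemic parts; the paper's actual similarity measure and weighting are not reproduced here, so both should be read as placeholders.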

Key facts

  • Research introduces cross-model disagreement method for LLM uncertainty quantification
  • Addresses the problem of models producing confident but incorrect responses (see the sketch after this list)
  • Combines aleatoric uncertainty with a new epistemic uncertainty term
  • Tested on five 7-9B instruction-tuned models
  • Evaluated across ten long-form tasks
  • Operates in a black-box setting using only generated text outputs
  • Uses semantic similarity comparisons between models
  • Improves ranking calibration and selection capabilities
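
A hypothetical toy run, reusing the helpers from the sketch above, shows the failure mode the paper targets: three invented models each answer with perfect self-consistency, so the within-model (aleatoric) proxy collapses to zero, yet one model's confidently held answer differs from the others, so the cross-model term still flags the question. All answers and numbers below are made up for illustration.

    # Reuses jaccard_distance, mean_pairwise_distance, total_uncertainty
    # and `mean` from the sketch above. Answers are invented.
    samples_per_model = [
        ["the treaty was signed in 1848"] * 3,  # model A: perfectly self-consistent
        ["the treaty was signed in 1846"] * 3,  # model B: consistent but divergent
        ["the treaty was signed in 1848"] * 3,  # model C: agrees with A
    ]
    aleatoric_only = mean(mean_pairwise_distance(s) for s in samples_per_model)
    print(aleatoric_only)                        # 0.0: self-consistency sees no risk
    print(total_uncertainty(samples_per_model))  # ~0.19: the ensemble split is caught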

Entities

Institutions

  • arXiv

Sources