ARTFEED — Contemporary Art Intelligence

New Research Introduces Cross-Model Disagreement Method for LLM Uncertainty Quantification

ai-technology · 2026-04-22

A new research paper introduces a method for quantifying uncertainty in large language models by combining self-consistency with cross-model disagreement. The work addresses a critical limitation of sampling-based approaches: a model can produce confidently incorrect responses that remain consistent across repeated samples, causing aleatoric uncertainty proxies to fail. The researchers therefore propose adding an epistemic uncertainty term computed from the semantic disagreement between different models in an ensemble.

The resulting total uncertainty metric, tested on five instruction-tuned models of 7 to 9 billion parameters across ten long-form tasks, improves ranking calibration and selection performance. The approach operates under black-box access conditions, requiring only the generated text outputs of a scale-matched model ensemble.

The paper also analyzes the failure scenario directly: when individual models are overconfident and repeatedly generate identical incorrect answers, cross-model semantic disagreement rises precisely where aleatoric uncertainty measures collapse. By providing better uncertainty quantification without access to model internals, the work supports more reliable use of LLMs.
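The core idea can be sketched in a few lines of Python. The sketch below is illustrative, not the authors' implementation: jaccard_distance is a crude stand-in for whatever semantic similarity model the paper actually uses, samples_per_model is a hypothetical structure holding several sampled answers per ensemble member, and the weighting lam is an assumed knob, not a published value.

    from itertools import combinations
    from statistics import mean

    def jaccard_distance(a, b):
        """Crude stand-in for the semantic distance a real system would
        get from an embedding or NLI model: 1 minus token-set overlap."""
        ta, tb = set(a.lower().split()), set(b.lower().split())
        if not ta and not tb:
            return 0.0
        return 1.0 - len(ta & tb) / len(ta | tb)

    def mean_pairwise_distance(texts):
        """Average distance over all unordered pairs; 0.0 below two texts."""
        pairs = list(combinations(texts, 2))
        return mean(jaccard_distance(a, b) for a, b in pairs) if pairs else 0.0

    def total_uncertainty(samples_per_model, lam=1.0):
        """Aleatoric proxy (within-model inconsistency, averaged over the
        ensemble) plus a weighted epistemic term (between-model disagreement)."""
        # Aleatoric proxy: how much each model disagrees with itself
        # across repeated samples of the same prompt.
        aleatoric = mean(mean_pairwise_distance(s) for s in samples_per_model)
        # Epistemic term: how much the models disagree with one another,
        # here measured on one representative answer per model.
        representatives = [s[0] for s in samples_per_model]
        epistemic = mean_pairwise_distance(representatives)
        return aleatoric + lam * epistemic

Treating the two terms additively mirrors the standard decomposition of total uncertainty into aleatoric and epistemic parts; the paper's actual similarity measure and weighting are not reproduced here, so both should be read as placeholders.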

Key facts

  • Research introduces cross-model disagreement method for LLM uncertainty quantification
  • Addresses the problem of models producing confident but incorrect responses (see the sketch after this list)
  • Combines aleatoric uncertainty with a new epistemic uncertainty term
  • Tested on five 7-9B instruction-tuned models
  • Evaluated across ten long-form tasks
  • Operates in a black-box setting using only generated text outputs
  • Uses semantic similarity comparisons between models
  • Improves ranking calibration and selection capabilities
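
A hypothetical toy run, reusing the helpers from the sketch above, shows the failure mode the paper targets: three invented models each answer with perfect self-consistency, so the within-model (aleatoric) proxy collapses to zero, yet one model's confidently held answer differs from the others, so the cross-model term still flags the question. All answers and numbers below are made up for illustration.

    # Reuses jaccard_distance, mean_pairwise_distance, total_uncertainty
    # and `mean` from the sketch above. Answers are invented.
    samples_per_model = [
        ["the treaty was signed in 1848"] * 3,  # model A: perfectly self-consistent
        ["the treaty was signed in 1846"] * 3,  # model B: consistent but divergent
        ["the treaty was signed in 1848"] * 3,  # model C: agrees with A
    ]
    aleatoric_only = mean(mean_pairwise_distance(s) for s in samples_per_model)
    print(aleatoric_only)                        # 0.0: self-consistency sees no risk
    print(total_uncertainty(samples_per_model))  # ~0.19: the ensemble split is caught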

Entities

Institutions

  • arXiv

Sources