ARTFEED — Contemporary Art Intelligence

SIEVES: Visual Evidence Scoring Boosts MLLM Selective Prediction

ai-technology · 2026-04-30

SIEVES (Selective Prediction through Visual Evidence Scoring) is a recently introduced technique that improves the reliability of multimodal large language models (MLLMs) in out-of-distribution (OOD) settings. It requires reasoner models to produce localized visual evidence alongside their answers, while a trained selector estimates the quality of that localization. By using the selector's confidence scores to abstain on low-confidence queries, SIEVES achieves up to a threefold improvement in coverage on challenging OOD benchmarks while keeping risk within user-defined limits. The paper is available on arXiv under identifier 2604.25855.

Key facts

  • SIEVES stands for Selective Prediction through Visual Evidence Scoring
  • Method improves coverage by up to three times on OOD benchmarks
  • Requires reasoner models to produce localized visual evidence
  • Selector learns to estimate quality of localization
  • Targets reliable deployment in real-world out-of-distribution scenarios
  • Paper available on arXiv with ID 2604.25855
  • Addresses selective prediction for MLLMs
  • Uses confidence scoring and abstention mechanism
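The confidence-scoring and abstention mechanism described above can be sketched in generic selective-prediction terms. The snippet below is an illustrative toy, not the paper's implementation: it assumes the selector's evidence-quality score is already available as a float, and it picks an abstention threshold on a small calibration set so that the error rate among answered queries stays within a user-defined risk bound. All function names and data are hypothetical.

```python
def calibrate_threshold(scores, correct, target_risk):
    """Pick the lowest score threshold whose empirical risk
    (error rate among answered queries) stays within target_risk."""
    best = float("inf")  # abstain on everything if nothing qualifies
    # Try thresholds from most to least conservative.
    for t in sorted(set(scores), reverse=True):
        answered = [c for s, c in zip(scores, correct) if s >= t]
        if not answered:
            continue
        risk = 1 - sum(answered) / len(answered)
        if risk <= target_risk:
            best = t  # lower bar accepted while risk stays bounded
    return best

def predict_or_abstain(score, answer, threshold):
    """Answer only when the evidence score clears the threshold."""
    return answer if score >= threshold else None  # None = abstain

# Toy calibration set: selector scores with correctness flags (1/0).
scores  = [0.95, 0.90, 0.80, 0.60, 0.40, 0.20]
correct = [1,    1,    1,    0,    1,    0]
t = calibrate_threshold(scores, correct, target_risk=0.1)
```

With this toy data the calibrated threshold is 0.80, so the model answers the three highest-scoring queries (coverage 3/6) with zero empirical risk; lower thresholds would admit errors above the 10% bound. SIEVES' contribution lies in how the selector scores localized visual evidence, which this sketch treats as a black box.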

Entities

Institutions

  • arXiv

Sources