ARTFEED — Contemporary Art Intelligence

Always-Valid Release Wrapper for Black-Box AI Workflows

ai-technology · 2026-05-14

A statistical technique guarantees valid stopping decisions in LLM-driven generate-verify pipelines, without requiring likelihood models or exchangeability assumptions. The method builds a reference pool of high-scoring failures (hard negatives), calibrates evaluator scores at deployment time against this pool, and accumulates the resulting evidence with an e-process, which keeps inference valid under optional stopping. The two components play distinct roles: the reference pool turns black-box scores into conservative evidence, while the e-process delivers anytime-valid inference. A theoretical result shows that a conservative reference pool is sufficient for validity.
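The calibration step described above can be sketched as follows. This is a minimal illustration, not the paper's exact procedure: it assumes the hard-negative pool is a list of evaluator scores for known failures, and uses a standard rank-based calibration with a +1 correction so the resulting p-value is conservative (stochastically larger than uniform when the candidate output is itself a failure).

```python
import bisect

def conservative_p_value(score: float, hard_negative_scores: list[float]) -> float:
    """Calibrate a raw black-box evaluator score against a reference
    pool of high-scoring failures (hard negatives).

    Returns the fraction of pool scores >= the observed score, with a
    +1 correction in numerator and denominator. Small p-values mean the
    candidate scores higher than almost all known failures, i.e. strong
    evidence it is not a failure. (Illustrative sketch, not the paper's
    exact calibrator.)
    """
    pool = sorted(hard_negative_scores)
    n = len(pool)
    # Count pool scores greater than or equal to the observed score.
    num_ge = n - bisect.bisect_left(pool, score)
    return (num_ge + 1) / (n + 1)
```

A score above every hard negative in a pool of three yields p = 1/4, never exactly zero; the correction is what makes the pool "conservative" in the sense the theory requires.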

Key facts

  • Proposes always-valid release wrapper for generator-evaluator pipelines
  • Builds hard-negative reference pool of high-scoring failures
  • Calibrates deployment-time evaluator scores against reference pool
  • Accumulates evidence with an e-process
  • Provides validity under optional stopping
  • Does not require likelihood models or exchangeability assumptions
  • Separates roles of reference pool and e-process
  • Theoretical guarantee: conservative reference pool suffices
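The evidence-accumulation and stopping logic behind these facts can be sketched with a standard construction, assumed here for illustration: each conservative p-value is converted to an e-value via the calibrator e(p) = κ·p^(κ−1) for κ in (0, 1), e-values are multiplied into a running product, and release is triggered the first time the product reaches 1/α. Ville's inequality bounds the probability of this ever happening under the null by α, which is what makes the stopping rule valid at any data-dependent time.

```python
def release_wrapper(p_values, alpha: float = 0.05, kappa: float = 0.5):
    """Product e-process over a stream of conservative p-values.

    Each p is mapped to an e-value via the calibrator
    e(p) = kappa * p**(kappa - 1), whose expectation is <= 1 whenever
    p is stochastically larger than uniform. By Ville's inequality the
    running product exceeds 1/alpha with probability <= alpha under the
    null, so releasing the first time it does is valid under optional
    stopping. (Sketch with an assumed calibrator, not the paper's
    exact e-process.)
    """
    wealth = 1.0
    for t, p in enumerate(p_values, start=1):
        wealth *= kappa * p ** (kappa - 1)
        if wealth >= 1.0 / alpha:
            return t, wealth  # enough evidence: release at step t
    return None, wealth  # stream exhausted without releasing
```

With κ = 0.5 and α = 0.05, two consecutive p-values of 0.01 each contribute an e-value of 5, so the wealth reaches 25 ≥ 20 and the wrapper releases at step 2; a stream of middling p-values never triggers release.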
