ARTFEED — Contemporary Art Intelligence

ProEval: A Proactive Framework for Efficient Generative AI Evaluation

ai-technology · 2026-04-29

ProEval is a proactive evaluation framework that addresses the resource-intensive nature of testing generative AI models, where slow inference, expensive raters, and a growing number of models and benchmarks make exhaustive evaluation impractical. The framework uses transfer learning, employing pre-trained Gaussian Processes (GPs) as surrogates that map test inputs to performance metrics such as error severity or safety violations. It frames performance estimation as Bayesian quadrature and failure discovery as superlevel set sampling, enabling uncertainty-aware decision strategies that actively select or synthesize informative test inputs. Theoretically, the pre-trained GP-based Bayesian quadrature estimator is proven unbiased and bounded. Empirically, ProEval has been validated on reasoning, safety alignment, and classification benchmarks. The paper is available on arXiv under identifier 2604.23099.
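The core idea can be sketched in miniature. The following is an illustrative example, not ProEval's actual implementation: a GP surrogate is fit to a handful of expensively labelled test points, and the model's mean performance is then estimated by averaging the surrogate's posterior mean over the input distribution, a Monte Carlo stand-in for the Bayesian quadrature integral. All inputs, scores, and kernel settings here are hypothetical.

```python
# Hedged sketch: GP-surrogate performance estimation in the spirit of
# Bayesian quadrature. Not the paper's code; all data are synthetic.
import numpy as np

def rbf_kernel(a, b, length=0.5):
    # Squared-exponential kernel between two 1-D input arrays.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

rng = np.random.default_rng(0)

# A few expensively labelled evaluations: input feature -> metric score.
x_train = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
y_train = np.sin(2 * np.pi * x_train) * 0.2 + 0.6  # stand-in for real scores

# Noise-free GP regression (jitter added for numerical stability).
K = rbf_kernel(x_train, x_train) + 1e-8 * np.eye(len(x_train))
alpha = np.linalg.solve(K, y_train)

def posterior_mean(x_query):
    # GP posterior mean at the query inputs.
    return rbf_kernel(x_query, x_train) @ alpha

# Bayesian quadrature target: the integral of performance over the input
# distribution. Approximated here by Monte Carlo over uniform [0, 1] inputs.
x_mc = rng.uniform(0, 1, size=10_000)
perf_estimate = posterior_mean(x_mc).mean()
print(f"estimated mean performance: {perf_estimate:.3f}")
```

In a full Bayesian quadrature treatment the integral of the GP posterior has a closed form with its own uncertainty estimate; the Monte Carlo average above is only the simplest approximation of that quantity.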

Key facts

  • ProEval is a proactive evaluation framework for generative AI models.
  • It uses transfer learning with pre-trained Gaussian Processes as surrogates.
  • Performance estimation is framed as Bayesian quadrature.
  • Failure discovery is framed as superlevel set sampling.
  • The estimator is proven unbiased and bounded.
  • Experiments were conducted on reasoning, safety alignment, and classification benchmarks.
  • The paper is on arXiv with ID 2604.23099.
  • The framework aims to reduce resource consumption in model evaluation.
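The failure-discovery side of the framework can be illustrated with a similar sketch. This is a hedged, hypothetical example of superlevel-set style acquisition, not ProEval's actual algorithm: candidate test inputs are ranked by the GP posterior probability that an error-severity score exceeds a failure threshold, and the most likely failure is selected for the next expensive evaluation. The data, kernel, and threshold are all invented for illustration.

```python
# Hedged sketch of superlevel-set sampling: pick the candidate input most
# likely (under the GP posterior) to exceed a failure threshold.
# Synthetic data; not ProEval's implementation.
import numpy as np
from math import erf

def rbf(a, b, length=0.3):
    # Squared-exponential kernel with unit prior variance.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

# Hypothetical labelled evaluations: input feature -> error-severity score.
x_obs = np.array([0.05, 0.4, 0.6, 0.95])
y_obs = np.array([0.1, 0.3, 0.7, 0.9])

K = rbf(x_obs, x_obs) + 1e-6 * np.eye(len(x_obs))
K_inv = np.linalg.inv(K)

def posterior(xq):
    # GP posterior mean and variance at query inputs.
    Ks = rbf(xq, x_obs)
    mu = Ks @ K_inv @ y_obs
    var = 1.0 - np.einsum("ij,jk,ik->i", Ks, K_inv, Ks)
    return mu, np.clip(var, 1e-12, None)

threshold = 0.8  # severity above this counts as a failure (assumed)
candidates = np.linspace(0, 1, 101)
mu, var = posterior(candidates)

# P(score > threshold) under the Gaussian posterior at each candidate.
z = (mu - threshold) / np.sqrt(var)
p_fail = 0.5 * (1 + np.vectorize(erf)(z / np.sqrt(2)))
best = candidates[np.argmax(p_fail)]
print(f"next input to evaluate: x = {best:.2f}")
```

The acquisition score here is the simplest choice; uncertainty-aware variants instead target inputs where membership in the superlevel set is most ambiguous, trading exploitation of known failure regions for exploration of the set's boundary.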

Entities

Institutions

  • arXiv

Sources