GLIDE Library Unifies Prediction-Powered Inference for Reliable GenAI Evaluation

ai-technology · 2026-06-01

GLIDE, a newly released open-source Python library, unifies prediction-powered inference (PPI) methods for evaluating generative AI and agentic systems. PPI combines expensive human annotations with cheap but biased LLM-as-judge proxies to produce debiased estimates with valid confidence intervals. The library covers several estimators (PPI++, Stratified PPI, Predict-Then-Debias, and Active Statistical Inference) and several samplers (uniform, stratified, active, and cost-optimal), all exposed through a scipy-style API specialized for mean estimation. GLIDE also ships with a reproducible Monte Carlo validation suite, an empirically grounded decision tree for choosing a method, and an agentic evaluation case study that shows significant annotation savings at equivalent precision. The library is available on GitHub.
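
To make the idea of a debiased estimate concrete, here is a minimal sketch of the classical PPI mean estimator using only the Python standard library. It is not GLIDE's API: the function name `ppi_mean_ci` and its parameters are hypothetical and chosen for illustration. It assumes a small labeled set with both human and judge scores, a larger set with judge scores only, and a normal-approximation confidence interval.

```python
import math
import statistics


def ppi_mean_ci(human_labels, judge_on_labeled, judge_on_unlabeled, alpha=0.05):
    """Classical PPI mean estimate with a normal-approximation CI.

    Hypothetical helper for illustration; not part of GLIDE.
    """
    n = len(human_labels)
    big_n = len(judge_on_unlabeled)

    # Rectifier: per-item gap between the human label and the judge score.
    residuals = [y - f for y, f in zip(human_labels, judge_on_labeled)]

    # Judge mean on the large set, corrected by the average judge error.
    estimate = statistics.fmean(judge_on_unlabeled) + statistics.fmean(residuals)

    # The two terms come from independent samples, so their variances add.
    variance = (
        statistics.variance(judge_on_unlabeled) / big_n
        + statistics.variance(residuals) / n
    )
    z = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * math.sqrt(variance)
    return estimate, (estimate - half_width, estimate + half_width)


# Toy data: the judge is too lenient on one of four human-labeled items.
human = [1, 0, 1, 1]
judge_labeled = [1, 1, 1, 1]
judge_unlabeled = [1, 1, 1, 0, 1, 1, 1, 1]
estimate, (low, high) = ppi_mean_ci(human, judge_labeled, judge_unlabeled)
print(f"PPI estimate: {estimate:.3f}, 95% CI: ({low:.3f}, {high:.3f})")
```

The correction term shifts the judge's average by the bias measured on the human-labeled items, so the interval width depends on how well the judge tracks the human labels rather than on the raw label variance alone.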

Key facts

GLIDE is an open-source Python library for prediction-powered inference.
It unifies PPI estimators: PPI++, Stratified PPI, Predict-Then-Debias, Active Statistical Inference.
It includes samplers: uniform, stratified, active, cost-optimal.
The API is scipy-style and specialized for mean estimation.
It comes with a reproducible Monte Carlo validation suite.
It includes an empirically grounded decision tree for method selection.
An agentic evaluation case study shows annotation savings at equivalent precision.
GLIDE is available on GitHub.
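
Of the estimators listed above, PPI++ is the simplest to sketch for mean estimation: it scales the judge-based correction by a weight chosen to minimize variance. The following is an illustrative standard-library version, not GLIDE's implementation. The name `ppi_pp_mean_ci` is hypothetical, and estimating the judge variance from the pooled labeled and unlabeled scores is an assumption made here for simplicity.

```python
import math
import statistics


def ppi_pp_mean_ci(human_labels, judge_on_labeled, judge_on_unlabeled, alpha=0.05):
    """PPI++-style mean estimate with a plug-in variance-minimizing weight.

    Hypothetical helper for illustration; not part of GLIDE.
    """
    n = len(human_labels)
    big_n = len(judge_on_unlabeled)
    mean_y = statistics.fmean(human_labels)
    mean_f_lab = statistics.fmean(judge_on_labeled)
    mean_f_unl = statistics.fmean(judge_on_unlabeled)

    # Sample covariance between human labels and judge scores (labeled set).
    cov_yf = sum(
        (y - mean_y) * (f - mean_f_lab)
        for y, f in zip(human_labels, judge_on_labeled)
    ) / (n - 1)

    # Assumption: judge variance is estimated from all judge scores pooled.
    var_f = statistics.variance(list(judge_on_labeled) + list(judge_on_unlabeled))

    # Weight near 1 trusts the judge; weight 0 falls back to the human-only mean.
    lam = cov_yf / ((1 + n / big_n) * var_f) if var_f > 0 else 0.0

    estimate = mean_y + lam * (mean_f_unl - mean_f_lab)

    weighted_resid = [y - lam * f for y, f in zip(human_labels, judge_on_labeled)]
    variance = (
        statistics.variance(weighted_resid) / n
        + lam ** 2 * statistics.variance(judge_on_unlabeled) / big_n
    )
    z = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * math.sqrt(variance)
    return estimate, (estimate - half_width, estimate + half_width), lam
```

Setting the weight to 1 recovers the classical PPI estimator, while a judge that carries no information about the human labels drives the weight toward 0, so the estimate falls back to the plain mean of the human labels.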
