ARTFEED — Contemporary Art Intelligence

GLIDE Library Unifies Prediction-Powered Inference for Reliable GenAI Evaluation

ai-technology · 2026-06-01

GLIDE, a newly released open-source Python library, brings together prediction-powered inference (PPI) methods for evaluating generative AI and agentic systems. PPI combines a small set of expensive human annotations with a larger set of cheap but biased LLM-as-judge scores to produce debiased estimates with valid confidence intervals. The library implements several estimators (PPI++, Stratified PPI, Predict-Then-Debias, and Active Statistical Inference) and several samplers (uniform, stratified, active, and cost-optimal) behind a scipy-style API specialized for mean estimation. GLIDE also ships a reproducible Monte Carlo validation suite, an empirically grounded decision tree for method selection, and an agentic evaluation case study that reports significant annotation savings at equivalent precision. The library is available on GitHub.
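To make the idea concrete, here is a minimal sketch of the classical PPI estimator for a mean, written directly with NumPy and SciPy. It is not GLIDE's API: the function name `ppi_mean`, its parameters, and the synthetic data are all assumptions made for illustration. The estimate is the judge's average score on the unlabeled pool plus a correction (the "rectifier") measured on the human-labeled subset.

```python
import numpy as np
from scipy import stats


def ppi_mean(y_labeled, f_labeled, f_unlabeled, alpha=0.05):
    """Hypothetical helper: classical PPI mean estimate with a normal-approximation CI.

    y_labeled   -- human annotations on the small labeled set
    f_labeled   -- judge scores on the same labeled items
    f_unlabeled -- judge scores on the large unlabeled pool
    """
    y = np.asarray(y_labeled, dtype=float)
    f_l = np.asarray(f_labeled, dtype=float)
    f_u = np.asarray(f_unlabeled, dtype=float)
    n, N = len(y), len(f_u)

    # Rectifier: how far the judge is off, measured where human labels exist.
    rectifier = y - f_l

    # Judge mean on the pool, corrected by the average judge error.
    estimate = f_u.mean() + rectifier.mean()

    # The two terms come from independent samples, so their variances add.
    se = np.sqrt(f_u.var(ddof=1) / N + rectifier.var(ddof=1) / n)
    z = stats.norm.ppf(1 - alpha / 2)
    return estimate, (estimate - z * se, estimate + z * se)


# Synthetic illustration (assumed data): a judge that overrates some failures.
rng = np.random.default_rng(0)
truth = rng.binomial(1, 0.7, size=5200).astype(float)
judge = np.where(rng.random(5200) < 0.85, truth, 1.0)

est, (lo, hi) = ppi_mean(truth[:200], judge[:200], judge[200:])
print("judge-only mean:", judge.mean())
print("PPI estimate:", est, "CI:", (lo, hi))
```

The judge-only mean inherits the judge's bias, while the PPI estimate corrects for it at the cost of a wider interval than the judge-only one. Estimators such as PPI++ refine this by weighting the judge term, which this sketch does not attempt.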

Key facts

  • GLIDE is an open-source Python library for prediction-powered inference.
  • It unifies PPI estimators: PPI++, Stratified PPI, Predict-Then-Debias, Active Statistical Inference.
  • It includes samplers: uniform, stratified, active, cost-optimal.
  • The API is scipy-style and specialized for mean estimation.
  • It comes with a reproducible Monte Carlo validation suite.
  • It includes an empirically grounded decision tree for method selection.
  • An agentic evaluation case study shows annotation savings at equivalent precision.
  • GLIDE is available on GitHub.
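As a rough picture of what a Monte Carlo validation of such an estimator involves, the sketch below repeatedly simulates data with a known true mean and counts how often a classical PPI confidence interval covers it. This is an illustration under assumed settings, not GLIDE's validation suite: the function `simulate_coverage`, the synthetic judge model, and every default value are hypothetical.

```python
import numpy as np
from scipy import stats


def simulate_coverage(n_trials=500, n=400, N=4000, p=0.7, alpha=0.05, seed=0):
    """Hypothetical check: fraction of PPI intervals that contain the true mean p."""
    rng = np.random.default_rng(seed)  # fixed seed keeps the run reproducible
    z = stats.norm.ppf(1 - alpha / 2)
    hits = 0
    for _ in range(n_trials):
        # Ground-truth outcomes with known mean p.
        y = rng.binomial(1, p, size=n + N).astype(float)
        # Assumed biased judge: agrees 85% of the time, otherwise answers 1.
        judge = np.where(rng.random(n + N) < 0.85, y, 1.0)

        y_l, f_l, f_u = y[:n], judge[:n], judge[n:]
        rectifier = y_l - f_l
        estimate = f_u.mean() + rectifier.mean()
        se = np.sqrt(f_u.var(ddof=1) / N + rectifier.var(ddof=1) / n)

        # Count the trial as a hit if the interval covers the true mean.
        hits += bool(estimate - z * se <= p <= estimate + z * se)
    return hits / n_trials


print("empirical coverage:", simulate_coverage())
```

If the interval is well calibrated, the empirical coverage should land near the nominal level of 1 - alpha. A value well below that would signal that the interval is too narrow for the chosen sample sizes.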

Entities

Institutions

  • arXiv