MiRD Framework for Reliable Set-Valued Prediction in Open-Ended QA
MiRD is a two-stage framework for reliable set-valued prediction. It targets hallucinations in open-ended question answering by decomposing overall miscoverage into two components: sampling failure and conditional selection failure. Stage I establishes an expectation-level marginal upper bound on the probability that finite sampling, under a predetermined budget, yields no admissible answer. Stage II calibrates a conformal selection threshold using admission-correlated nonconformity scores computed on the full calibration set, which preserves calibration-set integrity. The framework was evaluated on three open-ended QA datasets with eight models, where it controlled sampling risk.
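The decomposition described above can be written out with the law of total probability. The notation here is an assumption introduced for illustration and is not taken from the paper: $A$ is the event that at least one sampled candidate is admissible, $C$ is the event that the final prediction set contains an admissible answer, and $\alpha$, $\beta$ are hypothetical risk budgets for the two stages. Because the prediction set is selected from the sampled candidates, a sampling failure always implies miscoverage.

```latex
\begin{align*}
\Pr(\neg C)
  &= \Pr(\neg A) + \Pr(\neg C \cap A) \\
  &= \underbrace{\Pr(\neg A)}_{\text{sampling failure}}
     + \underbrace{\Pr(\neg C \mid A)}_{\text{conditional selection failure}} \, \Pr(A) \\
  &\le \Pr(\neg A) + \Pr(\neg C \mid A)
   \le \alpha + \beta .
\end{align*}
```

Under these assumptions, bounding the first term in Stage I and the second in Stage II bounds overall miscoverage by the sum of the two budgets. The paper's exact statement and allocation of budgets may differ.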
Key facts
- MiRD decomposes miscoverage into sampling failure and conditional selection failure.
- Stage I provides an expectation-level marginal upper bound on sampling failure probability.
- Stage II calibrates a conformal selection threshold using admission-correlated nonconformity scores.
- The framework preserves calibration-set integrity by using the full calibration set.
- Tested on three open-ended QA datasets and eight models.
- MiRD controls sampling risk in set-valued prediction.
- The approach mitigates hallucinations in open-ended QA.
- The paper is available on arXiv under ID 2605.27091.
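As an illustration of the two stages listed above, the following Python sketch estimates an empirical sampling-failure rate and calibrates a selection threshold. It is a minimal sketch and not MiRD's actual procedure: it assumes the standard split-conformal quantile rule, and all function names are hypothetical. It also assumes, purely for illustration, that a calibration question with no admissible candidate receives an infinite score so that every calibration question is retained.

```python
import math


def empirical_sampling_failure(admissible_flags_per_question):
    """Fraction of questions whose sampled candidates contain no admissible answer."""
    failures = sum(1 for flags in admissible_flags_per_question if not any(flags))
    return failures / len(admissible_flags_per_question)


def calibration_score(scores, admissible):
    """Smallest nonconformity score among admissible candidates.

    Assumption for this sketch only: a question with no admissible
    candidate gets math.inf, which keeps it in the calibration set.
    """
    kept = [s for s, ok in zip(scores, admissible) if ok]
    return min(kept) if kept else math.inf


def conformal_threshold(cal_scores, alpha):
    """Standard split-conformal quantile: the ceil((n + 1) * (1 - alpha))-th smallest score."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: no finite threshold suffices.
        return math.inf
    return sorted(cal_scores)[k - 1]


def prediction_set(candidates, scores, threshold):
    """Keep every candidate whose nonconformity score does not exceed the threshold."""
    return [c for c, s in zip(candidates, scores) if s <= threshold]
```

A typical use would compute `calibration_score` once per calibration question, pass the resulting list to `conformal_threshold`, and then apply `prediction_set` to the candidates sampled for each test question. Taking the minimum over admissible candidates means the threshold only needs to admit one acceptable answer per question.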
Entities
Institutions
- arXiv