ECPO: Evidence-Coupled Policy Optimization for Evidence-Certified Candidate Ranking
A new study on arXiv (2605.21993) introduces Evidence-Coupled Policy Optimization (ECPO), a method for evidence-certified candidate ranking. It outputs a Top-K list with doc_id:span evidence certificates that support each ranking decision. The method is instantiated on the MAVEN-ERE and RAMS datasets, with fixed upstream extraction, window-randomized candidate identifiers, skeleton-aligned trajectory supervision, hard negatives, and audit references. ECPO aims to define a clear trajectory reward that accounts for factors such as skeleton alignment and argument consistency.
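As a rough illustration of a trajectory reward that combines skeleton alignment with argument consistency, the sketch below takes a weighted sum of two component scores. This summary does not give the paper's actual reward, so everything here is an assumption made for illustration: the names `TrajectoryScores` and `trajectory_reward`, the [0, 1] score range, the linear combination, and the default weights.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrajectoryScores:
    """Hypothetical per-trajectory component scores, each assumed to lie in [0, 1]."""
    skeleton_alignment: float
    argument_consistency: float


def trajectory_reward(
    scores: TrajectoryScores,
    w_skeleton: float = 0.5,  # assumed weight, not taken from the paper
    w_argument: float = 0.5,  # assumed weight, not taken from the paper
) -> float:
    """Combine the two components into one scalar reward (illustrative only)."""
    for value in (scores.skeleton_alignment, scores.argument_consistency):
        if not 0.0 <= value <= 1.0:
            raise ValueError("component scores must lie in [0, 1]")
    return (
        w_skeleton * scores.skeleton_alignment
        + w_argument * scores.argument_consistency
    )


if __name__ == "__main__":
    print(trajectory_reward(TrajectoryScores(1.0, 0.5)))
```

A linear combination is only the simplest choice. The paper may weight, gate, or couple these terms differently.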
Key facts
- Paper ID: arXiv:2605.21993
- Published on arXiv
- Introduces ECPO: Evidence-Coupled Policy Optimization
- Task: evidence-certified candidate ranking
- Outputs Top-K list with evidence certificates
- Instantiated on MAVEN-ERE and RAMS datasets
- Uses skeleton-aligned trajectory supervision
- Includes hard negatives and audit references
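The output described above, a Top-K list whose entries carry doc_id:span evidence certificates, can be pictured with a small data structure. This is a minimal sketch under stated assumptions, not the paper's format: it assumes a span is written as `start-end` character offsets (for example `doc7:12-40`), and the names `Certificate`, `RankedCandidate`, `parse_certificate`, and `top_k` are hypothetical. Dropping candidates that have no certificate is likewise an assumption about what "evidence-certified" implies.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Certificate:
    """One evidence pointer: a document ID plus an assumed character span."""
    doc_id: str
    start: int
    end: int


def parse_certificate(text: str) -> Certificate:
    """Parse 'doc_id:start-end'; the 'start-end' span syntax is an assumption."""
    doc_id, sep, span = text.rpartition(":")
    if not sep or not doc_id:
        raise ValueError(f"expected 'doc_id:span', got {text!r}")
    start_s, dash, end_s = span.partition("-")
    if not dash:
        raise ValueError(f"expected span 'start-end', got {span!r}")
    start, end = int(start_s), int(end_s)
    if start < 0 or end <= start:
        raise ValueError(f"invalid span bounds in {text!r}")
    return Certificate(doc_id, start, end)


@dataclass(frozen=True)
class RankedCandidate:
    candidate_id: str
    score: float
    certificates: Tuple[Certificate, ...]


def top_k(candidates: List[RankedCandidate], k: int) -> List[RankedCandidate]:
    """Return the k highest-scoring candidates that carry at least one certificate."""
    certified = [c for c in candidates if c.certificates]
    return sorted(certified, key=lambda c: c.score, reverse=True)[:k]


if __name__ == "__main__":
    pool = [
        RankedCandidate("c1", 0.4, (parse_certificate("doc7:12-40"),)),
        RankedCandidate("c2", 0.9, ()),  # no evidence, so it is filtered out
        RankedCandidate("c3", 0.7, (parse_certificate("doc2:0-15"),)),
    ]
    for cand in top_k(pool, k=2):
        print(cand.candidate_id, cand.score, cand.certificates)
```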
Entities
Institutions
- arXiv
Datasets
- MAVEN-ERE
- RAMS