DecomposeRL: RL-Based Claim Verification with Traceable Reasoning
DecomposeRL is a claim-verification system that combines the precision of end-to-end classifiers with the transparency of decomposition-based methods. It frames claim decomposition as a reinforcement learning policy trained with GRPO and a multi-faceted reward ensemble, which supports both fully supervised training and semi-supervised learning from unlabeled claims. Because GRPO training is expensive, DecomposeRL uses a data-curation funnel that distills 115K fact-verification claims into a curated set of about 5K. A DecomposeRL-7B policy trained with full supervision on these roughly 5K curated claims reaches a balanced accuracy of 86.3 in-domain and 69.8 out-of-domain across 11 claim-verification benchmarks covering biomedical, political, scientific, and general-domain claims.
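As a rough illustration of how GRPO turns a reward ensemble into a training signal, the sketch below scores a group of sampled decompositions for one claim and normalizes each reward against its own group. The component names (`label_correct`, `format_ok`), the weights, and the use of a weighted sum are hypothetical placeholders, not DecomposeRL's actual rewards. The population standard deviation is also an assumption; implementations differ on this detail.

```python
from statistics import mean, pstdev


def combine_rewards(components, weights):
    """Weighted sum of named reward components.

    The component names and weights are hypothetical; the summary above
    does not specify what the reward ensemble contains.
    """
    return sum(weights[name] * value for name, value in components.items())


def group_relative_advantages(rewards, eps=1e-8):
    """Group-relative advantage: (r - group mean) / (group std + eps).

    `rewards` holds one scalar reward per sampled output for the same prompt.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled decompositions of one claim, scored by the toy ensemble.
weights = {"label_correct": 0.8, "format_ok": 0.2}
group = [
    {"label_correct": 1.0, "format_ok": 1.0},
    {"label_correct": 0.0, "format_ok": 1.0},
    {"label_correct": 1.0, "format_ok": 0.0},
    {"label_correct": 0.0, "format_ok": 0.0},
]
rewards = [combine_rewards(c, weights) for c in group]
advantages = group_relative_advantages(rewards)
```

Because advantages are computed relative to the group, no separate value network is needed. A group whose samples all receive the same reward yields zero advantage for every sample and therefore contributes no gradient signal.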
Key facts
- DecomposeRL frames decomposition as an RL policy trained with GRPO
- Uses a multi-faceted reward ensemble
- Enables semi-supervised learning from unlabeled claims
- Data-curation funnel distills 115K claims to 5K
- DecomposeRL-7B achieves 86.3 in-domain balanced accuracy
- Achieves 69.8 out-of-domain balanced accuracy
- Tested on 11 claim-verification benchmarks
- Covers biomedical, political, scientific, and general-domain claims
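The reported metric, balanced accuracy, is the unweighted mean of per-class recall, so a verifier cannot score well on an imbalanced benchmark by always predicting the majority label. A minimal sketch follows; the label strings are illustrative only, and the function returns a fraction in [0, 1], whereas the figures above appear to be on a 0-100 scale.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    total = defaultdict(int)    # number of examples per true class
    correct = defaultdict(int)  # correctly predicted examples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


# A majority-class predictor on an imbalanced toy set (illustrative labels).
y_true = ["supported"] * 4 + ["refuted"]
y_pred = ["supported"] * 5
score = balanced_accuracy(y_true, y_pred)
```

In this toy case plain accuracy would be 0.8, while balanced accuracy is 0.5, since recall is 1.0 on one class and 0.0 on the other.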
Entities
Institutions
- arXiv