DecomposeRL: RL-Based Claim Verification with Traceable Reasoning

ai-technology · 2026-05-28

DecomposeRL is a claim-verification system that aims to combine the accuracy of end-to-end classifiers with the transparency of decomposition-based methods. It frames claim decomposition as a reinforcement learning policy trained with GRPO (Group Relative Policy Optimization) and a multi-faceted reward ensemble, which supports both fully supervised training and semi-supervised learning from unlabeled claims. To reduce the high cost of GRPO training, DecomposeRL uses a data-curation funnel that distills 115K fact-verification claims down to about 5K. A DecomposeRL-7B policy trained with full supervision on these roughly 5K curated claims reaches balanced accuracy of 86.3 in-domain and 69.8 out-of-domain across 11 claim-verification benchmarks covering biomedical, political, scientific, and general-domain claims.
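To make the GRPO step more concrete, here is a minimal sketch of how a reward ensemble could feed group-relative advantages. It is an illustration under stated assumptions, not DecomposeRL's implementation: the component names (`verdict_correct`, `format_valid`), the weights, and the weighted-sum combination are all hypothetical, since the summary does not say what the ensemble contains or how it is aggregated. Only the group-wise standardization reflects the general GRPO recipe.

```python
from statistics import mean, pstdev

# Hypothetical weights; the source does not list the ensemble's components.
WEIGHTS = {"verdict_correct": 1.0, "format_valid": 0.2}


def ensemble_reward(components, weights):
    """Combine reward components with a weighted sum (an assumed aggregation)."""
    return sum(weights[name] * value for name, value in components.items())


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize rewards within one sampled group, as GRPO-style methods do.

    Each reward is compared against the mean of its own group, so no
    separate value network is needed to produce a baseline.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; eps guards against a zero spread
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical decompositions sampled for the same claim.
group = [
    {"verdict_correct": 1.0, "format_valid": 1.0},
    {"verdict_correct": 0.0, "format_valid": 1.0},
    {"verdict_correct": 1.0, "format_valid": 0.0},
    {"verdict_correct": 0.0, "format_valid": 0.0},
]

rewards = [ensemble_reward(c, WEIGHTS) for c in group]
advantages = group_relative_advantages(rewards)

for r, a in zip(rewards, advantages):
    print(f"reward={r:.2f}  advantage={a:+.3f}")
```

Samples that score above their group's mean get a positive advantage and are reinforced, while those below the mean are discouraged. Because every claim needs a whole group of sampled rollouts, training cost grows with the number of claims, which is consistent with the motivation given for shrinking the training set to about 5K.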

Key facts

DecomposeRL frames decomposition as an RL policy trained with GRPO
Uses a multi-faceted reward ensemble
Enables semi-supervised learning from unlabeled claims
Data-curation funnel distills 115K claims to 5K
DecomposeRL-7B achieves 86.3 in-domain balanced accuracy
Achieves 69.8 out-of-domain balanced accuracy
Tested on 11 claim-verification benchmarks
Covers biomedical, political, scientific, and general-domain claims
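Since the headline numbers are balanced accuracy, a short sketch of that metric may help. Balanced accuracy is the unweighted mean of per-class recall, so a majority class cannot dominate the score. The labels (`supported`, `refuted`) and the toy predictions below are hypothetical and are not taken from any of the 11 benchmarks.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Return the unweighted mean of per-class recall."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Imbalanced toy set: 8 "supported" claims and 2 "refuted" claims.
y_true = ["supported"] * 8 + ["refuted"] * 2
# The verifier gets every "supported" claim right but only one "refuted" claim.
y_pred = ["supported"] * 8 + ["supported", "refuted"]

plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = balanced_accuracy(y_true, y_pred)

print(f"plain accuracy:    {plain:.2f}")
print(f"balanced accuracy: {score:.2f}")
```

In this toy case plain accuracy is 0.90 while balanced accuracy is 0.75, because recall on the minority class is only 0.5. The reported 86.3 and 69.8 are presumably the same quantity expressed on a 0 to 100 scale.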
