GRPO Enhances Multi-Hop Fact Verification via Causal Models
A new framework grounds multi-hop fact verification in structural causal models (SCMs), framing verification as a form of constructive causal inference. The work targets the hallucinations and fractured logical chains that large language models (LLMs) exhibit on this task. The researchers identify an 'inverted U-shaped' relationship between reasoning-chain length and accuracy, showing that overly complex chains degrade performance. To address this, they propose a rule-based reinforcement learning strategy built on Group Relative Policy Optimization (GRPO) that dynamically balances structural depth against conciseness.
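The paper itself includes no code; as a minimal sketch of the underlying idea, the snippet below shows what making causal dependencies between evidence hops explicit might look like, in contrast to Chain-of-Thought prompting where those dependencies stay implicit. The `evidence_dag` structure and hop names are hypothetical illustrations, not the authors' SCM formalism.

```python
# Minimal sketch (assumption, not the authors' implementation): a multi-hop
# claim's evidence dependencies modeled as a small causal DAG, with
# verification proceeding in topological order so each hop conditions only
# on the hops it causally depends on.
from graphlib import TopologicalSorter

# Hypothetical evidence graph: node -> set of nodes it causally depends on.
evidence_dag = {
    "claim_verdict": {"hop_2"},
    "hop_2": {"hop_1"},
    "hop_1": set(),
}

def verification_order(dag):
    """Return hops in an order consistent with their causal dependencies."""
    return list(TopologicalSorter(dag).static_order())

print(verification_order(evidence_dag))  # ['hop_1', 'hop_2', 'claim_verdict']
```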
Key facts
- Multi-Hop Fact Verification (MHFV) requires complex reasoning across disparate evidence.
- LLMs often suffer from hallucinations and fractured logical chains in MHFV.
- Existing methods like Chain-of-Thought (CoT) lack explicit causal dependency modeling.
- The framework grounds reasoning in a Structural Causal Model (SCM).
- An 'inverted U-shaped' correlation exists between reasoning chain length and accuracy.
- Excessive structural complexity degrades performance.
- A Rule-based Reinforcement Learning strategy using GRPO is proposed.
- GRPO dynamically optimizes the trade-off between structural depth and conciseness (a minimal illustrative sketch follows this list).
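As a rough illustration of how such a rule-based objective could be combined with GRPO, the sketch below scores each sampled reasoning chain with a correctness reward minus a penalty on excess chain depth (echoing the inverted-U finding), then computes GRPO's group-relative advantages by normalizing rewards within the sampled group. The specific reward terms, `target_hops`, and penalty weight are assumptions for illustration, not the paper's exact rules.

```python
import numpy as np

def rule_based_reward(pred_label, gold_label, num_hops,
                      target_hops=3, length_penalty=0.1):
    """Hypothetical rule-based reward: verdict correctness minus a penalty
    for reasoning chains that overshoot an assumed target depth."""
    correctness = 1.0 if pred_label == gold_label else 0.0
    excess = max(0, num_hops - target_hops)  # only excess depth is penalized
    return correctness - length_penalty * excess

def grpo_advantages(rewards):
    """GRPO-style advantages: normalize each sampled completion's reward
    against the group mean and standard deviation (no learned critic)."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

# Example: rewards for a group of 4 sampled chains of increasing depth.
group = [rule_based_reward("SUPPORTS", "SUPPORTS", h) for h in (2, 3, 5, 7)]
print(grpo_advantages(group))  # deeper-than-needed chains get lower advantages
```

Normalizing within the group is what lets the policy favor chains that are both correct and concise relative to their peers, rather than rewarding length for its own sake.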