GRPO Enhances Multi-Hop Fact Verification via Causal Models
A new framework grounds multi-hop fact verification in structural causal models (SCMs), framing verification as a form of constructive causal inference. The work targets the hallucinations and fractured logical chains that large language models (LLMs) exhibit on this task. The researchers identify an 'inverted U-shaped' relationship between reasoning-chain length and accuracy, showing that overly complex chains degrade performance. To address this, they propose a rule-based reinforcement learning strategy built on Group Relative Policy Optimization (GRPO) that dynamically balances structural depth against conciseness.
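The paper itself includes no code; as a minimal sketch of the underlying idea, the snippet below shows what making causal dependencies between evidence hops explicit might look like, in contrast to Chain-of-Thought prompting where those dependencies stay implicit. The `evidence_dag` structure and hop names are hypothetical illustrations, not the authors' SCM formalism.

```python
# Minimal sketch (assumption, not the authors' implementation): a multi-hop
# claim's evidence dependencies modeled as a small causal DAG, with
# verification proceeding in topological order so each hop conditions only
# on the hops it causally depends on.
from graphlib import TopologicalSorter

# Hypothetical evidence graph: node -> set of nodes it causally depends on.
evidence_dag = {
    "claim_verdict": {"hop_2"},
    "hop_2": {"hop_1"},
    "hop_1": set(),
}

def verification_order(dag):
    """Return hops in an order consistent with their causal dependencies."""
    return list(TopologicalSorter(dag).static_order())

print(verification_order(evidence_dag))  # ['hop_1', 'hop_2', 'claim_verdict']
```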
Key facts
- Multi-Hop Fact Verification (MHFV) requires complex reasoning across disparate evidence.
- LLMs often suffer from hallucinations and fractured logical chains in MHFV.
- Existing methods like Chain-of-Thought (CoT) lack explicit causal dependency modeling.
- The framework grounds reasoning in a Structural Causal Model (SCM).
- An 'inverted U-shaped' correlation exists between reasoning chain length and accuracy.
- Excessive structural complexity degrades performance.
- A Rule-based Reinforcement Learning strategy using GRPO is proposed.
- GRPO dynamically optimizes the trade-off between structural depth and conciseness (a minimal illustrative sketch follows this list).
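As a rough illustration of how such a rule-based objective could be combined with GRPO, the sketch below scores each sampled reasoning chain with a correctness reward minus a penalty on excess chain depth (echoing the inverted-U finding), then computes GRPO's group-relative advantages by normalizing rewards within the sampled group. The specific reward terms, `target_hops`, and penalty weight are assumptions for illustration, not the paper's exact rules.

```python
import numpy as np

def rule_based_reward(pred_label, gold_label, num_hops,
                      target_hops=3, length_penalty=0.1):
    """Hypothetical rule-based reward: verdict correctness minus a penalty
    for reasoning chains that overshoot an assumed target depth."""
    correctness = 1.0 if pred_label == gold_label else 0.0
    excess = max(0, num_hops - target_hops)  # only excess depth is penalized
    return correctness - length_penalty * excess

def grpo_advantages(rewards):
    """GRPO-style advantages: normalize each sampled completion's reward
    against the group mean and standard deviation (no learned critic)."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

# Example: rewards for a group of 4 sampled chains of increasing depth.
group = [rule_based_reward("SUPPORTS", "SUPPORTS", h) for h in (2, 3, 5, 7)]
print(grpo_advantages(group))  # deeper-than-needed chains get lower advantages
```

Normalizing within the group is what lets the policy favor chains that are both correct and concise relative to their peers, rather than rewarding length for its own sake.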