DeepSciVerify: LLM-Driven Claim-Citation Verification with Selective Evidence Escalation

ai-technology · 2026-05-28

Researchers have introduced DeepSciVerify, a two-stage pipeline that checks whether the citations in reports generated by large language models (LLMs) actually support the scientific claims attached to them. The first stage verifies each claim against the cited paper's abstract and defers uncertain cases; only those deferred cases are escalated to full-text passage retrieval and analysis. On the SCitance benchmark the system reaches 86.7 Micro-F1, 4.5 points above strong abstract-only baselines, while resolving 67% of instances without full-text access. The design exploits complementary behavior across LLMs, some of which are more cautious and others more assertive under uncertainty, to improve both precision and efficiency. The work targets claim-citation mismatch in LLM-generated reports, a concern in high-stakes scientific settings.
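The control flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the label names, the `Verdict` type, the `verify_claim` function, and the three callables (`judge_abstract`, `retrieve_passages`, `judge_passages`) are all hypothetical stand-ins for LLM calls and a retriever that the source does not detail.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical label names; the source does not specify the label set.
SUPPORTED = "supported"
NOT_SUPPORTED = "not_supported"
UNCERTAIN = "uncertain"  # stage 1 returns this to defer a case


@dataclass
class Verdict:
    label: str
    stage: str  # "abstract" or "full_text": where the case was resolved


def verify_claim(
    claim: str,
    abstract: str,
    doc_id: str,
    judge_abstract: Callable[[str, str], str],
    retrieve_passages: Callable[[str, str], List[str]],
    judge_passages: Callable[[str, List[str]], str],
) -> Verdict:
    """Two-stage check: abstract first, full text only when stage 1 defers."""
    first = judge_abstract(claim, abstract)
    if first != UNCERTAIN:
        # Resolved at the abstract level; no retrieval cost is paid.
        return Verdict(first, "abstract")
    # Escalate: fetch full-text passages and decide on that evidence.
    passages = retrieve_passages(claim, doc_id)
    return Verdict(judge_passages(claim, passages), "full_text")


# Toy usage with stub judges (assumed data, for illustration only).
verdict = verify_claim(
    claim="Drug X reduces relapse rates.",
    abstract="We study drug X in a small cohort.",
    doc_id="paper-1",
    judge_abstract=lambda c, a: UNCERTAIN,
    retrieve_passages=lambda c, d: ["Relapse fell in the treated group."],
    judge_passages=lambda c, ps: SUPPORTED,
)
print(verdict)
```

Passing the judges in as callables keeps the sketch agnostic about which model backs each stage, which is where the complementary cautious and assertive behaviors could be plugged in.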

Key facts

DeepSciVerify is a two-stage pipeline for scientific claim-citation verification.
It combines abstract-level reasoning with selective escalation to passage-level evidence.
The system first verifies claims using abstracts and defers uncertain cases.
Full-text passages are retrieved and analyzed only when necessary.
The design leverages complementary behaviors across LLMs.
On the SCitance benchmark, DeepSciVerify achieves 86.7 Micro-F1.
It outperforms strong abstract-only baselines by +4.5 points.
67% of instances are resolved without full-text retrieval.
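
For readers unfamiliar with the metric, the sketch below shows how Micro-F1 is computed, assuming a single-label setup where each instance has exactly one gold label and one predicted label. The function name `micro_f1` and the example labels are hypothetical; the source does not describe SCitance's label set or exact scoring protocol.

```python
from typing import Sequence


def micro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Micro-averaged F1 over all classes for single-label predictions."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    # True positives pooled over all classes: exact label matches.
    tp = sum(1 for g, p in zip(gold, pred) if g == p)
    # Each wrong prediction is one false positive (for the predicted class)
    # and one false negative (for the gold class).
    fp = fn = len(gold) - tp
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy example with assumed labels: three of four predictions are correct.
gold = ["supported", "not_supported", "supported", "not_supported"]
pred = ["supported", "not_supported", "not_supported", "not_supported"]
print(micro_f1(gold, pred))  # 0.75
```

Under this single-label assumption, Micro-F1 computed over all classes equals plain accuracy; the two differ when some classes are excluded from the average or instances can carry multiple labels.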

Sources

arXiv cs.AI — 2026-05-28