VerbatimRAG: Hallucination-Free QA for Research Papers

ai-technology · 2026-05-22

Researchers have developed VerbatimRAG, an extractive question answering system that eliminates hallucinations in AI-assisted research by mapping each user query directly to verbatim text spans in retrieved documents, so that answers are quoted from the sources and not generated. The system is applied to the ACL Anthology and evaluated on a novel ground truth dataset built from synthetic queries using the ScIRGen methodology and annotated by NLP researchers. A 150M-parameter ModernBERT model is trained and evaluated on this benchmark. The approach targets the tendency of LLMs to produce factually inaccurate output and offers a reliable way to collect high-quality information from trusted sources.

Key facts

VerbatimRAG is an extractive QA system for research papers.
It maps user queries to verbatim text spans in retrieved documents.
It is applied to the ACL Anthology.
It uses a novel ground truth dataset based on synthetic queries and the ScIRGen methodology.
Human annotation was performed by NLP researchers.
A 150M-parameter ModernBERT model is trained and evaluated on the dataset.
It addresses the LLM hallucination problem in AI-assisted research.
arXiv paper ID: 2605.21102.
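
The query-to-span mapping described above can be illustrated with a minimal sketch. This is not the VerbatimRAG implementation: the trained ModernBERT extractor is replaced here by a simple word-overlap score, and the names `Span`, `sentence_spans`, and `extract_spans` are hypothetical. The point it shows is the extractive guarantee: every answer is a character range of a source document, so it can always be checked against the original text.

```python
import re
from dataclasses import dataclass


@dataclass
class Span:
    doc_id: str
    start: int  # character offset into the source document
    end: int    # exclusive end offset
    text: str   # always equal to docs[doc_id][start:end]


def tokenize(text):
    # Lowercased alphanumeric tokens, as a set for overlap counting.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def sentence_spans(doc):
    # Yield (start, end) character offsets of rough sentences in doc.
    for m in re.finditer(r"[^.!?]+[.!?]?", doc):
        raw = m.group()
        stripped = raw.strip()
        if stripped:
            start = m.start() + (len(raw) - len(raw.lstrip()))
            yield start, start + len(stripped)


def extract_spans(query, docs, top_k=1):
    # Assumption: word overlap stands in for a learned relevance model.
    q = tokenize(query)
    scored = []
    for doc_id, doc in docs.items():
        for start, end in sentence_spans(doc):
            overlap = len(q & tokenize(doc[start:end]))
            if overlap > 0:
                span = Span(doc_id, start, end, doc[start:end])
                scored.append((overlap, span))
    # Highest overlap first; ties keep document order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [span for _, span in scored[:top_k]]


docs = {
    "paper-a": "ModernBERT is an encoder model. It has 150M parameters.",
    "paper-b": "The ACL Anthology hosts NLP papers.",
}
for span in extract_spans("150M parameters", docs):
    # The answer is copied from the source, never generated.
    assert docs[span.doc_id][span.start:span.end] == span.text
    print(span.doc_id, span.text)
```

Because the function returns offsets along with the text, a caller can highlight the answer in the original paper. A query that matches nothing yields an empty list, not a made-up answer.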
